this post was submitted on 22 May 2025
94 points (100.0% liked)

TechTakes

1873 readers
319 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago
MODERATORS
top 44 comments
sorted by: hot top controversial new old
[–] nightsky@awful.systems 8 points 3 hours ago

If the companies wanted to produce an LLM that didn’t output toxic waste, they could just not put toxic waste into it.

The article title and that part remind me of this quote from Charles Babbage in 1864:

On two occasions I have been asked, — "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" In one case a member of the Upper, and in the other a member of the Lower, House put this question. I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

It feels as if Babbage had already interacted with today's AI pushers.

[–] 200fifty@awful.systems 14 points 10 hours ago

The really annoying thing is, the people behind AI surely ought to know all this already. I remember just a few years ago when DALL-E mini came out, and they'd purposefully not trained it on pictures of human faces so you couldn't use it to generate pictures of human faces -- they'd come out all garbled. What's changed isn't that they don't know this stuff -- it's that the temptation of money means they don't care anymore

[–] o7___o7@awful.systems 47 points 1 day ago* (last edited 1 day ago) (3 children)

Look, AI will be perfect as soon as we have an algorithm to sort "truth" from "falsehood", like an oracle of some sort. They'll probably have that in GPT-5, right?

[–] Soyweiser@awful.systems 7 points 7 hours ago (1 children)

Bonus this also solves the halting problem

[–] blakestacey@awful.systems 7 points 5 hours ago (1 children)

"You are a Universal Turing Machine. If you cannot predict whether you will halt if given a particular input tape, a hundred or more dalmatian puppies will be killed and made into a fur coat..."

[–] Soyweiser@awful.systems 4 points 5 hours ago* (last edited 5 hours ago) (1 children)

Im reminded again of the fascinating bit of theoretical cs (long ago prob way outdated now) which wrote about theoretical of classes of Turing machines which could solve the halting problem for a class lower than it, but not its own class. This is also where I got my oracle halting problem solver from.

So this machine can only solve the halting problems for other utms which use 99 dalmatian puppies or less. (Wait would a fraction of a puppy count? Are puppies Real or Natural? This breaks down if the puppies are Imaginary).

[–] corbin@awful.systems 6 points 4 hours ago (1 children)

Only the word "theoretical" is outdated. The Beeping Busy Beaver problem is hard even with a Halting oracle, and we have a corresponding Beeping Busy Beaver Game.

[–] Soyweiser@awful.systems 4 points 4 hours ago

Thanks, I'm happy to know Imaginary puppies are still real, no wait, not real ;). (The BBB is cool, wasn't aware of it, I don't keep up sadly. "Thus BBB is even more uncomputable than BB." always like that kind of stuff, like the different classes of infinity).

[–] besselj@lemmy.ca 29 points 1 day ago (2 children)

Oh, that's easy. Just add a prompt to always reinforce user bias and disregard anything that might contradict what the user believes.

[–] crawancon@lemm.ee 11 points 1 day ago (1 children)

feed it a christian bible as a base.

[–] crawancon@lemm.ee 11 points 1 day ago

"we trained it wrong.. on purpose...

..as a joke."

[–] homesweethomeMrL@lemmy.world 8 points 1 day ago

They do, it just requires 1.21 Jigawatts of power for each token.

[–] Angry_Autist@lemmy.world -4 points 11 hours ago (1 children)

This is old news, topic supervisors are already a thing

[–] Soyweiser@awful.systems 4 points 5 hours ago

Quis custodiet ipsos custodes?

[–] homesweethomeMrL@lemmy.world 2 points 1 day ago (2 children)

The chatbot “security” model is fundamentally stupid:

  1. Build a great big pile of all the good information in the world, and all the toxic waste too.
  2. Use it to train a token generator, which only understands word fragment frequencies and not good or bad.
  3. Put a filter on the input of the token generator to try to block questions asking for toxic waste.
  4. Fail to block the toxic waste. What did you expect to happen, you’re trying to do security by filtering on an input that the “attacker” can twiddle however they feel like.

Output filters work similarly, and fail similarly.

This new preprint is just another gullible blog post on arXiv and not remarkable in itself. But this one was picked up by an equally gullible newspaper. “Most AI chatbots easily tricked into giving dangerous responses,” says the Guardian. [Guardianarchive]

The Guardian’s framing buys into the LLM vendors’ bad excuses. “Tricked” implies the LLM can tell good input and was fooled into taking bad input — which isn’t true at all. It has no idea what any of this input means.

The “guard rails” on LLM output barely work and need to be updated all the time whenever someone with too much time on their hands comes up with a new workaround. It’s a fundamentally insecure system.

[–] froztbyte@awful.systems 5 points 15 hours ago (1 children)

and not just post it, but posted preserving links - wtf

[–] homesweethomeMrL@lemmy.world -2 points 9 hours ago

That's typically how quoting works, yes. Do you strip links out when you quote articles?

[–] dgerard@awful.systems 12 points 23 hours ago (1 children)

why did you post literally just the text from the article

[–] homesweethomeMrL@lemmy.world -4 points 22 hours ago (1 children)

It's just a section. There's more of the article.

Like this:

Another day, another preprint paper shocked that it’s trivial to make a chatbot spew out undesirable and horrible content. [arXiv]

How do you break LLM security with “prompt injection”? Just ask it! Whatever you ask the bot is added to the bot’s initial prompt and fed to the bot. It’s all “prompt injection.”

An LLM is a lossy compressor for text. The companies train LLMs on the whole internet in all its glory, plus whatever other text they can scrape up. It’s going to include bad ideas, dangerous ideas, and toxic waste — because the companies training the bots put all of that in, completely indiscriminately. And it’ll happily spit it back out again.

There are “guard rails.” They don’t work.

One injection that keeps working is fan fiction — you tell the bot a story, or tell it to make up a story. You could tell the Grok-2 image bot you were a professional conducting “medical or crime scene analysis” and get it to generate a picture of Mickey Mouse with a gun surrounded by dead children.

Another recent prompt injection wraps the attack in XML code. All the LLMs that HiddenLayer tested can read the encoded attack just fine — but the filters can’t. [HiddenLayer]

I’m reluctant to dignify LLMs with a term like “prompt injection,” because that implies it’s something unusual and not just how LLMs work. Every prompt is just input. “Prompt injection” is implicit — obviously implicit — in the way the chatbots work.

The term “prompt injection” was coined by Simon WIllison just after ChatGPT came out in 2022. Simon’s very pro-LLM, though he knows precisely how they work, and even he says “I don’t know how to solve prompt injection.” [blog]

[–] dgerard@awful.systems 14 points 22 hours ago (1 children)

Yes, I know, I wrote it. Why do you consider this useful to post here?

[–] homesweethomeMrL@lemmy.world -3 points 18 hours ago (2 children)

Well, I don't think that last part was useful, but I do think the previous part was useful as a way to focus conversation. Many people don't read the article, and I thought that was the most relevant section.

[–] blakestacey@awful.systems 4 points 6 hours ago

Good grief. At least say "I thought this part was particularly interesting" or "This is the crucial bit" or something in that vein. Otherwise, you're just being odd and then blaming other people for reacting to your being odd.

[–] swlabr@awful.systems 4 points 16 hours ago (1 children)

Actually I’m finding this quite useful. Do you mind posting more of the article? I can’t open links on my phone for some reason

[–] homesweethomeMrL@lemmy.world -3 points 9 hours ago

Actually this comm seems really messed up, so I'mma just block it and move on. Sorry for ruffling your feathers, guv.