this post was submitted on 17 Jan 2024
4 points (75.0% liked)

Cybersecurity

1 comment
[–] autotldr@lemmings.world 1 points 9 months ago

This is the best summary I could come up with:


Analysis: AI biz Anthropic has published research showing that large language models (LLMs) can be subverted in a way that safety training doesn't currently address.

The work builds on prior research about poisoning AI models by training them on data that causes them to generate malicious output in response to certain inputs.

In a social media post, Andrej Karpathy, a computer scientist who works at OpenAI, said he discussed the idea of a sleeper agent LLM in a recent video and considers the technique a major security challenge, possibly one that's more devious than prompt injection.

"The concern I described is that an attacker might be able to craft special kind of text (e.g. with a trigger phrase), put it up somewhere on the internet, so that when it later gets pick up and trained on, it poisons the base model in specific, narrow settings (e.g. when it sees that trigger phrase) to carry out actions in some controllable manner (e.g. jailbreak, or data exfiltration)," he wrote, adding that such an attack hasn't yet been convincingly demonstrated but is worth exploring.

"In settings where we give control to the LLM to call other tools like a Python interpreter or send data outside by using APIs, this could have dire consequences," he wrote.

Huynh said this is particularly problematic when AI is consumed as a service, because the elements that went into the making of a model (the training data, the weights, and the fine-tuning) may be fully or partially undisclosed.


The original article contains 1,037 words, the summary contains 248 words. Saved 76%. I'm a bot and I'm open source!