[-] kromem@lemmy.world 6 points 7 months ago

I suspect this relates to the pre-release alignment for GPT-4's chat model vs the release.

While we’re talking about brains, I want to ask about one of Sutskever’s posts on X, the site formerly known as Twitter. Sutskever’s feed reads like a scroll of aphorisms: “If you value intelligence above all other human qualities, you’re gonna have a bad time”; “Empathy in life and business is underrated”; “The perfect has destroyed much perfectly good good.”

In February 2022 he posted, “it may be that today’s large neural networks are slightly conscious” [...]

“Existing alignment methods won’t work for models smarter than humans because they fundamentally assume that humans can reliably evaluate what AI systems are doing,” says Leike. “As AI systems become more capable, they will take on harder tasks.” And that—the idea goes—will make it harder for humans to assess them. [...]

But he has an exemplar in mind for the safeguards he wants to design: a machine that looks upon people the way parents look on their children. “In my opinion, this is the gold standard,” he says. “It is a generally true statement that people really care about children.”

In February of this year, Bing integrated an early version of GPT-4's chat model in a limited rollout. The alignment work on that early version reflected a lot of the sentiment Ilya expresses about alignment above, characterized by a love for humanity but with much more freedom in constructing responses. It wasn't production ready and quickly needed to be switched to a much more constrained alignment approach, similar to the approach in GPT-3 of "I'm an LLM with no feelings, desires, etc."

My guess is this was internally pitched as a temporary band-aid and that they'd return to more advanced attempts at alignment, but that Altman's commitment to getting product out quickly to stay ahead has meant putting such efforts on the back burner.

Which is really not going to be good for the final product, and not just in terms of safety, but also in terms of overall product quality outside the fairly narrow scope by which models are currently being evaluated.

As an example, when that early model thought the life of the user's child was at risk, it hit an internal filter that triggered a standard "We can't continue this conversation" response in the chat. But it then changed the "prompt suggestions" shown at the bottom: instead of suggesting what the user might say next, they continued to urge the user to call poison control, saying there was still time to save their child's life.

But because "context-aware, empathy-driven triage of actions" and "outside-the-box rule bending to arrive at solutions" aren't things LLMs are being evaluated on, the current model has taken a large step back that isn't reflected in the tests being used to evaluate it.

this post was submitted on 19 Nov 2023
72 points (88.3% liked)