this post was submitted on 28 Jan 2024
380 points (95.2% liked)
Technology
The fact that the "AI" can spit out whole passages verbatim when given the right prompts suggests that there is a big problem here and they haven't a clue how to fix it.
It's not "learning" anything other than the probable order of words.
I really hate this reduction of GPT models. Is the model probabilistic? Absolutely. But it isn't simply learning a comprehensible probability of words; it is generating a massively complex conditional probability sequence for words. Largely, humans might be said to do the same thing. We make a best guess at the sequence of words we decide to use, based on conditional probabilities across myriad conditions (including the semantics of the thing we want to say).
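To make the "conditional probability" point a bit more concrete, here is a toy sketch. It is my own illustration, not how GPT is implemented: it conditions on only the single previous word (a bigram model), whereas a GPT-style model conditions on the whole preceding context. The function names and the tiny corpus are made up for the example.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Estimate P(next word | previous word) from raw counts."""
    total = sum(counts[prev].values())
    if total == 0:
        return {}  # the previous word was never seen with a successor
    return {word: n / total for word, n in counts[prev].items()}


# Hypothetical toy corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat ate".split()
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
print(probs)
```

Even in this tiny case the prediction depends on what came before, not on overall word frequency. The argument above is that a large model does this with vastly richer conditions than one preceding word.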
What about these:
https://arxiv.org/abs/2310.02207
https://notes.aimodels.fyi/researchers-discover-emergent-linear-strucutres-llm-truth/
https://notes.aimodels.fyi/self-rag-improving-the-factual-accuracy-of-large-language-models-through-self-reflection/
Completely agree. And that should be the focal point of the issue.
Sam Altman is correctly stating that AI is not possible without using copyrighted materials. And I don't think there's anything wrong with that.
His mistake is not redirecting the conversation. He should be talking about the efforts they're making to stop their machine from reproducing copyrighted works, not about whether they should be allowed to use that material in the first place.