OpenAI says it’s “impossible” to create useful AI models without copyrighted material
(arstechnica.com)
When people say that the "model is learning from its training data", they mean just that, not that it is human, and not that it learns exactly as humans do. It doesn't make sense to judge boats on how well they simulate human swimming patterns, just how well they perform their task.
Every human has the benefit, as a baby, of training on the things around them and being trained by the people around them, building a foundation for all later skills. Generative models rely on many text and image pairs to describe things to them because they lack the ability to poke, prod, rotate, and disassemble for themselves.
For example, when a model takes in a thousand images of circles, it doesn't "learn" a thousand circles. It learns what a circle is GENERALLY like, the concept of it. That representation, along with random noise, is how you create images with these models. The same happens for every concept the model trains on, everything from "cat" to more complex things like color relationships, reflections, or lighting. Machines are not human, but they can learn despite that.
In general I agree with you, but AI doesn't learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle, but there is no understanding of the concept of a circle. This may sound like nitpicking, but I think it's important to make the distinction.
That is why current models aren't regarded as actual intelligence, although people already call them that...
I understand. I didn't mean to imply any sort of understanding with the language I used.
It makes sense to judge how closely LLMs mimic human learning when people are using that comparison as a defense of AI companies scraping copyrighted content, and making the claim that banning AI scraping is as nonsensical as banning human learning.
But when it's pointed out that LLMs don't learn the way humans do, and require scraping far more material than a human ever does, suddenly AIs shouldn't be judged by human standards? I don't know if it's intentional on your part, but that's a pretty classic example of a motte-and-bailey fallacy. You can't have it both ways.
I don't understand what you mean, can you elaborate?