this post was submitted on 26 Aug 2023
Technology
You keep using that word "stolen", I do not think it means what you think it means.
Also, AIs do not "mash together" works from their training sets. This is a very common and very incorrect conception of how they work. They are not collage generators or copy-and-paste machines. They learn concepts from the images they train on; they don't actually remember fragments of those images to later regurgitate in some sort of patched-together Frankenstein's Monster.
You're correct, but it's still too early and most people haven't spent enough time with AI to fully understand. Maybe they never will.
Like the classic quote says, it is difficult to get a man to understand something when his salary depends upon his not understanding it.
I just asked Wombo Dream to make the Mona Lisa and it did. Sure, you can tell it's not exactly the real thing, but I don't know how you can say it didn't copy any of the actual Mona Lisa original.
I considered including mention of overfitting in my earlier comment, but since it's such an edge case I felt it would just be an irrelevant digression.
When a particular image has a great many duplicates in the training set - hundreds or even thousands of copies are necessary - then you get the phenomenon of overfitting. In that case you do get this sort of "memorization" of a particular image, because during training you are hitting the neural net over and over with the exact same inputs and really drilling them into it. This is universally considered undesirable, because there's no point to it - why spend thousands of dollars to do something that a copy/paste command could do so much better and more easily? So when image generators are trained, the training data goes through a "de-duplication" step intended to try to prevent this sort of thing from happening. Images like the Mona Lisa are so incredibly common that they still slip through the cracks, though.
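To make the "de-duplication" idea concrete, here's a minimal sketch in Python. It is not how any particular image generator actually does it; it only catches exact byte-for-byte duplicates, and the `deduplicate` function and the toy `training_set` are made up for illustration.

```python
import hashlib


def deduplicate(images):
    """Keep the first occurrence of each distinct item (exact byte match only)."""
    seen = set()
    unique = []
    for data in images:
        # Hash the raw bytes so identical files map to the same digest.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(data)
    return unique


# Hypothetical toy "training set": byte strings standing in for image files.
training_set = [b"mona_lisa"] * 1000 + [b"cat_photo", b"landscape"]
deduped = deduplicate(training_set)
print(len(training_set), "->", len(deduped))  # prints "1002 -> 3"
```

Note that exact hashing like this would miss copies that have been resized, cropped, or recompressed, since those have different bytes. That is at least consistent with very common images still slipping through the cracks.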
There's a paper from some months back that commonly comes up when people want to go "aha, generative AI copies its training data!" But in reality this paper shows just how difficult it is to arrange for overfitting to happen. The researchers used an older version of Stable Diffusion whose training set was not well curated and is no longer used due to its poor quality, and even then it took them hundreds of millions of attempts to find just a handful of images from the training set that they could dredge back out of it in recognizable form.
People have also copied art for as long as art has existed. You can buy a copy of the Mona Lisa in the gift shop, or print your own. That's why the market for art has always been hyperfocused on 'originals'. But rarely are the artists the ones getting rich off their art, especially now. I hate capitalism as much as anyone, but if your motivation for making art is money, you're in the wrong business and your art probably isn't that good anyway.