this post was submitted on 29 Jan 2025
Not The Onion
Wherever you fall on the anti-AI spectrum, I thought that after the past two decades of piracy we had come to the conclusion that you can't "steal" data: copying != stealing.
If anything, this is kind of making people realize the opposite. It isn't stealing when it is corporate (or creator...) but it is TOTALLY stealing when it is individual people... who aren't authors or artists.
The fun part is that "creating" datasets for training steals from everyone equally.
And when it comes to authors and artists, it amounts to wage theft. When a company hires an artist to make an ad, the artist gets paid to make it. If you then take that ad, you're not taking money from the worker - they already got paid for the work they did. Even if you take a piece from the social media of an independent artist and make a meme out of it or something, so long as people can find that artist, it can lead to people hiring them. But if you chop it up and mash it into a data set, you're taking their work for profit, or to avoid paying them for the skills and expertise it takes to create something new. AI cannot exist without a constant stream of human art to devour, yet nobody thinks the work to produce that art is worth paying for. It's a corporation avoiding paying the working class what their skills are worth.
That is, sadly, incorrect. What IS true is that AI cannot be "born" without massive amounts of human content. But once you have a solid base model (and I do not believe we currently have one), you no longer need input art or input prose. The model can generate that. What you DO need is feedback on whether a slight variation is good or bad. Once you have that labeled data, you retrain. Plenty of existing ML models do exactly this.
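To make that loop concrete, here's a minimal, runnable sketch of the generate-rate-retrain cycle described above. Everything in it is a toy stand-in I made up (the "model" is just a word sampler, the "rater" is a stub); real pipelines fine-tune network weights on the preferred samples, but the shape of the loop is the same:

```python
import random

# Toy stand-in for a solid base model: it "generates" variations of a
# prompt. A real system would be a neural network; a word sampler keeps
# this runnable end to end.
ADJECTIVES = ["misty", "sunlit", "burning", "frozen", "neon"]

def generate_variation(prompt: str) -> str:
    return f"{random.choice(ADJECTIVES)} {prompt}"

def human_feedback(sample: str) -> bool:
    # Stand-in for the human rating step; pretend raters dislike "burning".
    return "burning" not in sample

# 1) Generate candidates from the existing model - no outside art needed.
candidates = [generate_variation("forest at dawn") for _ in range(10)]

# 2) Label them. This is the only genuinely human input in the loop.
labeled = [(s, human_feedback(s)) for s in candidates]

# 3) "Retrain" on the labels. Here that just means dropping rejected
#    styles; a real pipeline would fine-tune weights on preferred samples.
rejected = {s.split()[0] for s, ok in labeled if not ok}
ADJECTIVES = [a for a in ADJECTIVES if a not in rejected]

print("styles surviving one feedback round:", ADJECTIVES)
```

Step 2 is the point being made above: the scarce ingredient isn't more art, it's the human judgment of whether a variation is good.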
And, honestly? That isn't even all that different from how humans do it. It's an oversimplification that ignores lead times, but just look at everyone who suddenly wants to talk about how much Virtuosity influenced The Matrix. Or, more reasonably, look at how each era of film is clearly influenced by the previous one. EVERYONE doing action copied John Woo, and then came the innovation of adding slow-mo to more or less riff on the wire work common in (among other places) Chinese films. And that eventually became a much bigger focus on slow-mo to show the impact of a hit, and so forth.
There is not something intrinsically human about saying "can I put some jelly in my peanut butter?". But there IS something intrinsically human about deciding whether that was a good idea... to humans.
I agree that's a BIG if. In an ideal world, people would cite their sources and bring more attention to the creator. I also didn't mean that artists should create work just for the opportunity to have it turned into a meme and maybe go viral and get exposure that way, but that at least there's a chance, however small, of people getting more clients through word of mouth for work they've already done - compared to having their art thrown into a training algorithm, where there's absolutely zero chance of the artist seeing any benefit.
Last I heard, current AI models will devour themselves if trained on content from other AI. It simply isn't good enough to use, and the noise-to-value ratio is too high to make it worth filtering. Which means there is still a massive demand for human-made content, and possibly even more demand in the future for some time yet. Pay artists to create that content, and I see no real problem with the model. Some companies have started doing just that: Procreate has partnered with a website-building company that is hiring artists to create training data for its UI-generating LLM and paying them commission fees. Nobody has to spend their day making hundreds of buttons for stupid websites, and the artists get paid. A win-win for everybody.
My stance on AI always comes down to the ethics behind the creation of the tool, not the tool itself. My pie-in-the-sky scenario would be that artists could spend their time making what they want to make without having to worry about whether they can afford rent. There's a reason we see most artists posting only commission work online, and it's because they can't afford to work on their own stuff. My more realistic view is that there's a demand for content to train these things, so pay the people making that content an appropriate wage for their work and experience. There could be an entire industry around creating content specifically for different training-data tags.
And as for AI being similar to humans, I think you're largely right. It's a really simplified reproduction of how human creativity and inspiration work, but with some major caveats. I see AI as basically a magic box containing an approximation of skill but lacking understanding and intent. When you give it a prompt, you provide the intent, and if you're knowledgeable, you can supply the understanding as well. But many people don't care about the understanding or value the skill; they just want the end result. Which is where we stand today, with AI not being used for the betterment of our daily lives, but as a cost-cutting tool to avoid having to pay workers what they're worth.
Hence, we live in a world where they told us when we were growing up that AI would be used to do the things we hate doing so that we had more time to write poetry and create art, while today AI is used to write poetry and create art so that we have more time to work our menial jobs and create value for shareholders.
Yeah... that is right up there with "AI can't do feet" in terms of being nonsense people spew.
There is nothing inherently different between a picture made by an LLM and a picture drawn by Rob Liefeld. Both have fucked up feet and both can be fixed with a bit of effort.
The issue is more the training data itself. Where this CAN cause a problem is if you unknowingly train on endless amounts of AI-generated content. But... we have the same problem with training on endless amounts of human content. Very few training sets (these days) bother to put in the time to actually label what the input is. So it isn't "this is a good recipe, that is a bad recipe, and that is an ad for BetterHelp". It is "this is all the data we scraped off every public-facing blog and YouTube transcript".
It's also why the major companies are putting a big emphasis on letting customers feed in their own data. Partially that's out of the understanding that people might not want to type corporate IP into a web interface. But it's also because it provides a way to rapidly generate labeled data: you know a customer cares about widgets if they have twelve gigs of documents on widgets.
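As a concrete illustration of that labeling gap (the record shapes and field names below are made up for the example, not any real dataset's schema):

```python
# What a bulk scrape gives you: text with no judgment attached.
scraped = {
    "url": "https://example.com/blog/123",
    "text": "My grandma's chili recipe... Also, try BetterHelp today!",
}

# What a labeled example looks like: the same text plus the human-supplied
# judgments that make filtering and weighting possible during training.
labeled = {
    "url": "https://example.com/blog/123",
    "text": "My grandma's chili recipe...",
    "kind": "recipe",     # as opposed to "ad", "transcript", ...
    "quality": "good",    # the expensive part: a person had to decide this
}

# With labels, curation is a one-liner; without them, it's guesswork.
dataset = [labeled]
keep = [ex for ex in dataset if ex["kind"] != "ad" and ex["quality"] == "good"]
print(f"kept {len(keep)} of {len(dataset)} examples")
```

The "kind" and "quality" fields are exactly the human work that bulk scraping skips, and exactly what customer-supplied document piles provide for free.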
And what is the difference between someone getting paid by a rando to draw a picture of Sonic choking on a chili dog versus an AI-generated image of the same?
At the end of the day, we aren't going to see magic AIs generating everything with zero prompting (maybe in a decade or two... if the world still exists). Instead, what we see is people studying demand and creating prompts based on that. Which... isn't that different from how Hollywood studios decide which scripts to greenlight.
You're largely arguing what I'm saying back at me. I didn't mean that the AI is bad, but that the AI content out there has filled the internet with tons of low-quality stuff over the past few years, and enough of this garbage going in degrades the quality coming out, in a repeating cycle of degradation. You create biases in your model, and feeding those back in makes them worse. So the most cost-effective filter is to avoid training on possibly-AI content altogether. I think OpenAI was limiting ChatGPT's training data to material from before 2020 until this past year or so.
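That degradation cycle is easy to demonstrate with a toy model. The sketch below (a deliberately simplified stand-in, not a claim about any production system) fits word frequencies to a corpus, builds the next "training set" purely from the model's own output, and refits, over and over. Rare words that miss a single sampling round vanish permanently, which is the same tail-loss dynamic behind the model-collapse results:

```python
import random
from collections import Counter

# Generation 0: "human" data - a vocabulary with common and rare words.
vocab = {"forest": 50, "river": 30, "mountain": 15, "glacier": 4, "fjord": 1}
data = [word for word, n in vocab.items() for _ in range(n)]

for generation in range(8):
    counts = Counter(data)
    print(f"gen {generation}: {len(counts)} distinct words {dict(counts)}")
    # "Fit" the model (here, just the empirical frequencies), then sample
    # the next generation's training data from the model itself. A rare
    # word that misses one sampling round is gone for good.
    data = random.choices(list(counts), weights=counts.values(), k=100)
```

Run it a few times and watch the vocabulary shrink toward the most common words; nothing ever comes back once it drops out.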
It's a similar issue to the one facial recognition software had. Early on, facial recognition couldn't tell the difference between two women, two black people (men or women), or two white men under the age of 25 or so, because it was trained on the employees working on it, who were mostly middle-aged white men.
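Catching that kind of skew starts with breaking evaluation results down by group instead of reporting one aggregate number. A minimal sketch with invented numbers (no real benchmark data):

```python
from collections import defaultdict

# Invented evaluation results: (demographic group, was the match correct?)
results = (
    [("white_men_25_54", True)] * 95 + [("white_men_25_54", False)] * 5
    + [("women", True)] * 60 + [("women", False)] * 40
    + [("black_adults", True)] * 55 + [("black_adults", False)] * 45
)

totals = defaultdict(lambda: [0, 0])  # group -> [correct, seen]
for group, correct in results:
    totals[group][0] += correct
    totals[group][1] += 1

overall = sum(c for c, _ in totals.values()) / len(results)
print(f"overall accuracy: {overall:.0%}  <- looks deceptively fine")
for group, (correct, seen) in totals.items():
    print(f"{group}: {correct / seen:.0%}")
```

The aggregate number hides exactly the failure mode a one-demographic training set bakes in.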
All of this means that there's a high demand for content to train on, which would be a perfect job to hire artists for: pay them to create work for whatever labels you need in your data sets. But companies don't want to do that. They'd rather steal content from the public at large, because AI is about cutting costs for these companies.
To put it simply: AI can generate an image, but it isn't capable of understanding two-point perspective or proper lighting occlusion, etc. It's just a tool. A very powerful tool, especially in the right hands, but a tool nonetheless. If you look at AI images, especially ones generated by the same model, you'll begin to notice certain recurring mistakes - especially in lighting. AI doesn't understand the concept of lighting, and so has a very hard time creating realistic lighting: most characters end up with competing light sources and shadows all over the place that make no sense. And that's just a consequence of how specific a prompt would need to be to get it right.
Another flaw with AI is that it can't iterate. Production companies that had been hiring AI prompters onto their movie crews have started putting blanket bans on hiring them, because they simply can't do the work. You ask for 10 images of a forest, and they come back the next day with 20. But you say, "Great, I like this one, but take the people out of it," and they come back the next day with 15 more pictures of forests - but not the original without the people. It's a great tool for what it does, but you can't tell it, "Can you make the chili dog 10 times larger?" and get the same piece back, just with a giant chili dog.
And don't get me started on Hollywood or any of those other corporate leeches. I think Adam Savage said it best when he said last year that someday, a film student is going to do something really amazing with AI - and Hollywood is going to copy it to death. Corporations are the death of art, because they only care about making a product to be consumed. For some perfect examples of what I mean, you should check out these two videos: Why do "Corporate Art Styles" Feel Fake? by Solar Sands, and Corporate Music - How to Compose with no Soul by Tantacrul. Corporations also have no courage when money is on the line, so that's why we see so many sequels and remakes out of Hollywood. People aren't clamoring for a live action remake of (insert childhood Disney movie here), but they will go and watch it, and that's a safe bet for Hollywood. That's why we don't see many new properties. Artists want to make them, but Hollywood doesn't.
As I said, in my ideal world, AI would be making that corporate garbage and artists would be able to create what they actually want. But in the real world, there's very little chance that you can keep a roof over your head making what you want. Making corporate garbage is where the jobs are, and most artists have very little time left over for working on personal stuff. People always ask questions like, "Why aren't people making statues like the Romans did," or "Why don't we get paintings like Rembrandt used to do." And the answer is, because nobody is paying artists to make them. They're paying them to make soup commercials, and they don't even want to pay them for that.
Nuance not your strong suit, eh?
I'm curious what "nuance" I am missing.
I mean, it isn't like OpenAI or DeepSeek were going to pay for it anyway. So there's no lost revenue, and it isn't stealing. Besides, you can't download a car, so it isn't even stealing.
It's just that people are super eager to paint themselves as morally righteous when they're explaining why it's fine not to give a shit about the work of one person, or hundreds of people, when they want something. But once they see corporations (and researchers) doing the exact same thing to them? THIEVERY!!!
When the reality is: Yeah, it is "theft" either way. Whether you care is up to you.
I can’t recall a time when I downloaded an album, pretended I made it, and tried to sell access to it.
This would be more like buying a Van Halen album, learning Eddie's style, writing and recording my own Van Halen-style songs, and selling those. Still an infringement?
Well, your example isn't quite right because these companies didn't buy the data originally.
I'd say it's more like when somebody samples a song without permission and uses it in their own music. If we want to go even further, I'd say the AI companies we have today are basically making and selling synthesizers built from samples used without permission. The AI doesn't learn the way we do; it simply regurgitates what it thinks is correct based on probabilities from an algorithm derived from its training set.
That's not how training works with LLMs at all
It does make alterations to it: the work is completely shredded up as part of the training process and turned into numbers and statistics, mushed together with a bunch of other numbers and statistics.
It's like baking a cake: you mix in flour, butter, and eggs, and bake it. Once mixed and baked, you can't get the flour, butter, and eggs back in their original form, and the final product is completely different.
If it weren't, you'd be able to pull full, unaltered copies directly from the model files, but that hasn't been accomplished. The best people have managed is getting the AI to recreate something pretty close to the original with very careful and specific prompts. But it's still a recreation, based on what the model "learned".
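A tiny bigram model shows what "turned into numbers and statistics" means in practice. This is a drastic simplification of how an LLM actually trains (real models learn billions of weights, not word-pair counts), but the storage-versus-recreation distinction is the same:

```python
import random
from collections import Counter, defaultdict

text = "the cake is mixed and the cake is baked and the batter is gone"

# "Training": reduce the text to word-pair statistics. The model only
# ever stores counts of which word follows which - never the string.
model = defaultdict(Counter)
words = text.split()
for current, following in zip(words, words[1:]):
    model[current][following] += 1

# "Generation": sample from those statistics. The result resembles the
# training data, but it's a recreation assembled from probabilities,
# not a copy retrieved from storage.
word, output = "the", ["the"]
for _ in range(10):
    nxt = model.get(word)
    if not nxt:
        break
    word = random.choices(list(nxt), weights=nxt.values())[0]
    output.append(word)
print(" ".join(output))
```

Nothing in `model` contains the original sentence; at best, sampling can reconstruct something close to it, which is the careful-prompting scenario described above.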
Yes, my edit was a bit hyperbolic. The point being that current AI/LLM companies have been, at best, encoding data that they do not have permission to use into their models.
It's more like baking a cake with flour, butter, and eggs that you snagged from other people's grocery baskets after they paid for them. Then, started selling the cakes made from said ingredients.
Ideally, none of that would matter, because knowledge and data want to be free and everyone would benefit. However, we don't live in such a world. Instead, the technology is being used almost exclusively to extract wealth from people and make the average human being's life worse: in the short term by reducing their ability to support themselves, and in the long term by drastically increasing consumption of fossil fuels and potable water, putting more pressure on the biosphere.
Uh, who is "we"? Piracy is still illegal and not everyone approves of it.
Maybe in the mainstream, but the open-source, socialist, and anarchist communities that populate Lemmy tend to be very critical of ideas like intellectual property and copyright.
Oh, you're one of those weirdos that reports people shoplifting at Walmart and probably was also the "Teacher, you forgot our homework" kid.
Uh, no, and what a wild assumption to make from me stating a fact, unless you know something that I don't?
You made it pretty clear what your stance is, legality is not an indicator of morality or ethicality
I did not state my stance at all. You are assuming things my friend. I did not comment to weigh in with my opinion, I commented to challenge your assertion that "we" had decided something. Who is "we"?
Everyone except you apparently.
I didn't state my stance. It's obviously not everyone as I said before.
You did. By starting off with "who is we", you stated you're breaking from the stance of my comment that you replied to and aligning yourself with the second part of your statement: "because it's illegal and many don't approve of it".
No, that's not how that works. You still do not know my stance no matter how much you want to assume you do. Good day.
You guys are always so funny. You want to take the contrarian stance without actually taking the heat for it. You're not fooling anyone except yourself with all the "well, I didn't state my stance, so it's impossible for you to know!"
So I'll ask, what's your stance on piracy?
Who is "you guys"? A lot of blanket generalizations being thrown around here.
I genuinely don't know who the original commenter meant by "we". It's certainly not "everyone", and it's not the legislators, or the big tech companies, or most people with IP to protect. So who is "we"?
And I don't know who the "you guys" in your comment is either. What am I a part of? This "us vs. them" grouping going on here is so damn odd to me, genuinely.
I stated what is, in my view, just a fact, to challenge a strange assertion made by someone on the internet. At no point did it get pushed back on; instead, the focus shifted entirely to a stance you assumed I hold, and to personal attacks. How can anyone have conversations this way? It's insane.
My stance on piracy is complicated. I used to do it a ton, I do it a lot less now but still do some. I won't go into more detail than that because my personal opinion on piracy was never the point here and not why I commented.