LocalLLaMA
Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive, constructive way.
Rules:
Rule 1 - No harassment or personal character attacks of community members. I.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.
Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. I.e. no comparing the usefulness of models to that of NFTs, no claiming that the resource usage required to train a model is anything close to that of maintaining a blockchain or mining for crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.
Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. I.e. statements such as "LLMs are basically just simple text predictions like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>".
Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.
Ahh. But they are not. That's what we're discussing.
Let me make this clear: All intellectual property is arbitrary. I fear many copyright people have convinced themselves otherwise.
The government could grant the exclusive right to sell coffee in an area. That was done at one point. It could give the exclusive right to make shoes to some corporation. That was normal before the time of the French Revolution. The German constitution explicitly protects the right to choose one's profession. The origin of this lies in such feudal practices.
The US Constitution limits copyright because the founders were quite aware of how these feudal privileges were abused. European copyright descends from agreements between mostly monarchical empires. Rent-seeking was/is an intended feature, which is why Europeans are so easily defrauded by the copyright industry.
When you photograph an image, you have to get permission. Makes sense. When that image is in the background of a video, you may have to get permission. Makes less sense. You rarely have to get permission from makeup artists, hairdressers, and clothes designers. Why not, actually? Isn't that "theft" on a grand scale?
Historically, it makes sense. Originally, copyright was for printing. The only images you could print were engravings. It would have been hard to justify that the tailors, maids, or butlers should get a cut. And also, they were not a demographic that could expect to be favored with an economic rent from the elites.
And today? There are many photos that derive more value from the clothes and general appearance of the model than from anything else. And yet, the photographer owns the copyright and only needs to get permission from the model. How should that work?
By the by. Painters and some intellectuals raged against photography in much the same way that they rage against AI now. There is an essay by Charles Baudelaire that illustrates this nicely.
I feel we've run into the exact same issue as before. Now we're talking property. But we were just talking about investment and we've just established those two are distinct and not the same. It's a bit confusing. And I agree, that resulting granted monopoly and rent-seeking is an intended feature, and not contributing to society. But my previous comment was addressing the aspect of the author's investment and ROI, not the resulting property from that. And that's not arbitrary at all. The author sat at his desk for 6 months specifically. Sure, the resulting product is arbitrary when selling it for money, but that wasn't what we were talking about.
I don't think we're easily defrauded by the copyright industry. As I said, school-books seem like 10x cheaper here. Medication with pharma IP in it is mostly cheaper here, I have my library card for like 30€ a year?! And other than that we use the same Spotify and Netflix subscriptions for a similar price. There's no substantial difference with that. I don't see myself in a less favourable position than a US citizen. We also have access to information here, good books, podcasts, journalism, we have culture, concerts... And I don't think any of that is better or cheaper or more accessible in the US. Correct me if I'm wrong...
Yeah, some photography rules are absurd. I think it's completely mental that people commit copyright infringement when they take a picture of a sculpture. Seems US Fair Use sometimes has weird quirks. We also have stupid rules for pictures in Germany.
Considering feudalism... I'd like to re-define that since we haven't had lords and a king for quite some time now. Today's land holders on the internet are companies like Meta, Google etc. They own the platforms we use on a daily basis. They make the rules, shape the place and lease chunks to us peasants as a service. We even let them shape society. For all intents and purposes, they're the feudal lords of today. And that's kind of the reason for my rejection here and why I said early on, all these AI companies are big multi-billion dollar corporations with motivations far from benefit to society. I believe concepts like Fair Use might have been invented as a means to combat feudalism. But it looks to me like the situation is now changing and it's more and more used to the opposite effect by the feudal lords themselves to add to their possessions, wealth and dominance.
I'll grant you the copyright industry is a worthy enemy, since they're villains, too. The copyright business model isn't healthy or beneficial to society overall. We've established that. But I really think of feudalism and a de facto monopoly when I think of Google and Meta and OpenAI/Microsoft. And I'd really like to avoid making more concessions to my feudal lords.
Hmm. It looks like we are back to narratives again. Systematic analysis does not seem to come easy to you.
"Investment" and "rent-seeking" are concepts in economics. Like, say, "function" or "variable" are concepts in programming.
"Property" is a legal institution. It relates to "investment" a bit like a machine code instruction relates to programming. They are, sort of, the underlying facts on which higher concepts rest.
I guess you didn't get what I was trying to say. Let me put it like this:
If they wrote a story that takes place in the universe of a video game, then they need to get permission first. They need to ask whoever owns the rights to the video game, or else it is "theft".
Conversely, if the story is original, and anyone wants to make a video game in that universe, then they need the author's permission.
This remains so until 70 years after the death of the creator of the video game/story. At least, it is 70 years now. It may be made longer again at any time.
That is arbitrary, no?
Not just them, but yes. How do you think they manage that?
That seems pretty vibes-based. What do you rationally expect the outcome of your favored policies to be?
Yes. That's arbitrary. But we're conflating several very different things here. There is investment in the form of labour. And I'm pretty sure we have to agree that in general, labour needs to be compensated in a capitalist economy. Then there is copyright. And this is intellectual property, which is yet another concept. All of this goes into a book, but they're all very different things. I think IP is the most abstract one (it protects concepts) and kind of moot. I'd be more lax with IP and try to allow everyone to draw a Mickey Mouse, program a Final Fantasy game or write a new Harry Potter book. Patents are a similar thing. Though we have them for a reason.
That's why I say I'm with you with the copyright and the intellectual property. But there's also work going into a book and we're always brushing over that as if it weren't a thing.
It's many factors. Timing, aggressive acquisition strategies, ecosystem building, network effects, then ecosystem lock-in, data harvesting, dominating standards, but also providing genuinely useful services. Economy of scale, massive capital... And I probably forgot dozens of factors, some legitimate, some exploitative.
Sorry, misunderstanding. I wasn't asking what you hope to happen.
You have ideas on how copyright should work wrt AI training. Make these ideas explicit, and then try to systematically analyze what the economic effects are.
Law can be a little bit like programming. A law has certain conditions. If these conditions are met, then certain legal effects follow.
If certain conditions are met, then someone has the exclusive copyright. If this copyright is violated, then damages must be paid. And of course, there are more rules to determine if copyright was violated or how those damages should be determined.
So under what conditions does AI training violate copyright? What would the legal consequence be? Then, what would that mean for the economic system on the whole?
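To make the "conditions in, legal effects out" framing concrete, here is a toy sketch. It is not a statement of any real statute: the `Use` fields, the `infringes` rule, and the effect strings are all made up for illustration, and real copyright law has far more conditions than three booleans.

```python
from dataclasses import dataclass


@dataclass
class Use:
    """A hypothetical use of a work, reduced to three yes/no conditions."""
    work_is_protected: bool      # copyright exists and has not expired
    has_permission: bool         # the rights holder licensed this use
    covered_by_exception: bool   # e.g. quotation, parody, fair use


def infringes(use: Use) -> bool:
    # Toy rule: infringement needs a protected work, no permission,
    # and no applicable exception.
    return (
        use.work_is_protected
        and not use.has_permission
        and not use.covered_by_exception
    )


def legal_effect(use: Use) -> str:
    # If the conditions are met, the legal effect follows.
    return "damages owed" if infringes(use) else "no liability"


if __name__ == "__main__":
    print(legal_effect(Use(True, False, False)))
    print(legal_effect(Use(True, False, True)))
```

Any position on AI training then comes down to which condition you flip: either training falls under an exception, or it needs permission.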
That's a tough question. Copyright is showing its age and barely applies in the digital world. Even before AI we had a lot of edge cases and court cases over like a decade to find out how copyright applies to a digital concept. I don't think there is an easy way to retrofit something. At least I can't come up with a good idea. And the general proposal seems to be all or nothing.
What I think doesn't work is saying every normal citizen needs to buy books and Zuckerberg gets to pirate books. In a democracy law has to apply to everyone. And his use-case doesn't matter here. I can also claim I pirated the 10TB of TV shows and movies for transformative or legitimate use. It's still piracy. And other law works the same way. If I steal chocolate in the supermarket, that's also theft no matter what I was planning to do with it. So that's out.
And then we're left with how the economy is supposed to work as of today. An AI company needs supplies to manufacture their product, they buy those supplies on the market... In this case that's going to be licensing content. Though, that's going to be hard. A billion dollar company with a service used by millions of people should pay more than a single researcher doing it for 5 people. And implementing that would be impossibly complex. One possible way would be to introduce a collecting society to handle the money and maths. But they're not ideal either.
So it's more or less down to allowing AI companies to use content with some kind of default license. They can take all the public information as they wish. Again, they cannot steal in the process. They'll buy one copy of a Terry Pratchett novel at the same price everyone needs to pay.
And to compensate for them not having to contract with the authors and buy special licenses, they need to offer transparency. Tell the authors and everyone what went into the models and if their content is amongst that. And if they scraped my personal data, I need a way to get that deleted from the dataset.
I'd also add an optional opt-out mechanism to appease the people who hate AI. They can add some machine-readable notice, or file a complaint and their content will be discarded.
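For what it's worth, a machine-readable notice like that could piggyback on robots.txt, which crawlers already read. A minimal sketch using Python's standard `urllib.robotparser`; the crawler name `ExampleAIBot`, the helper `may_crawl`, and the example.com URL are made up for illustration, and nothing in the file itself forces a crawler to honor it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that opts out of one AI crawler only.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post/1"
    print(may_crawl("ExampleAIBot", url))   # the opted-out crawler
    print(may_crawl("SomeSearchBot", url))  # any crawler not named in the file
```

The complaint route would need some registry on top of this, since a notice in a file only covers content you host yourself.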
And since just taking and not contributing back isn't healthy to society, I'd add something about "composite" works. If something like an AI model is just pieced together by other people's content, that doesn't deserve copyright in my opinion. So all generations are automatically public domain and maybe the models as well.
And we need a definition of AI and transformative. Once we get capable models with an ability to recite an entire novel word by word, that's going to run into copyright again. So yeah.
And intellectual property has to be softened. A generative AI model necessarily "contains" a lot of IP, has knowledge about it and can reproduce it. And we need to be alright with that. And in case someone wants to outlaw impersonation and celebrity deepfakes, there needs to be more than a blurry line.
But all of this is more patching copyright and we're going to run into all kinds of issues with that. I think ideally we come up with a grand idea and overhaul the entire thing so it applies to the 21st century.
That's a good start.
The laws do apply to everyone equally, though few people are able to litigate for years against the copyright industry.
Your concern is obviously the use case. If the use case doesn't matter, then quotes and parody are illegal, as well as historical archiving and scientific analysis.
I guess you just want AI training to not be fair use. That raises the question of how this should work.
Maybe you think that different standards should be applied to Zuckerberg, after all. Your focus on him makes it seem a little like that.
Perhaps you simply have something more European in mind. Europe and in particular Germany do not have fair use. There is a short list of uses that do not require permission. That means that every time some new use becomes desirable, the law must be changed. This is obviously stifling for progress in science and culture. Think of hip-hop with its use of samples. It's hard to imagine some artists successfully petitioning the government to legalize the practice before experimenting with it. You couldn't have developed a search engine that simply copies all web pages for indexing. Something like the Internet Archive, or the Wayback Machine, would be impossible. It would just be a few tech geeks against the copyright industry, including the media.
So, how should this be done?
Actually, no. Theft is prosecuted by the government; police and courts. Copyright infringement is generally a civil matter. Damages are paid but there is no criminal prosecution.
The government only cares about large-scale, industrial infringement, e.g. operating a Netflix-like streaming service. Small-scale infringement is not even criminal in the US. I believe, even in Europe, people who torrent movies or such are rarely criminally prosecuted.
Maybe you would like to see copyright infringement punished more harshly and enforced more strictly?
That's an interesting idea. It's not how we do anything else. You don't usually have to pay more for the same thing, depending on who you are or how much you use it. I expect it would be quite devastating if that were the rule.
Should this policy idea apply only to copyright or generally? If only copyright, why?
Should there be exceptions for celebrities and such, or will they be able to demand licensing fees?
Then much public content can't be used, after all. The likes of Reddit, Facebook, or Discord will be able to charge licensing fees for their content. It's very typically European. You rage against Meta's monopoly but you also call for laws to enforce and strengthen it. I think it's the echo of feudalism in the culture.