> fully open
> looks inside
> not open source
Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.
Get support from the community! Ask questions, share prompts, discuss benchmarks, and get hyped about the latest and greatest model releases! Enjoy talking about our awesome hobby.
As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.
Rules:
Rule 1 - No harassment or personal character attacks on community members, i.e. no name-calling, no generalizing about entire groups of people that make up our community, no baseless personal insults.
Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to that of maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.
Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms as <over 10 years ago>".
Rule 4 - No implying that models are devoid of purpose or potential for enriching people's lives.
> fully open
> looks inside
> not open source
To save everyone a click: It's a non-commercial license (with a very rude yoink clause, if anyone is foolish enough to build something on it.)
By the by, there's a good chance that AI models are not copyrightable under US law, making the license moot in the US. In other regions, such as the EU, it likely holds.
3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use non-commercially. Notwithstanding the foregoing, NVIDIA Corporation and its affiliates may use the Work and any derivative works commercially. As used herein, “non-commercially” means for non-commercial research and educational purposes only.
I know that AI output is not copyrightable, because it wasn't made by a human.
However, the model itself is the product of a shit ton of work, and I doubt any court will rule models non-copyrightable.
That's not how US copyright works.
My idea would be to slightly modify / fine-tune a model and then redistribute that modified version, and claim the same Fair Use that AI companies use to take people's copyrighted work. Either that makes it Fair Use as well, or the "no originality required" argument collapses, and with it the entire business model.
People do that all the time.
I don't see how that would be fair use or what the argument is supposed to be.
Let me warn you that Lemmy is full of disinformation on copyright. If you picked the idea up here, then it probably is absolutely bonkers.
In any case, fair use is a US thing. In the EU, it would still be yoink.
I think I used a bit too much sarcasm. I wanted to take a spin on how the AI industry simultaneously uses copyright and finds ways to "circumvent" the traditional copyright that was written before we had large language models. An AI is neither a telephone book, nor should every transformative work be Fair Use, no questions asked. And this isn't really settled as of now. We likely need some more court cases and maybe a few new laws. But you're right, law is complicated, there is a lot of nuance to it, and it depends on jurisdiction.
Alas, we have reached the max comment depth. I cannot reply to your latest comment.
Well, there is a distinction between use and obtaining it. For stealing, the use doesn’t matter. For later use, it does. That’s also what licenses are concerned with.
I see what you mean now. It's tricky. It's just another way in which copyright talking points cause problems.
You're saying that using/copying something you have in a database for AI training should always be legal. However, copying something to add it to the database should be judged as if it were done for enjoyment. E.g., everyone who torrents a movie should be treated the same, regardless of purpose. This will certainly cause problems for some scientific datasets.
Whether you downloaded a legal copy depends on whether the party offering the download had the right to do so. Whether that is the case may not be apparent. The first question is: What duty does someone have to check the provenance of content or data?
Torrents of current movies and the like are very obviously not authorized. For older movies, that becomes less clear. The web contains much unauthorized content, for example the news stories that people copy/paste on Lemmy. What duty is there to determine the copyright status of the content before using such data?
When researchers and developers share datasets, what duty do they have to check how the contents were obtained by whoever assembled it?
What happens when something was wrongly included in a dataset? Is that a problem only for the original curator, or also for everyone who got a copy?
What about streams, live TV, radio, and such things? Are you allowed to record those for training or not?
While Fair Use is a broad limitation/exemption, it’s still concerned with specific exemptions.
That's not quite right. Ultimately, Fair Use derives from the US Constitution; from the copyright clause but also freedom of speech. Copyright law spells out 4 factors that must be taken into account. But courts may also consider other factors. There is also no set way in which these factors have to be weighed. It's very open.
Well, it is. In the United States, willful copyright infringement is a crime.
There are minimum conditions before prosecution is possible. I think uploading can always be prosecuted.
No, copyright should be toned down. Preferably for regular citizens as well and not just the industry.
Well, over the last few decades it has only been going in the other direction.
How does this fit together with calling copyright infringement theft?
Let me make a suggestion. This is your real opinion. This is what you believe based on what you see. The rest is just slogans by the copyright industry, which you repeat without thinking. The problem is that you are basically shouting yourself down; your own opinion. The media, a big part of the copyright industry, puts these slogans out. Their lobbyists demand favors and harsher laws from politicians. And when the politicians look at what voters think, they hear these slogans. That's one thing I mean when I say the copyright industry defrauds us.
Airbus pays like 100x the price for the same set of nuts and bolts as someone else. A kitchen appliance for industrial use costs like 3x the price of an end-user kitchen appliance, because it's more sturdy and made for 24/7 use.
Exactly, they don't pay more for the same thing. It's almost exclusive to the copyright industry.
People do have to pay more if they license a picture to show to their 20 million customers or use it in an advertising campaign, than I do for putting it up in the hallway.
Actually, even in the copyright industry, such terms are far from universal. Of course, you will have to pay more for the right to make copies than for a single copy, and even more for the exclusive copyright. Those things are different. However, it's usually a flat fee. Can you figure out what economic reasons might exist for a creator being paid per copy or per viewer?
No exceptions, no licensing, no fees. This is strictly to avoid bad things like doxxing, ruining people’s lives…
"No exceptions" means, for example, that a LLM would not be able to answer questions about politicians, actors, musicians, maybe not even about historical figures.
You said that there should be a way that you can remove your personal data from the training set. That implies that an AI company can offer money in exchange for people not removing their data. That's basically a licensing fee, however it is framed.
On second thought, I believe many celebrities, business people, politicians, ... will gladly offer more training data that makes them look good. They'd only remove data that makes them look bad. Sort of like how the GDPR works. Far from demanding a licensing fee, they'd pay money to be known by the AI.
I’ve told you how my server was targeted by Alibaba and it nearly took down the database. [...] But I’m prevented from exercising my rights.
I agree that the situation is far from ideal. But let me point out that you do not have a right to other people's computer services. That's the issue with Alibaba hitting your server, right? It's a difficult issue. Mind that an opt-out from AI training does not actually address this.
This application of Fair Use is in favour of the feudal lord companies and to the detriment of the average person.
How so?
Alas, we have reached the max comment depth
Oh, wow.
I mean for some questions, we already have an old way of doing it and it's relatively straightforward to apply it:
Can you figure out what economic reasons might exist for a creator being paid per copy or per viewer?
Selling/Buying something is a very common form of contract. In our economy, the parties themselves decide what's in the contract. I can buy apples, cauliflower or wood screws per piece or per kilogram. That's down to my individual contract between me and the supermarket (or hardware store) and nothing the government is involved in. It's similar with licensing, that's always arbitrary and a matter of negotiation.
What happens when something was wrongly included in a dataset? Is that a problem only for the original curator, or also for everyone who got a copy?
Of course for everyone. If I download a torrented copy of a Hollywood film, that's not "healed" by it being a copy of a copy. It's still the same thing.
It's due diligence. Especially once someone uses (or publishes) something. And it very much depends on circumstances. Did they do it deliberately, specifically ignoring that they were in violation of something? If they were wrongfully under the assumption it was a legal copy, then it's more analogous to fencing: they're not in trouble for stealing anymore, but can be ordered to let go of the stolen goods. I'd say that's pretty much the same liability as with other things. Say I kill someone with my car. Now the question is: have I been negligent? Did I know the brakes were faulty but didn't repair them and used the car nonetheless? Or did the car manufacturer mess up? There might be a case against me, or the manufacturer, or both. And both civil and criminal law can be involved in different ways.
When researchers and developers share datasets, what duty do they have to check how the contents were obtained by whoever assembled it?
I'd do it like with shipments in the industry. If you receive a truck load of nuts and bolts, you take 50 of them out and check them before accepting the shipment and integrating the lot into your products.
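That spot-check idea can be sketched in code. This is a hypothetical illustration only: the `spot_check` function and the per-item `source` field are made up for the example, and the caller would supply whatever provenance test actually applies.

```python
import random

def spot_check(dataset, sample_size=50, check=None):
    """Accept a dataset only if a random sample of items passes a check.

    `check` is a caller-supplied function; here it stands in for whatever
    due-diligence test one would actually run on each item.
    """
    sample = random.sample(dataset, min(sample_size, len(dataset)))
    return all(check(item) for item in sample)

# Toy usage: every item carries a hypothetical 'source' field we inspect.
dataset = [{"text": "...", "source": "public-domain"} for _ in range(1000)]
ok = spot_check(dataset, check=lambda item: item["source"] == "public-domain")
print(ok)  # True: every sampled item passes the provenance check
```

Like the shipment analogy, this only catches widespread problems; a handful of bad items in a large lot can easily slip past a 50-item sample.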
Whether you downloaded a legal copy depends on whether the party offering the download had the right to do so. Whether that is the case may not be apparent.
Though that is very hypothetical. If the torrent has annas-archive or libgen.is in the title... It's pretty obvious. And that was what happened here. They did it deliberately and we know they knew.
And this week the next lawsuit started, alleging that they (Meta) uploaded tons of porn videos (illegally) to be able to download what they were interested in, since BitTorrent has a tit-for-tat mechanism.
So I believe we first need to address the blatant piracy before talking about hypothetical scenarios. I believe that's going to be easier, though. I proposed to mandate transparency with what a company piled up in a dataset. One of the reasons was to address this. Like with the DMCA and GDPR, this could be a relatively simple mechanism where the provider (or company) gets some leeway, since it indeed is complicated. People will get a procedure to file a complaint and then someone can have a look whether it was wrongly included.
You said that there should be a way that you can remove your personal data from the training set. That implies that an AI company can offer money in exchange for people not removing their data. That's basically a licensing fee, however it is framed.
I wasn't concerned with copyright here. Let's say I'm politically active and someone leaks my address, and now people start showing up, throwing eggs at my front door and threatening to kill me. Or someone spreads lies about me and that gets ingested. Or I'm a regular person and someone posted revenge porn of me. Or I'm a victim of a crime and that's always the first thing that shows up when someone puts in my name, and it's ruining my life. That needs to be addressed/removed. Free of charge. And that has nothing to do with licensing fees for content or celebrities. When companies use data, they need to have a complaints department that will immediately check whether the complaint is valid and then act accordingly. There needs to be a distinction between harmful content and copyright violations.
Ultimately, Fair Use derives from the US Constitution; from the copyright clause but also freedom of speech. Copyright law spells out 4 factors that must be taken into account. But courts may also consider other factors.
Thanks for explaining. I didn't know those were only guidelines. But it makes sense and that's generally different between common law countries and whatever we are called, civil law countries?
Exactly, they don't pay more for the same thing. It's almost exclusive to the copyright industry.
And that is for a good reason. Generally physical things can't be copied easily. So handling copying isn't really necessary with physical goods. That's kind of in the word "copyright". Though when licensing for example immaterial goods, you're also buying a different license and different rights, and not the same thing.
Maybe think more in terms of services and licensing, since that's the main point here. In the material world that'd be something like the difference between renting an excavator for 2 weeks or buying the same one. It'll be exactly the same excavator I get. It's going to be a very different number on the bill, and I get different rights and obligations.
Of course, you will have to pay more for the right to make copies than for a single copy. And even more for the exclusive copyright. Those things are different.
Sure. Since I grew up with the German model, I'd open yet another category for AI training so it can be handled specifically. I mean, it doesn't really fit into anything existing: AI is neither making copies, nor a copy, but it still uses the work. And it's also not an art form or a citation. So I need a good argument for why it should be mushed together with something else.
And datasets and model weights are yet different things. Since we agree that AI training is transformative, we can confine copyright to the datasets, and it's not much of an issue with the learned model weights. Or at least it shouldn't be. And we have enough other issues to deal with that arise from the models themselves.
"how my server was targeted by Alibaba" I agree that the situation is far from ideal. [...] It's a difficult issue. Mind that an opt-out from AI training does not actually address this.
I think you underestimate the consequences. AI Fair Use plus the illegitimate scraping have led to quite a substantial war on the internet. Now every entity is fighting for its own. People like me are at the bottom of the chain, and we have to protect our servers simply so they don't burn down. Big content platforms wage war as well. They don't want "their" content to be scraped. Leaving it open like before only cuts into their business; they'd rather sell it themselves. So they started making lots of things inaccessible by technical means, and combat the freedom we had before.
And that's the conundrum. In practice, this leads to the opposite. My own Fair Use of content (and that of other normal people and smaller businesses) is collateral damage. I used to archive some videos and I run a PeerTube instance. And now Google blocks all datacenter IPs, so you can only watch YouTube from a residential internet connection. They introduced rate-limiting. Reddit's API debacle in 2023 was largely about this. Countless other services and platforms have become enshittified due to this. And many more will.
Idk if the average consumer notices yet. But it's really bad once you look under the hood. And this is not sustainable. And the beneficiaries of this war are mostly big companies. Like Reddit, who found a way to profit off it. And Cloudflare, who were already way too dominant a central internet instance before; now they're the arms dealer in the war against scraping, and that makes them even bigger.
All the while the internet gets more locked down, enshittified... And everyone who isn't the big content industry or already a monopolist, loses.
"favour of the feudal lord companies and to the detriment of the average person" [...] How so?
See my text above. Even if it was a nice idea, it leads to the opposite in the real world. A few big internet companies "win" this war with technology, disregarding the idea behind the law, and everyone except them loses. Cementing monopolies, not helping against them.
And more generally: most AI companies are billion-dollar companies that own half the internet ("land"), while a random nonfiction book author is a random individual with a very moderate income. And Fair Use now says the labour of the small guy is free of charge for the big company.
This is what you believe based on what you see. The rest is just slogans by the copyright industry, which you repeat without thinking. The problem is that you are basically shouting yourself down; your own opinion.
Ultimately, I'm not set on any ideology here. I'm generally more concerned with making things work, and that's my goal here too. I want a world that includes the existence of books and TV shows. So I need people to do that job. And jobs can't be done if the people doing them starve.
Copyright is just a tool trying to achieve that, and kind of a half-way obsolete one with a lot of negative side-effects. I'm not set on it. We just need a way for books and TV shows to still be a thing in 20 years. And that's my concern here, and why I talk a lot about the labour involved, and never about how they deserve to get rich if they're popular or if they manage the stuff.
And I see roughly 3 options for the future: a) Nobody pays them, or b) people who make use of their labour pay them, or c) some people pay, some get a free pass.
And the way I see it, a) is a future where quality and professional content is likely going to vanish at scale. And I'm not sure the exact pre-copyright model applies to our modern world. Things have changed. For example, copying things was an expensive process back then and required very expensive machinery, whereas in the digital age it's done at no cost and by everyone. b) is what I'm advocating for. Everyone needs to pay. Preferably not every taxpayer, but the people who actually use the stuff. And c) is what I called a "subsidy", where everyone gets to use it but only a group of people pays for everyone.
I mean, what's your idea here? I can't really tell. Let's say we're not set on copyright. How does $90,000 arrive at a book author each year so it's a viable job and they can create something full time? And I'd like a fair solution for society.
I'm changing the order some, because I want to get this off my chest first of all.
Ultimately, I’m not set on any ideology here. I’m regularly more concerned with making things work. And that’s my goal here, too.
That's not what I'm seeing. Here's what I'm seeing:
I wasn’t concerned with copyright here. Let’s say I’m politically active and someone leaks my address and now people start showing up, throwing eggs at my front door and threatening to kill me. Or someone spreads lies about me and that gets ingested. Or I’m a regular person and someone posted revenge porn of me. Or I’m a victim of a crime and that’s always the first thing that shows up when someone puts in my name and it’s ruining my life. That needs to be addressed/removed. Free of charge. And that has nothing to do with licensing fees for content or celebrities. When companies use data, they need to have a complaints department and that will immediately check whether the complaint is valid and then act accordingly. There needs to be a distinction between harmful content and copyright violations.
First, you start out with a little story. Remember my post about narratives?
You emphasize what "needs" to be achieved. You try to engage the reader's emotions. What's completely missing is any concern with how or if your proposed solution works.
There are reputation management companies that will scrub or suppress information for a fee. People who are professionally famous may also spend much time and effort to manipulate the available information about them. Ordinary people usually do not have the necessary legal or technical knowledge to do this. They may be unwilling to spend the time or money. Well, one could say that this is alright. Ordinary people do not rely on their reputation in the same way as celebrities, business people, and so on.
The fact is that your proposal gives famous and wealthy elites the power to suppress information they do not like. Ordinary people are on their own, limited by their capabilities (think about the illiterate, the elderly, and so on).
AIs generally do not leak their training data. Only fairly well-known people feature enough in the training data that an LLM will be able to answer questions about them. Having to make the data searchable on the net makes it much more likely that it is leaked, with harmful consequences. On balance, I believe your proposal makes things worse for the average person while benefiting only certain elites.
It would have been straightforward to say that you wish to hold AI companies accountable for damage caused by their service. That's the case anyway; no additional laws needed. Yet you make the deliberate choice to put the responsibility on individuals. Why is your first instinct to go this roundabout route?
Selling/Buying something is a very common form of contract. In our economy, the parties themselves decide what’s in the contract. I can buy apples, cauliflower or wood screws per piece or per kilogram. That’s down to my individual contract between me and the supermarket (or hardware store) and nothing the government is involved in. It’s similar with licensing, that’s always arbitrary and a matter of negotiation.
But market prices aren't usually arbitrary. People negotiate but they usually come to predictable agreements. Whatever our ultimate goals are, we have rather similar ideas about "a good deal".
I’d do it like with shipments in the industry. If you receive a truck load of nuts and bolts, you take 50 of them out and check them before accepting the shipment and integrating the lot into your products.
All very reasonable ideas. Eventually, the question is what the effect on the economy is, at least as far as I'm concerned.
These tests mean that more labor and effort is necessary. Mistakes are costly. These costs fall on the consumer. The big picture view is that, on average, either people have less free time because more work is demanded, or they make do with less because the work does not produce anything immediately beneficial. So the question is if this work does lead to something beneficial after all, in some indirect way. What do you think?
So I believe we first need to address the blatant piracy before talking about hypothetical scenarios.
No. That is the immediate hands-on issue. As you know, the web is full of unauthorized content.
All the while the internet gets more locked down, enshittified… And everyone who isn’t the big content industry or already a monopolist, loses.
Well? What's your pitch?
See my text above. Even if it was a nice idea, it leads to the opposite in the real world. A few big internet companies “win” in this war with technology, disregarding the idea behind the law, and everyone except them loses. Cementing monopolies, not helping with them.
That is not happening, though?
And Fair Use now says the labour of the small guy is free of charge for the big company.
You compare intellectual property to physical property. Except here, where it becomes "labor". I don't think you would point at a factory and say that it is the owner's labor. If some worker took some screws home for a hobby project, I don't think you would accuse him of stealing labor. Does it bother you how easily you regurgitate these slogans?
I mean what’s your idea here? I can’t really tell. Let’s say we’re not set on copyright. How do $90,000 arrive at a book author each year so it’s a viable job and they can create something full time? And I’d like a fair solution for society.
Good question. That's an economics question. It requires a bit of an analytical approach. Perhaps we should start by considering if your idea works. You are saying that AI companies should have to buy a copy before being allowed to train on the content. So: How many extra copies will an author sell? What would that mean for their income?
We should probably also extend the question beyond just authors. Publishers get a cut for each copy sold. How many extra copies will a publisher sell and what does that mean for their income?
Actually, the money will go to the copyright owner; often not the same person as the creator. In that way, it is like physical property. Ordinary workers don't own what they produce. A single daily newspaper contains more words than many books. The rights are typically owned by the newspaper corporation and not the author. What does that mean for their income?
First, you start out with a little story. Remember my post about narratives?
[...]
You emphasize what "needs" to be achieved. You try to engage the reader's emotions. What's completely missing is any concern with how or if your proposed solution works.
I think you're a bit too focused on narratives. I mean, how am I supposed to share my perspective without sharing my perspective? Of course that's going to include stories about bad things that happened to me. I've handled some privacy and personal-information-related issues for not-so-tech-savvy people. You should feel privileged if you didn't have a lot of bad or complicated things happen to you, but I can assure you there are ordinary people with different stories. I didn't handle death threats, but there were some other legitimate reasons, from simple job-related ones to bad and disgusting ones. And we can't just throw those people under the bus and say »yeah, your well-being just cuts into profit«...
This isn't copyright, so I'm going to move on. But this goes hand in hand with other regulations for datasets and online services.
AIs generally do not leak their training data.
Well, if I ask them about events and organizations I was part of, AI does seem to know details. And those were small and local things. No celebrities involved. AI however hallucinates a lot and >80% of names or details are currently made up. I bet AI is going to become better, though. It's definitely already able to connect some lesser-known names.
(There is some science here: https://arxiv.org/pdf/2506.17185 )
People negotiate but they usually come to predictable agreements. Whatever our ultimate goals are, we have rather similar ideas about "a good deal".
Might be the same here. Maybe the free market will arrive there after things settle down. You're right, the content industry is a shitty corner of the market. I'd like to mention Spotify as precedent: they're able to license pretty much all important music despite paying next to nothing to artists. Or my university library, which was able to stock pretty much all important books for its students. This might be achievable in some way for AI, too. Other businesses seem to be able to obtain special licenses for use-cases other than being a regular customer.
These tests mean that more labor and effort is necessary. Mistakes are costly. These costs fall on the consumer. [...] So the question is if this work does lead to something beneficial after all, in some indirect way. What do you think?
No one promised it has to be easy. Other products also cost some extra because we have some minimum requirements. For food safety, cars, fair rides... I wouldn't want to do away with that, so I think this always leads to something beneficial. We just need to strike a balance. Every now and then a rollercoaster crashes and people die. Nothing is perfect. We collectively decide what rate of rollercoaster crashes we deem acceptable. And then the experts write some regulations to achieve that.
"All the while the internet gets more locked down, enshittified… And everyone who isn’t the big content industry or already a monopolist, loses." Well? What's your pitch?
Pretty much what I'm arguing for, here. Discard the idea which causes it. It's obviously not working. Likely because it's too simplistic.
That is not happening, though?
As I said before, it already happened. Three years ago, I and any independent researcher were able to use the Reddit API and YouTube. Now we're not. And the monopolists struck deals amongst themselves. Ever wondered why many more paywalls have popped up at news outlets lately? Cloudflare and Anubis checks before a page loads? You get locked out of Codeberg for 24h+ and can't update your server? Your alt account gets deactivated for "suspicious activity"? Those are all indications that something has happened behind the scenes. And it achieves the desired effect: more and more information is now under tighter control. For the AI companies, and for everyone.
And all of this happened to me, along with needing to take similar measures myself, since they also showed up at my front door. The rate of this happening correlates perfectly. And from personal experience and from talking to other admins, I know bots and scraping are the cause.
You compare intellectual property to physical property. Except here, where it becomes "labor". I don't think you would point at a factory and say that it is the owner's labor. If some worker took some screws home for a hobby project, I don't think you would accuse him of stealing labor. Does it bother you how easily you regurgitate these slogans?
What slogan? And what hobby project? ChatGPT certainly isn't a hobby project. That thing costs some three-digit millions of dollars per iteration. And they're also not taking a few screws. They're the employee who takes one screw out of every packet, and with that throughput, they have a nice side business in screws.
I was trying to make a point here: take away copyright, since we both don't like it... Now what remains? I think the labour of the author.
And since we're always discussing feudalism and monopoly... Am I right that that's the AI industry, or did I miss something? In my eyes, we currently have Google (which is a monopolist), Microsoft (another monopolist) and the other 51% of OpenAI, which seem very well off; we have Apple (I think also a monopolist, and they're also in the top 10 richest companies). Nvidia does AI, and they've been propelled to top market cap by AI and have monopoly-like margins. Then we have Meta and Elon Musk's companies in the business, also valued at a trillion dollars. Then we have "startups" funded by public money from the Chinese government, Anthropic (interestingly enough now sued by Reddit for scraping their data), Elevenlabs, and in Europe: Mistral, Stability.ai and Black Forest Labs. (And a few other players like Stanford and other universities, smaller companies/startups and quite an active fine-tuning community.)
That's pretty much what I read about. Many of them are just the richest companies on planet earth. Several of them are monopolists. Some happen to be the ones who own the big platforms that make up the internet. So if we now say AI training supplies need to be cheaper, whether that's right or wrong... You know who 90% of that benefit goes to? ...Them.
And that's not wrong. They have a legitimate business and it's not wrong to make money selling GPUs or AI. It's just that you can't say you're against feudalism and monopolies, and then devise a rule where the list of the main beneficiaries is just a list dominated by the monopolies and feudal lords from before. There is some desired outcome, but that's just among the also-rans.
That's just you being against monopolies where it suits you while being completely oblivious to them in other areas. By and large, probably enabling them.
Now the content industry is bad as well. And we find Disney, Warner Bros, Netflix in the list of Fortune 500 companies. Seems the publishing houses aren't even amongst them. And now you want to redistribute resources and the main chunk moves up the chain to the select top. Most of them have several rulings against them for having (for example) devised ecosystems to arrive at a monopoly and then subsequently abuse the powers that come with it. You didn't level the playing field, but we can tell from the last few years and how US AI law turned out that you mainly helped the big companies and monopolists. And we can look at the financial figures: they're mostly posting record profits since Covid, while that's not the case for the economy at large. Now who do we seem to funnel value towards in practice? And why do these companies by and large happen to be identical to the internet feudalism from before gen-AI?
Good question. That's an economics question. It requires a bit of an analytical approach. Perhaps we should start by considering if your idea works.
Well, I'm open to other ideas than mine. I mean you propose a clear solution here: Fair Use. Now I would have expected you to have analysed the situation and have some solution on how that content is supposed to get there. I mean it's not created out of thin air. And the other side of the coin has to be factored in as well once we're talking about introducing laws.
I think the entire content industry isn't a healthy model. (Edit: And I'm not so focused on the middle-men and the resulting content owners, that business model is indeed shady. But content still has to be created.) And the average individuals working there aren't well off. And it doesn't seem like we're on a path where this is going to improve in the future. So there aren't any "extra copies" when it gets to these people. That's mainly a thing for copyright owners. The creators don't necessarily have gifts to hand out.
In some cases we already know AI directly takes away. Freelancers, like illustrators, maybe musicians... Without an industry and other entities in between, they're the first whose work gets fed upon, and the same technology directly takes away their business opportunities. And it's the combination of the two which makes it bad.
So what's with content in the early 21st century and in the upcoming age of AI? Is it as easy as leave everything as is and slap Fair Use on top? Does that solve a single issue with anything? Or is that just supposed to make business cheaper for some AI companies with a random effect on everyone else? Do they contribute something of value and how does that compare to the negative side-effects and the main thing they do and that is accumulate wealth for themselves? Does Fair Use even work or have companies kind of already turned it into the opposite in practice by (ab)using their power?
There is a lot of disinformation being spread on copyright because major rights-holders hope to gain a lot of money for nothing.
US fair use has always worked like this. Other countries without fair use had to make laws to enable AI training. I know about Japan and the EU.
It is precisely because of these new laws that AI training in the EU is possible at all (e.g. by Mistral AI or by various universities/research institutions). But because of lobbying by rights-holders, this is quite limited. It's not possible to train AIs in the EU that are as capable as those from the US, where Fair Use comes directly from the constitution and can't be easily lobbied aside by monied interests.
Hmmh. It's a bit complicated. "Fair Use" is a concept in Common law countries, but lots of European countries do it a bit differently. We here in Germany need specific limitations and exceptions from copyright. And we have some for art and science, education, personal use and citations and so on. But things like electronic data transfer, internet culture and more recently text- and datamining needed to be added on top. And even datamining was very specific and didn't fit AI in its current form. And we don't have something like Fair Use to base it upon.
From my perspective, I'm still not entirely convinced Fair Use is a good fit, though. For one, it doesn't properly deal with the difference between doing something commercially and doing it for research or personal use, and I believe some nuance would help here; big rich companies could afford to pay something. And as AI is disruptive, it has some effect on the original work, and balancing that is somehow part of Fair Use. And then the same copyright concept has higher standards, for example in music production, for sampling things from other songs that are recognizable in the resulting work. And I don't think we have a clear way how something like that translates to text and AI. And it can reproduce paragraphs, or paint a recognizable Mickey Mouse, and in some way it's in there in the model and leads to other issues. And then all the lines are blurry and it still needs a massive amount of lawsuits to settle how much sounding like Scarlett Johansson is too much sounding like her... I'd say even the US might need more clarity on a lot of legal questions and it's not just handled by Fair Use as is... But yeah, "transformative" is somewhat at the core of it. I can also read books, learn something and apply the knowledge from it. Or combine things together and create something new/transformative.
Of course, it would be better if governments would pass sensible laws on AI training. These lawsuits are a complete waste. But you can see the problem in Europe. The copyright industry has too much power. You don't get good laws. (In fairness, Japan did pretty well.)
For one, it doesn’t properly deal with the difference between doing something commercially and doing it for research or personal use, and I believe some nuance would help here,
That needs to be considered in fair use, but I don't see what difference it would make here.
big rich companies could afford to pay something.
That's a line by the copyright lobbyists. But economics doesn't work like that.
In a competitive market, producers must pass on costs. E.g., coffee and cocoa beans have become more expensive on world markets in the last year, so now coffee and chocolate are more expensive in stores.
AI is quite competitive. If AI firms are forced to pay license fees, then AI subscriptions will become more expensive for consumers. The money goes straight from everyone to rights-holders; a few people at the top.
Sure. I mean we're a bit different on both sides of the Atlantic. Europe regulates a lot more. We're not supposed to be ripped off by big companies, they're not supposed to invade our privacy, pollute the environment unregulated... Whether we succeed at that is a different story. But I believe that's the general idea behind social democracy and the European spirit. We value our freedom from being used, and that's also why we don't have two-week notice periods and do have regulated working hours and a lot of rules and bureaucracy. The US is more about freedom to do something. Opportunity. And in my eyes that's the reason why it's the US with a lot of tech giants and AI companies. That just fosters growth. Of course it also includes negative effects on society and the people. But I don't think "right" and "wrong" are fitting categories here. It's a different approach and everything has consequences. We try to balance more, and Europe is more balanced than the US. But that comes at a cost.
That's a line by the copyright lobbyists [...]
Well, I don't think there's much good about copyright to begin with. Humanity would be better off if information were free and everyone had access to everything, could learn, remix and use and create what they like.
I think of copyright more as a necessary evil. But somehow we needed Terry Pratchett to be able to make a living by writing novels. My favorite computer magazine needs to pay their employees. A music band can focus on a new album once they get paid for that... So I don't think we need copyright specifically. But we need some way so people write books, music etc... Hollywood also did some nice movies and tv shows, and they cost a lot of money.
I don't have an issue with AI users paying more. Why should we subsidise them, and force the supply chain to do work for a set price? That's not how other businesses work. The chocolate manufacturer isn't the only one making profit, but an entire chain from farmer to the supermarket gets to take part in earning money, which culminates in one product. I don't see why it has to be handled differently for AI.
And what I like about the approach in Europe is that there is some nuance to it. I mean I don't agree 100% but at least they incentivise companies to be a bit more transparent, and they try to differentiate between research to the benefit of everyone and for-profit interest. And they try to tackle bad use-cases and I think that's something society will appreciate once the entire internet is full of slop and misinformation by bad actors. Though, I don't think we have good laws for that as of now.
I know those narratives, as the humanities people call this. I don't know if you know the term. You know commercials. They rarely give you facts. They don't give you technical data about performance, durability, or such. Usually, a commercial is a little story, maybe just a few nice people having fun. When you see the product and think about buying, you can see yourself living that story. Maybe you see yourself in a new car speeding unhindered down an empty road; not stuck in traffic like those suckers you see every day in reality.
You don't convince people with facts. You use psychological manipulation. If you think about history, people mostly believed religious stories about what happened in the world. That many people in developed countries defer to scientific facts is unusual. Of course, many don't. The stories are much nicer. Let's face it: The only reason we put up with ugly, meaningless facts is because we are reliant on technology.
We want the good life. We want to be healthy and not have to worry about food or shelter. We want comforts, like flowing hot and cold water; an extravagant luxury for most humans throughout history and even today. In war, we want the best weapons, so that it is the other guys who do the dying.
So the question is: Do you prefer the feel-good-story or do you want a society that works for everyone?
You cannot have both.
I'm a bit in the science/facts bubble. I mean sure, advertisements and narratives are effective, and I'm not exempt. But I'd like to know the truth. And have politics based on scientific evidence. The goal is to thrive and have a nice life; everyone should be happy if possible. And then we use science to tell what kind of laws we need. Are all students delegating their homework to ChatGPT and they don't learn anything anymore? Find ways so school achieves its goal. Do we confuse reality and fiction? Find ways to mitigate that, e.g. watermarking. Do we lose all artists and creative people? Find ways so they can be part of society... I mean sometimes we can have our cake and eat it too, especially with technology. But we need to be clever.
I mean in the past we've adapted to new technology. One example often cited in the context of AI is the change from horses to cars. That was very disruptive as well. I think today's situation is a bit different. And for example copyright barely works in the digital age. But AI is likely going to have a massive impact on society. Maybe we need to re-think capitalism. That's not necessarily good or bad or a "narrative". But somehow things need to be addressed.
Europe has clearly chosen a path that will increase its technological dependency on either the US or China. It's not likely to play a large role in figuring out the future economic order. We'll see how long it can continue on this path.
Its AI policies are reminiscent of Feudalism. People create AI, but then they have to pay a levy to people who have contributed nothing, yet hold rights awarded by the government. AI is not the only area where the EU is shifting to policies that facilitate wealth extraction rather than creation. I don't think that is domestically sustainable. Sooner or later the European nations will try to extract wealth from each other and that will be the end. It doesn't have to go that far. Maybe we will just see a stagnation and decline, as in South America.
Is your stance limited to AI, or do you generally condone paying a levy? Like towards Spotify or Netflix or Hollywood, because I could just as well skip that and watch the newest movies without obeying their copyright...
I mean it's not nothing, there is some effort people put into things. Like the Wikipedia is super useful for machine learning. My computer code on Github teaches AI programming. And I can see the crawlers at my own server and today I had to update my config because it's been hammered by Alibaba. Dozens of different IP addresses, fake user agent and they completely overloaded my database with requests. It's not like I don't contribute or am part of a different world?!
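To illustrate why dozens of rotating IP addresses with faked user agents defeat the usual defenses: the standard first countermeasure an admin reaches for is per-IP rate limiting. Here's a minimal sketch (illustrative numbers and addresses, not my actual server config) of such a limiter, showing how a distributed crawler slips right through it while a single aggressive client gets throttled.

```python
import time
from collections import defaultdict

class PerIPRateLimiter:
    """Naive token bucket per client IP: `rate` requests/second, `burst` headroom."""

    def __init__(self, rate=5.0, burst=10):
        self.rate = rate
        self.burst = burst
        self.buckets = defaultdict(
            lambda: {"tokens": float(burst), "last": time.monotonic()}
        )

    def allow(self, ip):
        bucket = self.buckets[ip]
        now = time.monotonic()
        # Refill tokens for the time elapsed since this IP's last request.
        bucket["tokens"] = min(
            self.burst, bucket["tokens"] + (now - bucket["last"]) * self.rate
        )
        bucket["last"] = now
        if bucket["tokens"] >= 1:
            bucket["tokens"] -= 1
            return True
        return False

# One aggressive IP gets cut off once its burst is spent...
limiter = PerIPRateLimiter()
allowed_single = sum(limiter.allow("203.0.113.1") for _ in range(100))

# ...but the same 100 requests spread across 100 addresses all pass,
# because every fresh IP starts with a full bucket.
limiter = PerIPRateLimiter()
allowed_spread = sum(limiter.allow(f"198.51.100.{i}") for i in range(100))
```

The second run is exactly the Alibaba-style pattern: each address stays comfortably under the per-IP limit, so the aggregate load hits the database unthrottled, and the only remaining options are heavier-handed ones like Anubis-style proof-of-work checks or blocking whole network ranges.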
That's a bit of an odd question, given my praise of American Fair Use. The USA has had copyright, including Fair Use, for longer than much of Europe. The predecessor of modern copyright law was created in the 1700s in the UK. There is a German scholar, Eckhard Höffner, who argues that this caused book production to plummet in the UK. He also says that the German-speaking lands produced more books, more different books, than the UK in the century before such laws arrived.
The American founding fathers were men of the Enlightenment. They, or some of them, understood the problems with such government sponsored monopolies. Therefore, the US Constitution limits copyrights and patents. It's an interesting clause. Congress is empowered "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries". It's about progress first; very much a product of the Enlightenment.
I don't know if there was ever a discussion about whether entertainment should qualify for copyright protection at all. I'll have to look it up sometime.
In 1998, US copyright was extended by 20 years. Now it is life of the author +70 years. That has been called the Mickey-Mouse-Protection-Act, because it meant that the original Mouse enjoyed another 20 years of copyright. This was roundly criticized by economists and even led to a case before the Supreme Court. Obviously, making copyright retroactively longer does not encourage any kind of creativity. It's in the past. Well, the case was lost, nevertheless.
For many left/liberal people, this is corruption; just the Disney company getting what it wants.
The EU countries had expanded their copyright years earlier, without resistance or even comment. Smug Europeans may feel superior when Americans rage against the corporations. But the truth is often like this, where Europeans simply quietly accept such outrages.
The original copyright term in the US (and before that in the UK) was 14 years, and protection required registration. It worked like the patent system. The interesting thing is that patents still work a lot like that: one must register and publish them, and then they last for 20 years. Meanwhile, copyrights have gone from 14 years to life+70 years, no registration required.
Patents are public so that people can learn from them. That has been used as an argument for patents. The alternative would be that everyone tries to keep new inventions secret. This way, people can learn and try to circumvent patents; find other ways of achieving the same thing. That's an interesting observation in light of AI training, no?
I haven't answered your question. In my experience, pro-copyright people will always refuse to argue over what should be covered by copyright or for how long. They demand an expansion and use psychological manipulation to get it. If you do not let yourself be manipulated, they change the subject and will argue whether copyright should exist at all. I have never met a single person who was able to defend copyright as it exists. Perhaps you can answer your own question now.
Yes, I mainly wanted to rule out the opposite. Because the multi-billion-dollar companies currently do some lobbying as well. Including the same manipulation and narratives, just the other way around. They want everyone else to lose rights, while they themselves retain full rights and little to no oversight... And that's just inherently unfair.
As I said. Copyright might not be something good or defendable. It clearly comes with many obvious flaws and issues. The video you linked is nice. I'd be alright with abolishing copyright. Preferably after finding a suitable replacement/alternative. But I'm completely against subsidising big companies just so they can grow and manifest their own Black Mirror episode. Social scoring, making my insurance 3x more expensive on a whim and a total surveillance state should be prohibited. And the same rules need to apply to everyone. Once a book author doesn't get copyright any longer, neither do OpenAI and the big tech companies. They can invest some $100 million in training models, but it's then not copyrighted either. I get to access the model however I like and I can sell a competing service with their model weights. That's fair and the same rules for everyone. And Höffner talks to some degree about prior work and what things are based upon. So the big companies have to let go of their closely guarded trade secrets and give me the training datasets as well. I believe that'd be roughly in the spirit of what he said in the talk. And maybe that'd be acceptable. But it really has to be the same rules for everyone, including big corporations.
They want everyone else to lose rights, while they themselves retain full rights, little to no oversight
Can you back this up? They certainly do not have the same reach or influence as the copyright industry.
subsidising big companies
What do you mean by that?
They can invest some $100 million in training models, but it’s then not copyrighted either.
AI models may not be copyrightable under US law. I'm fairly sure that base models aren't. Whether curating training data, creating new training data, RL, and so on, ever makes a copyrighted model is something that courts will eventually have to decide.
They are probably copyrightable under EU law (maybe protected as databases). That's an EU choice.
But it really has to be same rules for everyone, including big corporations.
The rules are different in different countries. They are not different for corporations.
Can you back this up?
The current thing is Meta is very vocal about the EU AI act. Their opinion is everywhere in the tech news, this week. And they're a very influential company. Completely dominating some markets like messengers, parts of social media. Also well-known in the AI industry.
Other companies do the same. They test what they can get away with all the time. Like stealing Scarlett Johansson's voice, pirating books over BitTorrent... And they definitely have enough influence and money to pay very good lawyers. Choose what to settle out of court and what to fight. We shouldn't underestimate the copyright industry. But Meta, for example, is a very influential company with a lot of impact on society and the world.
And AI is in half the products these days. Assisting you, or harvesting your data... Whether you want it or not. That's quite some reach, pervasive, and those are the biggest companies on earth. I'd be with you if AI were some niche thing. But it's not.
And Meta are super strict with trademark law and parts of copyright when it's the other way around. I lately spent some time reading how you can and cannot use or mention their trademark, or embed it into your website. And they're very strict if it's me using their stuff. The other way around, they want free rein.
subsidising big companies [...] What do you mean by that?
I mean manufacturing a supply chain for them where they get things practically for free. Netflix has to pay for licenses to distribute Hollywood content. OpenAI's product also has other people's content going into the product, but they don't need to do the same. It's subsidised and they get the content practically for free for their business model.
And what do you think I do with my server and the incident last week? If I now pay $30 more for a VPS that's able to withstand Alibaba's crawlers... Wouldn't that be a direct subsidy from me towards them? I pay an extra $30 a month just so they can crawl my data?
AI models may not be copyrightable [...] // They are probably copyrightable [...]
We were talking about a specific lecture that questions the entire concept of copyright as we have it now. You can't argue to abolish copyright and then in the next sentence defend it for yourself or your friends. It's either copyright for book authors and machine learning models, or for neither of them. You can't say the information in other people's products isn't copyrighted, but the information in the products of AI companies is. That doesn't make any sense.
The current thing is Meta is very vocal about the EU AI act.
And they're not wrong.
That doesn't quite back up what you claimed, though. You wrote: "They want everyone else to lose rights, while they themselves retain full rights,"
Their claim of Fair Use seems straightforward. That's not everyone else losing their rights. I am not aware where they lobby for "full rights" for themselves, whatever that means.
And Meta are super strict with trademark law and parts of copyright when it’s the other way around. I lately spent some time reading how you can and cannot use or mention their trademark, embed it into your website. And they’re very strict if it’s me using their stuff. The other way around, they want free rein.
There are different kinds of intellectual property. Trademarks are different from copyright. Then there's also trade secrets, patents, publicity rights, privacy, etc.
Generally, you can use any trademark as long as you don't use it for trade or harm the business that owns it. I'm not going to look it up, but I'm guessing that the rules are about not giving a misleading impression of your page's relationship with Meta.
As for copyright, when you are in the US you can make Fair Use of their materials, regardless of what the license says.
That you can't do that in Europe is not Meta's fault.
I mean manufacturing a supply chain for them where they get things practically for free.
Oh. You're talking about Net Neutrality and not copyright. I'm afraid I don't know enough about the network business to form an opinion on that.
I don't think what happened to you was a subsidy, though. You're offering something for free, and apparently Alibaba took advantage of you for that. That's just how it is, sometimes.
We were talking about a specific lecture that questions the entire concept of copyright as we have it now.
I touched on a lot of subjects. In a nutshell, I am against rent-seeking. No more, no less.
stealing Scarlett Johansson’s voice,
BTW, that turned out to be false.
And they're not wrong.
That's correct. My point was that they're following an agenda as well. But they're also correct that signing it has consequences and doesn't translate into unlimited corporate growth.
where they lobby for "full rights" for themselves, whatever that means
OpenAI is very secretive and not transparent at all. They promised to release a model, which they've now delayed several times. But other than that, they haven't published papers for some time now, they don't share stuff. And they do other little things for their own benefit and so the competition can't do the same. They even keep simple numbers like the model size a big trade secret. They guard everything closely and they like it that way. It's the literal opposite of free exchange of information. And they do that with most of their business decisions.
And Meta's models come with a license plus an EULA. And I've lost track of the current situation, but as a European I've been prohibited from downloading and using Meta's LLMs for some time. Sometimes they also want my e-mail address, I have to abide by their terms and I don't like the terms... That's their right. And they're making use of it. It's not like I can just download it and do whatever, as if that were Fair Use as well... They retain rights, and many of them.
Trademark is definitely part of the conversation. Can models paint a Mickey Mouse? Other trademarked stuff? Sure they can. And it's the same trademark that protects fictional characters and other concepts. So once AI ingests that, it needs addressing as well. And it's not just that. They (Meta/Instagram) also address copyright and they have a lot of rules about that, too. With that specific thing I was more concerned with their logo, though, and that is mostly trademark law.
You're talking about Net Neutrality and not copyright [...]
No, I am talking about copyright. Net neutrality has nothing to do with any of this.
[...] and apparently Alibaba took advantage of you for that. That's just how it is, sometimes.
Yeah, that's kind of my point. They're taking advantage of people. And in a rather mischievous way, because they've thought about how to defeat the usual defenses. How do you think I'm supposed to deal with that? Let everyone take advantage of me? Take down my server and quit this place?
I am against rent-seeking. No more, no less
I'm with you on this. As long as it's fair. Make sure AI companies aren't rent-seeking either. Because currently that's a big part of their business model.
I mean, what do you think the big piles of information they gather for training are? That they don't share, and do contracts, and even buy up companies to get exclusive access... How they gobble up the resources? And how prices for graphics cards skyrocketed first due to crypto and then due to AI? That's kinda rent-seeking on several different levels...
Scarlett Johansson [...] that turned out to be false
It's definitely inspired by her performance in "Her". Sam Altman himself made a reference connecting his product and that specific movie. It's likely not a coincidence. And they kind of followed up and removed that voice along with a few others. Clearly not because they were in the right and this is an uncontroversial topic.
as a European I’ve been prohibited from downloading and using Meta’s LLMs for some time.
The vision models are not for the EU. Meta trained them on Facebook data. The EU did not allow that. Meta said that this would mean that their models would not have the necessary knowledge to be useful for European users, and disallowed their use in the EU. It also means that some EU regulations don't apply, but they did not give that as a reason, I think.
In any case, it seems quite fair to me. If Europe does not want to pitch in, but only makes demands, then why should it reap the benefits?
Some other recent open models by Tencent and Huawei are also not for the EU. That is in response to the AI Act. I am surprised that it is not a standard clause yet.
And they’re making use of them. It’s not like I can just download it and do whatever, as if that were Fair Use as well… They retain rights, and many of them.
No. They can't override fair use. That's the point of fair use. You cannot do what you like with it because you are in Europe and don't have fair use.
I really don't understand how that is supposed to make sense. You demand that American companies should be giving more free stuff to Europe. But also, they should be following European laws in the US and pay rent-seekers for the privilege. It's ridiculous.
No, I am talking about copyright. Net neutrality has nothing to do with any of this.
I don't see how that is about copyright.
Make sure AI companies aren’t rent-seeking either. Because currently that’s big part of their business model.
Back that up or retract the statement.
It’s definitely inspired by her performance on “Her”. Sam Altman himself made a reference, connecting his product and that specific movie.
What you are saying is that someone who sounds a bit like Scarlett Johansson must get permission from her to speak in public.
Maybe there is a language issue here. But from what you are writing, you are not against rent-seeking. You demand privileges and free money for special people; a new aristocracy. You even want privileges for Meta, even though you use these privileges as arguments why these privileges should exist. This is all absolutely ridiculous.
Here's rent-seeking in the German Wikipedia: https://de.wikipedia.org/wiki/Renten%C3%B6konomie
Back that up or retract the statement.
Let me rephrase it a bit: OpenAI is one of the prime examples. They wrote one or two scientific papers early on. And then they stopped. Deliberately. They're not contributing anything to science. All they invent is strictly for-profit and happens behind closed doors. They take, they don't contribute back.
And the main asset in the digital age is information. It's necessary for AI training to pile that up in a dataset. So that's their supply and they want it cheap because they need a lot of it. That's where they generate their "rent" from. Do they contribute anything back with that? No. They "seek" it and pile it up and that becomes their trade secret. And that's why I call them "rent-seeking". (Thanks for the Wikipedia article, yours was way better than the convoluted definition I read yesterday...) And it even translates to the illegal activities mentioned in the Wikipedia article. Meta has admitted to pirating books to pile up datasets faster. OpenAI likely did the same(?) It's just that they keep everything a secret. No company tells you anymore whether your content went into a dataset, since you might be able to use the legal system against them.
We can see that also with some platforms like Github, which turned out to be a great resource for AI training for Microsoft. Harvesting data is one of the main business models these days. And having that data is what pays the rent. It's not all there is to it. There's a lot of work in compiling it, curating datasets, RLHF... And then of course the science behind AI itself. But the last one aside, that's also often done with negative effects on society. We all know about the precarious situation of the data labellers in Africa.
And then all of this, plus the experts they get from the public universities and all the GPUs in the datacenters and some electricity get turned into their (OpenAI's) intellectual property.
You demand that American companies should be giving more free stuff to Europe. But also, they should be following European laws in the US and pay rent-seekers for the privilege. It's ridiculous.
Maybe tell me what they contribute back? Is there anything they give? I don't think so. They mainly seem like parasites to me, freeloading on all the information they can gather in electronic form. And then? Is there anything we get in return?
And maybe we're having a small misunderstanding here. I'm not anti-AI or anything. I just want people who take something from society to contribute something back to society. And they really like to take, but they themselves painstakingly avoid disclosing even the smallest details.
I'd say there are two options. Either they contribute back and we find a healthy relationship between society and big-tech AI companies. Then it'd be completely fine for them to also take things; it's give-and-take. Or they want to run a dubious for-profit service that no one has a say in, can look inside, or can use beyond what they've devised for society... But then the same rules apply to them: they have to contribute back in the form of money, paying for their supplies and licensing the content that goes into their product.
My own opinion: allow AI and cater to scientific progress, but in a healthy way. The companies do AI and they get resources, but they're obligated to be transparent and contribute back. Open-weight models, for example, are a good idea. I'd go further than that, because science and society also need to address biases, what AI can be used for, and a bunch of issues that come with it, like misinformation and spam... The companies aren't incentivised to address those, and the impact is starting to show on the internet and in society. Regulation is the way to make them do what's necessary or beneficial in the long run.
> you are not against rent-seeking
I'm generally against hyper-capitalism and big corporations. They often don't do us any good. It's a bit complicated with AI, since those companies are over-valued and there's a big investment bubble, which isn't necessarily about society. But the copyright industry is part of the same picture. Spotify, for example, isn't healthy for society at all. And the Höffner video you linked had a lot of good points about that. I'm not sure whether you're aware of the other side of the coin, though... I've talked to some musicians (copyright holders), and I've written a few pages of technical documentation myself, so I know it takes several weeks behind the desk to produce 40 pages, and half a year or more to write a novel. And somehow you need to eat something during those months... So with capitalism it's not always easy. The current situation is subpar, and the copyright industry is mainly a business model for leeching on people who create something. We'd be better off if we cut out the middlemen.
I see. Thank you. I'm afraid you don't quite understand what rent-seeking means. Let me try a hypothetical example.
Food is pretty cheap. But suppose a single company had a monopoly on supplying food. How much would people be willing to pay? People would give almost anything they have.
The reason food is cheap is that there is no monopoly. If someone charges more than the competition, you go to the competition. You get a market price. It's complicated, but one thing that goes into the price of food is the cost of labor. Many people must work to supply food.
These workers could do other things with their time. But also, other people could do their work of supplying food. No one has a monopoly. Ultimately, the cost of labor comes down to how much money you must offer before people are willing to put up with the work.
If someone had a monopoly on food supply, they could charge fantastic prices. Their cost would not change. The difference between the market price and the monopoly price is the monopoly rent.
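To put numbers on it (entirely made-up figures, just for illustration): the rent is simply the gap between what a monopolist can charge and what competition would force the price down to, since the cost of production stays the same either way.

```python
# Toy illustration of monopoly rent (all numbers are made up).
market_price = 2.00     # price of a loaf of bread under competition
monopoly_price = 10.00  # what a sole food supplier could charge
cost = 1.50             # producing the loaf costs the same in both cases

# The monopolist's extra take over the competitive price is the rent.
monopoly_rent = monopoly_price - market_price
print(f"rent per loaf: {monopoly_rent:.2f}")  # rent per loaf: 8.00
```

The point of the example: the rent isn't payment for any extra work or cost; it exists purely because competition has been removed.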
Let's take this closer to AI training.
Let's say there's some guy who's searching through libraries and archives for stuff to digitize so that it can be sold to AI companies for training. He finds an archive of old newspapers. How much would the market price for scans of these newspapers be? Let's ignore copyright for now.
Maybe the potential buyer could send someone else to scan the papers. So our guy could only ask to be paid for the labor in scanning the papers.
So our guy will not say where he found that archive. That is his trade secret. The potential buyer would have to send someone to search for that archive and scan it. That means our guy can ask to be paid for his labor in finding the archive AND scanning it. The potential buyer will only hire someone else to do that if our guy asks too high a price.
There is a way our guy can get more. If he destroys all remaining copies of these newspapers, then he has a monopoly. Now he can ask for as much as the potential buyer is willing to pay. That's a monopoly rent.
Now copyright... Those newspapers are probably under copyright. If our guy is in Europe, he will have to get permission from the rights-holder to scan the papers. Copyright is a monopoly enforced by the state. The rights-holder can now extract the monopoly rent from our guy.
If the publisher has gone out of business, the rights-holders may be hard to find, but he has to make the effort anyway. In practice, this means there is really no point in making the effort to preserve European culture and history. The copyright people don't just harm technological progress and the European economy; they harm European culture. That's parasitic.
You're making the argument that OpenAI and others are trying to get paid. That's not rent-seeking. Ideally, our laws ensure that seeking money makes you work for the benefit of other people.
Farmers work for money, and everyone else gets a lot of good, cheap food out of it. If you demand that farmers should work for free, then you're demanding that many of us should starve.
Sarcasm Identification
That sounds nice. Can we run it on a microcontroller so I can wear it as an amulet around my neck? It'd certainly help in everyday situations: whenever I forget that sarcasm doesn't work on random strangers, it'd light up and tell them I'm being sarcastic and not just a weird idiot?!