submitted 10 months ago by L4s@lemmy.world to c/technology@lemmy.world

OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling's Harry Potter series

A new research paper laid out ways in which AI developers should try to avoid showing that LLMs have been trained on copyrighted material.

TropicalDingdong@lemmy.world 114 points 10 months ago

It's a bit pedantic, but I'm not really sure I support this kind of extremist view of copyright and the scale of what's being interpreted as 'possessed' under the idea of copyright. Once an idea is communicated, it becomes a part of the collective consciousness. Different people interpret and build upon that idea in various ways, making it a dynamic entity that evolves beyond the original creator's intention. It's like the issues with sampling beats or records in the early days of hip-hop. It's as if the very principle of an idea goes against this vision: once you put something out into the commons, it's irretrievable. It's not really yours anymore once it's been communicated. I think if you want to keep an idea truly yours, then you should keep it to yourself. Otherwise you are participating in a shared vision of the idea. You don't control how the idea is interpreted, so it's not really yours anymore.

Whether that's ChatGPT or Public Enemy is neither here nor there to me. The idea that a work like Peter Pan is still possessed is a very real, but very obviously silly, malady of this weirdly accepted but very extreme view of the ability to possess an idea.

Laticauda@lemmy.ca 34 points 10 months ago (last edited 10 months ago)

AI isn't interpreting anything. This isn't the sci-fi style of AI that people think of; that's general AI. This is narrow AI, which is really just an advanced algorithm. It can't create new things with intent and design; it can only regurgitate a mix of pre-existing stuff based on narrow guidelines programmed into it to try and keep it coherent, with no actual thought or interpretation involved in the result. The issue isn't that it's derivative; the issue is that it can only ever be inherently derivative, without any intentional interpretation or creativity.

Even collage art has to qualify as fair use to avoid copyright infringement if it's being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used (which requires intent). Even if it's transformative enough to make the original unrecognizable, if the majority of the work is not your own art, then you need to get permission to use it; otherwise you aren't automatically safe from getting in trouble over copyright. Even using images in Photoshop involves Creative Commons and commercial-use licenses. Fanart and fanfic are also considered a grey area, and the only reason more of a stink isn't kicked up over them regarding copyright is that they're generally beneficial to the original creators, and credit is naturally provided by the nature of fan works, so long as someone doesn't try to claim the characters or IP as their own. So most creators turn a blind eye to the copyright aspect of the genre, but if any ever did want to kick up a stink, they could, and some have in the past, like Anne Rice. As a result, most fanfiction sites do not allow writers to profit off of fanfics or advertise fanfic commissions. And those are cases with actual humans producing the works based on something that inspired them or that they are interpreting. So even human-made derivative works have rules and laws applied to them. AI isn't a creative force with thoughts and ideas and intent; it's just a pattern recognition and replication tool, and it doesn't benefit creators when it's used to replace them entirely, as Hollywood (among other corporate entities) is attempting to do. Viewing AI at least as critically as actual human beings is the very least we can do, along with establishing protections for human creators so that they can't be taken advantage of because of AI.

I'm not inherently against AI as a concept and as a tool for creators to use, but I am against AI works with no human input being used to replace creators entirely, and I am against using works to train it without the permission of the original creators. Even in the artist/writer/etc. communities, it's considered a common courtesy to credit other people/works that you based a work on or took inspiration from, even if what you made would be safe under copyright law regardless. Sure, humans get some leeway in this because we are imperfect meat creatures with imperfect memories and may not be aware of all our influences, but a coded algorithm doesn't have that excuse. If the current AIs in circulation can't function without being fed stolen works without credit or permission, then they're simply not ready for commercial use yet as far as I'm concerned. If it's never going to be possible, which I simply don't believe, then it should never be used commercially, period. And it should be used by creators to assist in their work, not to replace them entirely. If it takes longer to develop, fine. If it takes more effort and manpower, fine. That's the price I'm willing to pay for it to be ethical. If it can't be done ethically, then imo it shouldn't be done at all.

kogasa@programming.dev 3 points 10 months ago

Your broader point would be stronger if it weren't framed around what seems like a misunderstanding of modern AI. To be clear, you don't need to believe that AI is "just" a "coded algorithm" to believe it's wrong for humans to exploit other humans with it. But to say that modern AI is "just an advanced algorithm" is technically correct in exactly the same way that a blender is "just a deterministic shuffling algorithm." We understand that the blender chops up food by spinning a blade, and we understand that it turns solid food into liquid. The precise way in which it rearranges the matter of the food is both incomprehensible and irrelevant. In the same way, we understand the basic algorithms of model training and evaluation, and we understand the basic domain task that a model performs. The "rules" governing this behavior at a fine level are incomprehensible and irrelevant, and certainly not dictated by humans. They are an emergent property of a simple algorithm applied to billions to trillions of numerical parameters, in which all the interesting behavior is encoded in some incomprehensible way.

primbin@lemmy.one 1 point 10 months ago

I disagree with your interpretation of how an AI works, but I think the way that AI works is pretty much irrelevant to the discussion in the first place. I think your argument stands completely the same regardless. Even if AI worked much like a human mind and was very intelligent and creative, I would still say that usage of an idea by AI without the consent of the original artist is fundamentally exploitative.

You can easily train an AI (with next to no human labor) to launder an artist's works, by using the artist's own works as reference. There's no human input or hard work involved, which is a factor in what dictates whether a work is transformative. I'd argue that if you can put a work into a machine, type in a prompt, and get a new work out, then you still haven't really transformed it. No matter how creative or novel the work is, the reality is that no human really put any effort into it, and it was built off the backs of unpaid and uncredited artists.

You could probably make an argument for being able to sell works made by an AI trained only on the public domain, but they still should not be copyrightable IMO, because they're not a human creation.

TL;DR - No matter how creative an AI is, its works should not be considered transformative in a copyright sense, as no human did the transformation.

Immersive_Matthew@sh.itjust.works 0 points 10 months ago

I thought this way too, but after playing with ChatGPT and Midjourney nearly daily, I have seen many moments of creativity way beyond the source it was trained on. A good example was one I saw in a YouTube video (sorry, I cannot recall which one to link) where the prompt was animals made of sushi, and wow, was it ever good and creative in how it made them, and it was photorealistic. This is just not something you can find anywhere on the Internet. I just did a search and found some hand-drawn, Japanese-style sushi with eyes and such, but nothing like what I saw in that video.

I have also had it suggest ways to handle coding in my VR theme park app that are very unconventional and not something anyone has posted about, as near as I can tell. It seems to be able to put 2 and 2 together and get 8, likely because it sees so much of everything at once that it can connect the dots in ways we would struggle to. It is more than regurgitated data, and it surprises me nearly daily.

Laticauda@lemmy.ca 1 point 10 months ago

Just because you think it seems creative due to your lack of experience with human creativity doesn't mean it is uniquely creative. It's not; it can't be by its very nature. It can only regurgitate an amalgamation of the stuff fed into it. What you think you see is the equivalent of pareidolia.

Immersive_Matthew@sh.itjust.works 1 point 10 months ago

Why are you making personal jabs to make a point? How do you know my creative experience?

Even_Adder@lemmy.dbzer0.com -2 points 10 months ago

if it’s being done for profit, and fair use requires it to provide commentary, criticism, or parody of the original work used. Even if it’s transformative enough to make the original unrecognizable

I'm going to need a source for that. Fair use is flexible and context-specific. It depends on the situation and four things: why, what, how much, and how it affects the work. No one thing is more important than the others, and it is possible to have a fair use defense even if you do not meet all the criteria of fair use.

Laticauda@lemmy.ca 10 points 10 months ago

I'm a bit confused about what point you're trying to make. There is not a single paragraph or example in the link you provided that doesn't support what I've said, and none of the examples provided in that link are something that qualified as fair use despite not meeting any criteria. In fact one was the opposite, as something that met all the criteria but still didn't qualify as fair use.

The key aspect of how they define transformative is here:

Has the material you have taken from the original work been transformed by adding new expression or meaning?

These (narrow) AIs cannot add new expression or meaning, because they do not have intent. They are just replicating and rearranging learned patterns mindlessly.

Was value added to the original by creating new information, new aesthetics, new insights, and understandings?

These AIs can't provide new information because they can't create something new, they can only reconfigure previously provided info. They can't provide new aesthetics for the same reason, they can only recreate pre-existing aesthetics from the works fed to them, and they definitely can't provide new insights or understandings because again, there is no intent or interpretation going on, just regurgitation.

The fact that it's so strict that even stuff that meets all the criteria might still not qualify as fair use only supports what I said about how even derivative works made by humans are subject to a lot of laws and regulations, and if human works are under that much scrutiny, then there's no reason why AI works shouldn't be under at least as much scrutiny, or more. The fact that so much of a fair use defense depends on having intent and providing new meaning, insights, and information is just another reason why AI can't hide behind fair use or be given a pass automatically because "humans make derivative works too". Even derivative human works are subject to scrutiny, criticism, and regulation, and AI works should be too.

EchoesInMay@lemmy.ml -5 points 10 months ago

Neural networks are based on the same principles as the human brain; they are literally learning in the exact same way humans are. Copyrighting the training of neural nets is essentially the same thing as copyrighting interpretation and learning by humans.

Laticauda@lemmy.ca 3 points 10 months ago

These AIs are not neural networks based on the human brain. They're literally just algorithms designed to perform a single task.

Bogasse@lemmy.world 14 points 10 months ago

Well, I'd consider agreeing if the LLMs were considered a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is "they build original content", both for LLMs and Stable Diffusion models. Now that they have started this line of defence, I think they are stuck with proving that their "original content" is not derived from copyrighted content 🤷

TropicalDingdong@lemmy.world 1 point 10 months ago

Well, I’d consider agreeing if the LLMs were considered a generic knowledge database. However, I had the impression that the whole response from OpenAI & co. to this copyright issue is “they build original content”, both for LLMs and Stable Diffusion models. Now that they have started this line of defence, I think they are stuck with proving that their “original content” is not derived from copyrighted content 🤷

Yeah I suppose that's on them.

Toasteh@lemmy.world 7 points 10 months ago

Copyright definitely needs to be stripped back severely. Artists need time to use their own work, but after a certain time everything needs to enter the public domain for the sake of creativity.

AgentOrange@lemm.ee -1 points 10 months ago

To add to that, Harry Potter is the worst example to use here. There is no extra billion that JK Rowling needs to allow her to spend time writing more books.

Copyright was meant to encourage authors to invest in their work in the same way that patents do. If you were going to argue about the issue of lifting content from books, you should be using books that need the protection of copyright, not ones that don't.

TropicalDingdong@lemmy.world 6 points 10 months ago

Copyright was meant

I just don't know that I agree that this line of reasoning is useful. Who cares what it was meant for? What is it now, currently and functionally, doing?

treefrog@lemm.ee -3 points 10 months ago

If you sample someone else's music and turn around and try to sell it, without first asking permission from the original artist, that's copyright infringement.

So, if the same rules apply, as your post suggests, OpenAI is also infringing on copyright.

NOT_RICK@lemmy.world 3 points 10 months ago (last edited 10 months ago)

A sample is a fundamental part of a song’s output, not just its input. If LLMs are changing the input’s work to a high enough degree is it not protected as a transformative work?

treefrog@lemm.ee -3 points 10 months ago (last edited 10 months ago)

It's more like a collage of everyone's words. It doesn't make anything creative because it doesn't have a body or life or real social inputs, you could say. Basically, it's just rearranging other people's words.

It's like a song that's nothing but samples, but so many samples that it hides that fact. This is my view, anyway.

And only a handful of people are getting rich off the outputs.

If we were in some kind of post-capitalist economy, or if we had UBI, it wouldn't really bother me. It's not the artists' ego I'm sticking up for, but their livelihood.

TropicalDingdong@lemmy.world -1 points 10 months ago

If you sample someone else’s music and turn around and try to sell it, without first asking permission from the original artist, that’s copyright infringement.

I think you completely and thoroughly do not understand what I'm saying or why I'm saying it. Nowhere did I suggest that I do not understand modern copyright. I'm saying that I'm questioning my belief in this extreme interpretation of copyright, which is represented by exactly what you just parroted, and that this interpretation is not only functionally and materially unworkable but also antithetical to a reasonable understanding of how ideas and communication work.

this post was submitted on 22 Aug 2023
762 points (95.7% liked)
