Technology

Sarah Silverman is suing OpenAI and Meta for copyright infringement (www.theverge.com)

submitted 2 years ago by Madison_rogue@kbin.social to c/tech@kbin.social

Comedian and author Sarah Silverman, as well as authors Christopher Golden and Richard Kadrey, are suing OpenAI and Meta, each in a US District Court, over dual claims of copyright infringement.

[–] sky@lemmy.codesink.io 29 points 2 years ago (21 children)

Interested to see how this plays out! Their argument that the only way an LLM could summarize their book is by ingesting the full copyrighted work seems a bit suspect, as it could've ingested plenty of reviews and summaries written by humans and combined that information.

I'm not confident that they'll be able to prove OpenAI or Meta infringed copyright, just as I'm not confident the companies will be able to prove that they didn't. I don't know if anyone really knows what these things are trained on.

We got to where we are now with fair use in search and online commentary because of a ton of lawsuits setting precedent, so it's not surprising we'll have to do the same with machine learning.

[–] Madison_rogue@kbin.social 23 points 2 years ago* (last edited 2 years ago) (6 children)

ThePile, which was assembled by a company called EleutherAI. ThePile, the complaint points out, was described in an EleutherAI paper as being put together from “a copy of the contents of the Bibliotik private tracker.” Bibliotik and the other “shadow libraries” listed, says the lawsuit, are “flagrantly illegal.”

I think this is where the crux of the case lies since the article mentions these are only available illegally through torrents.

[–] Itty53@kbin.social 6 points 2 years ago* (last edited 2 years ago) (1 children)

This is starting to touch on the root of why they keep calling this "AI", "training", etc. They aren't doing this strictly for marketing; they are attempting to skew public opinion. These companies know intimately how to do that.

They're going to argue that if torrents are legal for educational purposes (i.e., the loophole that all trackers use), and they're just "training" an "AI", then they're just engaging in education. And an ignorant public might buy it.

These kinds of cases will be viewed as landmark cases in the future and honestly I don't have huge hopes. The history of these companies is engineer first, excuse the lack of ethics later. Or the philosophy of "it's easier to apologize than ask".

[–] dandan@kbin.social 30 points 2 years ago* (last edited 2 years ago) (2 children)

It's the de facto term for how we fit a statistical model to data, unrelated to any copyright concepts. I'm pretty sure we called it "training" back in 1997 when I was doing neural networks at uni, and it's probably been used well before then too.

Neural nets are based on the concept of Hebbian learning (from the 1940s), because they are trying to mimic how a biological neural network learns.

This concept of training/learning has persisted because it's a good analogy of what we are trying to do with these statistical models, even if they aren't strictly neural networks.

[–] Saganastic@kbin.social 7 points 2 years ago (1 children)

This concept of training/learning has persisted because it's a good analogy of what we are trying to do with these statistical models, even if they aren't strictly neural networks.

LLMs are indeed neural networks.

[–] dandan@kbin.social 2 points 2 years ago

Ahh ok. I didn't want to assume as I'm not familiar with the details.

[–] Madison_rogue@kbin.social 1 points 2 years ago (1 children)

TBH I'm not really familiar with how AI has developed over the years. Wikipedia says that ChatGPT is proprietary, which leads me to believe it hasn't been developed with research grants or government involvement. Is this the case? Can a company legally develop an AI by obtaining its learning material through illegal means? That is what it sounds as if OpenAI and Meta did through the use of Bibliotik.

I can't see how this doesn't have some legal ramification, but IANAL.

[–] Rabbithole@kbin.social 6 points 2 years ago* (last edited 2 years ago)

OpenAI is called that for a reason. They absolutely were a non-profit research org initially, so would have been eligible for research grants, etc. They would probably have gotten a pass on using the torrents too, for the same reason.

They went to a private for-profit model later, after they built their AIs and wanted to start selling them as a service. How the hell all of that plays out as the company they are now is anyone's guess.
