submitted 2 months ago by Blaze@lemmy.blahaj.zone to c/reddit@lemmy.world

cross-posted from: https://lemmy.ca/post/19946388

An anticapitalist tech blog. Embrace the technology that liberates us. Smash that which does not.

[-] hperrin@lemmy.world 18 points 2 months ago

That’s probably not going to be useful. Reddit keeps your original comment text.

[-] tehciolo@lemm.ee 15 points 2 months ago

I think you missed the part where it was strongly suggested that you "not" use copyrighted text.

The point is not to get rid of the original text. It's to "poison" the training data.

[-] Everythingispenguins@lemmy.world 1 points 2 months ago

Are porn scripts copyrighted?

[-] FaceDeer@fedia.io -2 points 2 months ago

If the AI trainers have the original text then "poisoning" the live site's content isn't going to do anything at all.

You can't touch the original text. It's already been archived.

[-] tehciolo@lemm.ee 7 points 2 months ago

If they scrape the updated comments again and ingest copyrighted text, you are poisoning the data.

[-] FaceDeer@fedia.io 2 points 2 months ago

That's my point. They won't.

And even if they did, it's unclear that copyright has anything to say about AI training anyway.

[-] InternetPerson@lemmings.world 6 points 2 months ago

NYT is currently suing over copyright infringement.

https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

> it’s unclear that copyright has anything to say about AI training anyway

Although lawmakers worldwide slept while AI advanced and therefore failed to pass some important laws, they are catching up. Europe recently passed its first AI Act. As far as I've seen, it also states that companies must disclose a detailed summary of their training data.

https://www.ml6.eu/blogpost/ai-models-compliance-eu-ai-act

[-] FaceDeer@fedia.io 1 points 2 months ago

You can sue about anything you want in the United States; it remains to be seen whether the courts will side with them. I think it's unlikely they'll get much of a win out of it.

A law that requires disclosing a summary of training data isn't going to stop anyone from using that training data.

[-] Th4tGuyII@kbin.social 7 points 2 months ago

Yeah, this is what I was thinking. We all heard about people being unable to delete comments, or Reddit keeping comments even after account deletions, back during the first migration. So what stops them from holding onto comment history, and what stops them from using that to teach LLMs to discern poisoned data from real data, as @pixxelkick said?

[-] pixxelkick@lemmy.world 4 points 2 months ago

Yeah, in fact you're giving the LLM additional data to train on what poisoned data looks like so it can avoid it better, as they can clearly see the before vs. after.

[-] InternetPerson@lemmings.world 3 points 2 months ago

It is necessary to employ a method which enables the training procedure to distinguish copyrighted material. In the "dumbest" case, some humans will have to label it.

Just because you've edited a comment doesn't mean that it can be seen as "oh, this is under copyright now".

I'm not saying it's technically impossible. On the contrary, it very much is possible. It's just more work. This drives development costs up and can give some form of satisfaction to angered ex-Reddit users like me. However, those costs will be peanuts for giants like Google / Alphabet.

this post was submitted on 24 Apr 2024
324 points (97.6% liked)
