this post was submitted on 18 Sep 2024
335 points (92.8% liked)

Technology

[–] mac@lemm.ee 9 points 15 hours ago (2 children)

Is it not relatively trivial to pre-vet content before they train on it? At least with AI-generated text it should be.

[–] RvTV95XBeo@sh.itjust.works 20 points 14 hours ago (1 children)

The problem is these AI companies currently exist on the business model of not paying for information, and that generally includes not wanting to pay content curators.

Google is probably the only one in a position to potentially outsource the curation, by making everyone solve a "does this hand look normal to you" CAPTCHA.

They can try and train AI to detect AI, but that's also difficult.

[–] FMT99@lemmy.world 1 points 11 hours ago (1 children)

So it's not a problem with AI. It's just a problem for some mayfly companies that try to profit from the latest trend?

[–] Honytawk@lemmy.zip 1 points 2 hours ago

As always.

The model isn't dying, it's the way these parasites want it to work that is dying.

[–] General_Effort@lemmy.world 1 points 10 hours ago

It depends on what you are looking for. Identifying AI-generated data is generally hard, though it can be done in specific cases. There is no mathematical difference between the 1s and 0s that encode AI-generated data and those that encode any other data, which is why these model collapse ideas are just fantasy. There is nothing magical about any data that makes it "poisonous" to AI. The kernel of truth behind these ideas is not likely to matter in practice.
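
To make the "specific cases" point concrete, here is a rough sketch of the easy kind of pre-vetting: dropping documents that contain leftover chatbot boilerplate. The phrase list and the function names (`looks_machine_generated`, `pre_vet`) are made up for illustration. A filter like this only catches text that announces itself, which is exactly why the general problem stays hard.

```python
import re

# Hypothetical list of telltale phrases; an assumption for illustration,
# not a vetted detection rule set.
TELLTALE = re.compile(
    r"as an ai language model"
    r"|i cannot fulfill (?:that|this) request"
    r"|regenerate response",
    re.IGNORECASE,
)


def looks_machine_generated(text: str) -> bool:
    """Return True if the text contains an obvious chatbot boilerplate phrase."""
    return TELLTALE.search(text) is not None


def pre_vet(documents: list[str]) -> list[str]:
    """Keep only the documents that do not trip the phrase filter."""
    return [doc for doc in documents if not looks_machine_generated(doc)]


if __name__ == "__main__":
    sample = [
        "The hand in this photo has five fingers.",
        "As an AI language model, I cannot browse the internet.",
    ]
    for doc in pre_vet(sample):
        print(doc)
```

Anything generated without such giveaways passes straight through, so this is a cheap cleanup step rather than a detector.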