this post was submitted on 17 Jun 2025

154 points (98.7% liked)

ChatGPT Has Already Polluted the Internet So Badly That It's Hobbling Future AI Development (share.google)

submitted 1 day ago by Bender12@lemmy.world to c/fuck_ai@lemmy.world

[–] ElectroVagrant@lemmy.world 28 points 1 day ago* (last edited 19 hours ago) (3 children)

Odd URL... Here's the original: https://futurism.com/chatgpt-polluted-ruined-ai-development

Nice detail to use when searching the internet btw:

"But if you're collecting data before 2022 you're fairly confident that it has minimal, if any, contamination from generative AI," he added. "Everything before the date is 'safe, fine, clean,' everything after that is 'dirty.'"

Try restricting searches to pre-2022 results, at least for older info, to reduce the chance of AI-generated noise.

Anyway, kinda funny to see these generators may be producing enough noise to make producing more noise somewhat harder. Hopefully this doesn't also impact more productive AI development, such as what's used in scientific research and the like, as that would genuinely suck.

Edit:
Revised from generators "have produced" to "may be producing" to better reflect the lack of concrete info regarding generative AI data pollution, as someone else pointed out. As they note:

"Now, it's not clear to what extent model collapse will be a problem, but if it is a problem, and we've contaminated this data environment, cleaning is going to be prohibitively expensive, probably impossible," he told The Register.

[–] JeremyHuntQW12@lemmy.world 2 points 1 day ago

There's nothing in the article, the Register article, or any of the references that claims there is actual pollution of data.

It's based on speculation made years ago.

[–] chickenf622@sh.itjust.works 1 points 1 day ago

The plus side of actually useful applications of LLM/AI is that they usually work with a small subset of data, and the results would have to be tested anyway since they'd have to be used in the real world. I think the main mainstream use of LLM/AI is on small datasets like that, instead of the race for the holy grail of "General" AI.

[–] blazeknave@lemmy.world 1 points 1 day ago

Fuck. Will this next epoch retrospectively be considered a dark age, not bc of disinformation, but bc after 2022 we were gibberishing morons?