this post was submitted on 07 Mar 2024
574 points (97.7% liked)
Technology
Reddit is almost certainly going to hand over your old comments anyway if you edit stuff. We're pretty fucked. And if you think Lemmy is any different, guess again. We agreed to send our comments to everyone else in the fediverse, and plenty of bad actors plus a legal minefield let LLMs do essentially what they want. The good news is that LLMs are all crap, and people are slowly realising this.
Lemmy is different, in that the data is not being sold to anyone. Instead, the data is available to anyone.
It's kind of like open source software. Nobody can buy it, cause it's open and free to be used by anyone. Nobody profits off of it more than anyone else - nobody has an advantage over anyone else.
Open source levels the playing field by making useful code available to everyone. You can think of comments and posts on the Fediverse in the same way - nobody can buy that data, because it's open and free to be used by anyone. Nobody profits off of it more than anyone else and nobody has an advantage over anyone else (after all, everyone has access to the same data).
The only question is whether you're okay with your data being out there and available in this way... but if you're not, you probably shouldn't be on the internet at all.
If the post is creative then it's automatically copyrighted in many countries. That doesn't stop people collecting it and using it to train ML (yet).
Copyright has little to say with regard to training models - it's the published output that matters.
LLMs have already changed the tech space more than anything else in the last 10 years at least. I get what you're trying to say but that opinion will age like milk.
Edit: made wording clearer
I've been harping on about this for a while on the fediverse ... private/closed/non-open spaces really ought to be thought about more. Fortunately, the Lemmy core devs are implementing local-only and private communities (local-only is already done IIRC).
Yes, they introduce their own problems with discovery and gating etc. But now that the internet's "you're the product" stakes have gone beyond what could have been construed as a reasonable transaction, "my attention on an ad ... for a service", to "my mind's products to be aggregated into an energy-sucking, job-replacing AI ... for a service" ... well, it's time to normalise closing that door on opportunistic tech capitalists.
LLMs are great for anything you’d trust to an 8-year-old savant.
They’re great for getting quick snippets of code using languages and methods that have great documentation. I don’t think I’d trust them for real work though.
They'll use old comments either way, because using an up-to-date dataset means using a dataset already tainted by LLM-generated content. Training a model on its own output is not great.
Incidentally, this also makes Lemmy data less valuable: most of Lemmy's popularity came after the rise of LLMs, so there's no significant untainted data from before LLMs.