Having a full backup available over torrent or some other public source would just make it even easier for data brokers. Now they don't even have to do the scraping anymore.
It is possible to train your own LLM; you can be a data broker yourself. I mean the problem is capitalism over data.
edit: i added "capitalism of" in the title
If you make something public, it can be accessed by ANYONE. That's what "public" means. If you don't want your public stuff to be used by data brokers, just don't make it public.
I think this is the fundamental flaw people always overlook. They want their data public and want to be able to restrict how it’s used.
You know what else does that? DRM. The thing a lot of people are massively opposed to. The goal behind it is to reach a wide audience but restrict how it can be used.
DRM is not the only option. If they want to restrict the usage, they can just write a custom license for their publications. And wait, isn't the problem with DRM that it uses unique device IDs?
And how well does that work in games? “You can’t cheat, please don’t, pinky promise?” It’s the same with LLMs. They see data, they parse it, licenses be damned. It’s as bad as people linking to the license they released their text under, or people on Facebook posting “I don’t approve my text to be used…”.
Well, if someone breaks the license, they can be sued. But yeah, if you don't want your data to ever be used for anything, public is not an option. It's the same with IRL speeches.
I don't mean data brokers using my data; I mean that they (hosts included) close off that data and sell it at a high price. The public data is made and entered by the public.
If you meant that a Lemmy instance can collect the data, well, it's a matter of trust.
It can close off that data, or sell API access at a high price like Reddit.
Of course it's possible, especially if it grows large enough.