this post was submitted on 17 Feb 2024
259 points (100.0% liked)

Technology


A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

[–] SorteKanin@feddit.dk 130 points 9 months ago (1 children)

Remember the whole "if you aren't paying for the product, you are the product"?

It wasn't enough to turn you into a product. Now they also want to turn you into a resource. Farming your comments and posts to feed to an AI model.

What an economy we've built.

[–] tux0r@feddit.de 24 points 9 months ago (3 children)

I wonder why I don't pay for Lemmy.

[–] SorteKanin@feddit.dk 58 points 9 months ago (15 children)

The kind of frightening thing is that anyone could start an instance on the Fediverse, collect all the posts and comments coming in as all instances usually do and then use it to do the same thing, and I'm not sure there's currently anything (legally or otherwise) stopping them.

But at least we have the option to defederate such an instance. If we can find out which ones do it...

[–] GenderNeutralBro@lemmy.sdf.org 89 points 9 months ago (3 children)

I totally understand your perspective, but I approach this from the opposite direction.

From my perspective, there's no "at least" here. My Lemmy posts are public. I have no control over what is done with them after I post them. I am comfortable with that.

The difference between Reddit and Lemmy is not that one protects privacy and the other doesn't. NEITHER is a platform for private discussion.

The difference is that with Lemmy, public means PUBLIC. Reddit, Twitter, and Facebook are also "public" in the sense that there can be no expectation of privacy. But they're "private" in the corporate sense — a single corporate entity retains control of the data. They can, at will, restrict access to that data, without the consent of the users who created it.

And that's not just theoretical; all of those companies have literally restricted access to content that users meant to be public. People can't read the Twitter posts that I made with the intention of them being public, because Twitter now requires an account to read posts and comments. Reddit has restricted access to posts I made with the intention of them being public and readily accessible, because they killed apps and integrations, and implemented onerous access control in an attempt to hoard my data.

They altered the terms, and I, for one, got sick of praying that they would not alter them further.

Lemmy is public. You cannot control who can read it, and you cannot control what they do with it. The difference is that with a truly public platform like Lemmy, my data can benefit the whole world, instead of just some corporation.

If you are looking for a platform for private discussion, Matrix is probably it. But even then, the concept of data privacy only makes sense if you trust all the people that ever have access to the data. If I'm in a Matrix room with hundreds of strangers, I wouldn't consider that "private" either, regardless of the protocol's encryption.

Bad actors will always have access to the posts I make public. On Lemmy, good actors do, too, and nobody can take that away from us. THAT'S the difference.

[–] scrubbles@poptalk.scrubbles.tech 34 points 9 months ago

This is the right way to think of it. Reddit feels dirty because they were a private company and we trusted them in the walled garden. That trust was naive, at least on my part, but I joined 14ish years ago and they never did wrong until recently.

Lemmy, however, is a public protocol. From the ground up everything is public. There is no illusion of privacy here, and anyone who thinks there is should forget about it. The protocol is by definition public, and will launch any comment/post across the globe to anyone listening. It's nailing the paper to the door for everyone to see. To me this is okay though, because I know that going in. The tradeoff is less privacy, but it's an open platform that no one can take away.

[–] SorteKanin@feddit.dk 11 points 9 months ago

I really like that perspective, thank you for easing my fear.

[–] Kichae@lemmy.ca 11 points 9 months ago

An instance isn't required. It's not like the current generation of generative AI wasn't trained on web scrapings.

[–] BlameThePeacock@lemmy.ca 7 points 9 months ago

The instance would likely just act as a regular instance and allow normal users on; you couldn't even tell they were using it to scrape data at that point.

[–] skillissuer@discuss.tchncs.de 7 points 9 months ago

There are already a few instances that ignore delete requests

[–] BitOneZero@beehaw.org 6 points 9 months ago (3 children)

Free and open information, like Wikipedia, used to be an ideal. I have used Reddit since 2008 or earlier because it got on search engines and shared information consistently on precise topics. Twitter used to also be this way, but now mostly only puts paid subscribers on search engines.

If you are to organize information around topics, such as a Commodore 64 community, and the protocol openly allows copies to be made via federation, I encourage people to have the attitude that information be treated like Wikipedia content. It sucks that so much information from 10 years ago has been entirely lost now that so many deliberately purged their Reddit comments, etc. Tragedy of the commons. And it drags down the entire planet that people squirrel away discussions on topics that are generally public. It's like now everyone wants to monetize even their discussions on Commodore 64 or automotive repair, or keep them behind absolute control or paywalls, etc.

[–] RobotToaster@mander.xyz 5 points 9 months ago

People can already do that without an instance, the same way google indexes the site.

[–] Sibbo@sopuli.xyz 5 points 9 months ago (5 children)

Legally, in the EU, you probably cannot scrape someone else's instance because of the database copyright law. But I have no idea if that applies to being part of the network, since the other instances send you their content willingly.

Maybe someone should make a license extension to ActivityPub, where instances can communicate what can and what can't be done with the information they publish. Then at least there would be legal clarity. Whether it can be enforced is another question.

[–] Creesch@beehaw.org 15 points 9 months ago* (last edited 9 months ago)

At least for the instance this was posted on: the February 2024 Beehaw Financial Update

[–] scrubbles@poptalk.scrubbles.tech 12 points 9 months ago

You don't have to, but the owners of your instance are probably paying out of pocket to keep it online. I'm sure they're taking donations.

[–] Bishma@discuss.tchncs.de 32 points 9 months ago

That's why I'm on Lemmy. At least when they train AI on my posts here it's not legitimized by some contract.

[–] FlashMobOfOne@beehaw.org 29 points 9 months ago (1 children)

That AI is going to get really racist, really fast, judging by the muck we all saw daily on Reddit.

[–] echodot@feddit.uk 12 points 9 months ago (1 children)

Although it's going to be really good at anime porn too. So there's that.

[–] Lojcs@lemm.ee 27 points 9 months ago (2 children)
[–] dmrzl@programming.dev 11 points 9 months ago

Like seriously, this must be fake. Add a zero and I'd still find it suspiciously cheap.

[–] Evil_Shrubbery@lemm.ee 21 points 9 months ago* (last edited 9 months ago) (1 children)

Just in time to make new AI generated shitposts with AI generated replies & pump up those numbers for the IPO.

Can't wait to read a post about how a novice AI finds it hard to animate human hands and some other AI suggests studying hentai porn to get the finger/tentacle movements just right. And ofc lots of ads. From AIs, to AIs, by AIs, for AIs.

[–] Lemmy_2019@lemmy.one 8 points 9 months ago (1 children)

r/TotallyNotRobots is spreading everywhere.

[–] DeltaTangoLima@reddrefuge.com 21 points 9 months ago (11 children)

And that's why I deleted all my posts and comments before deleting my account. Sure, they could probably go back and restore it if they wanted but, so far, they haven't.

Glad I landed here on Lemmy.

[–] Phen@lemmy.eco.br 10 points 9 months ago (3 children)

I deleted all my comments last year. Recently I got a notification for a response to one of those comments. When I clicked the notification link, my comment and the response were visible. The comment doesn't show up in my profile.

[–] thatsnothowyoudoit@lemmy.ca 6 points 9 months ago* (last edited 9 months ago) (3 children)

Reddit was aggressively rate limiting tools used to delete and edit content in a funny way when the API pricing was announced. The API wouldn’t return an error, the rate limiting was silent, and the tools would report successful deletion or edits even when the edit or deletion wasn’t made.

I had to modify an existing script to handle the 5-second rate limit and, in lieu of deleting, I just rewrote each comment with a farewell.

Even then I did 3 passes (minor additional edits) in case Reddit was saving previous edits.

My content has stayed edited.
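
For anyone attempting something similar, here is a minimal sketch of the general idea, not the actual script. It assumes two hypothetical callables, `edit_comment` and `fetch_comment`, that wrap whatever API client you use; they are placeholders, not real Reddit API calls, and `FAREWELL` is made-up text. Because the rate limiting was silent, the sketch waits between requests and re-reads each comment to confirm the edit actually stuck.

```python
import time
from typing import Callable, Iterable

# Hypothetical replacement text; use whatever farewell you like.
FAREWELL = "This comment has been replaced by its author."


def overwrite_comments(
    comment_ids: Iterable[str],
    edit_comment: Callable[[str, str], None],  # hypothetical: (comment_id, new_body)
    fetch_comment: Callable[[str], str],       # hypothetical: comment_id -> current body
    delay_seconds: float = 5.0,                # the 5-second limit mentioned above
    passes: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Overwrite each comment, verify it, and return the ids that never stuck."""
    pending = list(comment_ids)
    for _ in range(passes):
        still_pending = []
        for comment_id in pending:
            edit_comment(comment_id, FAREWELL)
            sleep(delay_seconds)  # stay under the rate limit
            # Do not trust a "success" response; read the comment back.
            if fetch_comment(comment_id) != FAREWELL:
                still_pending.append(comment_id)
        pending = still_pending
        if not pending:
            break
    return pending
```

Note that the passes here retry edits that were silently dropped, which is a different use than the extra overwrite passes described above. Anything still in the returned list after the last pass needs a manual look.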

[–] DeltaTangoLima@reddrefuge.com 6 points 9 months ago* (last edited 9 months ago)

Interesting. I've specifically searched for some fairly unique content (Python scripts, etc) I posted in my time over there, and it hasn't shown up at all.

So you left your Reddit account intact?

Edit: Fucking. Cunts. I just searched (had been a few months) and at least some of my data is back. I reckon they've done it ahead of the planned AI move and IPO.

Edit 2: joke's on them - my posts were linked to an alt account I set up on Pastebin years ago. Still had the creds, so have deleted the pastes. Fuck Reddit. 🤘

[–] bilboswaggings@sopuli.xyz 20 points 9 months ago (1 children)
[–] Hubi@feddit.de 15 points 9 months ago* (last edited 9 months ago) (1 children)

And the outputs of bots. There has been a shocking increase in auto-generated comments on reddit in the past years and it's turning the training data into a minefield.

[–] nul@programming.dev 5 points 9 months ago

Haven't touched reddit socially in 8 months, but every now and then I'll use it to search for opinions or instructions on things. Searched "reddit best domain registrar" recently and landed on a thread where top to bottom, every comment recommending a registrar was from a bot and/or banned account. No real person testimonials, all ads. And as AI implementations improve, that's going to get harder to spot. In the meantime, I'm formatting searches like "best domain registrar lemmy" because reddit is legit that bad rn.

[–] sabreW4K3@lemmy.tf 14 points 9 months ago

We all knew it was coming, but it's still disappointing

[–] DragonTypeWyvern@literature.cafe 13 points 9 months ago (7 children)

Funny, I don't see anyone saying the AI companies have free right to Reddit's content.

[–] fine_sandy_bottom@discuss.tchncs.de 11 points 9 months ago (2 children)

$60m doesn't seem like that much in an era where twitter could (have been) sold for $40b.

[–] mob@sopuli.xyz 4 points 9 months ago (5 children)

60 million a year for access to the relatively public data... That seems pretty good to me tbh.

[–] kib48@lemm.ee 10 points 9 months ago (1 children)

so the API thing was over nothing? brilliant

[–] Natanael@slrpnk.net 9 points 9 months ago

No, it was just preemptive to enforce control over who can programmatically read the site

[–] Overlock@sopuli.xyz 8 points 9 months ago (1 children)

Add the bot problem to it and you'll get garbage in, garbage out

[–] echodot@feddit.uk 5 points 9 months ago

Hell even the users didn't exactly contribute good quality content.

[–] RobotToaster@mander.xyz 8 points 9 months ago

We did it reddit, we trained an AI to be the pure embodiment of cringe.

[–] comicallycluttered@beehaw.org 7 points 9 months ago* (last edited 9 months ago)

Lol, so they're going to be training their AI on... AI generated content? The uptick in that shit on reddit has made it more annoying than usual.

That and all the confidently incorrect shit on the site... Not to mention the constant in-jokes. I'm just imagining a chatbot responding to something about how to deal with grief with "I also choose this man's dead wife!"

Can't see how this could possibly go wrong.

[–] neocamel@lemmy.studio 6 points 9 months ago* (last edited 9 months ago) (3 children)

Sounds like it's time for me to actually log back in and delete all my old posts. I've been putting that off for too long.

[–] Yerbouti@lemmy.ml 6 points 9 months ago (1 children)

Got to get my data deleted quick.

[–] unknowing8343@discuss.tchncs.de 6 points 9 months ago (1 children)

They are gonna love it when their chatbot also chooses that man's dead wife.

[–] Kolanaki@yiffit.net 6 points 9 months ago* (last edited 9 months ago)

There's gonna be so many bots commenting "Actually...." followed by the most incorrect information about the topic at hand possible.
