this post was submitted on 21 Jul 2024


Need to let loose a primal scream without collecting footnotes first? Have a sneer percolating in your system but not enough time/energy to make a whole post about it? Go forth and be mid: Welcome to the Stubsack, your first port of call for learning fresh Awful you’ll near-instantly regret.

Any awful.systems sub may be subsneered in this subthread, techtakes or no.

If your sneer seems higher quality than you thought, feel free to cut’n’paste it into its own post — there’s no quota for posting and the bar really isn’t that high.

The post-Xitter web has spawned so many “esoteric” right wing freaks, but there’s no appropriate sneer-space for them. I’m talking redscare-ish, reality-challenged “culture critics” who write about everything but understand nothing. I’m talking about reply-guys who make the same 6 tweets about the same 3 subjects. They’re inescapable at this point, yet I don’t see them mocked (as much as they should be).

Like, there was one dude a while back who insisted that women couldn’t be surgeons because they didn’t believe in the moon or in stars? I think each and every one of these guys is uniquely fucked up and if I can’t escape them, I would love to sneer at them.

[–] BlueMonday1984@awful.systems 10 points 3 months ago* (last edited 3 months ago) (3 children)

Not a sneer, but a mildly interesting open letter:

A specification for those who want content searchable on search engines, but not used for machine learning.

The basic idea is effectively an extension of robots.txt which attempts to resolve the issue by providing a means to politely ask AI crawlers not to scrape your stuff.

Personally, I don't expect this to ever get off the ground or see much usage - this proposal is entirely reliant on trusting that AI bros/companies will respect people's wishes and avoid scraping shit without people's permission.

Between OpenAI publicly wiping their asses with robots.txt, Perplexity lying about user agents to steal people's work, and the fact a lot of people's work got stolen before anyone even had the opportunity to say "no", the trust necessary for this shit to see any public use is entirely gone, and likely has been for a while.

[–] o7___o7@awful.systems 9 points 3 months ago* (last edited 3 months ago)

OpenAI is creating the trustless economy that makes bitcoin necessary. taps forehead

[–] sailor_sega_saturn@awful.systems 5 points 3 months ago* (last edited 3 months ago) (1 children)

The best proposal I've seen so far ~~short of destroying all AI scrapers~~, and essentially what anyone familiar with the specs would come up with.

The only thing I'd add is an analogue to data-nosnippet to exclude only specific sections of the HTML document (w/o needing to reach for an entire iframe); though that's harder to implement on the crawler end so maybe that's for the best.
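
A rough sketch of why the section-level version is more work on the crawler end: instead of one yes/no check per URL, the crawler has to track element nesting while it parses. The `data-noai` attribute below is hypothetical (it is not part of the proposal or any spec I know of), and so are the class and function names. It uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class SectionFilter(HTMLParser):
    """Collects text, skipping anything inside an element marked with `attr`.

    `data-noai` is a made-up attribute name used only for illustration.
    """

    def __init__(self, attr="data-noai"):
        super().__init__()
        self.attr = attr
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.kept = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside an excluded one
        elif any(name == self.attr for name, _ in attrs):
            self.skip_depth = 1   # entering an excluded element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.kept.append(text)


def extract_allowed_text(html):
    parser = SectionFilter()
    parser.feed(html)
    parser.close()
    return parser.kept


page = ('<p>public</p>'
        '<div data-noai><p>private <b>stuff</b></p></div>'
        '<p>also public</p>')
print(extract_allowed_text(page))
```

Note the depth counter assumes every non-void tag is properly closed. Real-world HTML with unclosed `<p>` or `<li>` tags would throw it off, which is part of why this is harder than a robots.txt lookup.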

Google uses a second User-Agent directive, while Bing suggests using noarchive. Both of these are pretty hacky and not general, so it'd be good to see the industry standardize on the above proposal.
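
For anyone curious what the second User-Agent approach looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. `Google-Extended` is the token Google documents for this purpose; the example.com URL and the `allowed` helper are made up for illustration. It only shows how a well-behaved crawler would read the file, and nothing here stops a crawler that ignores it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that stays open to ordinary crawlers but asks the
# AI-training token to keep out.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed(agent, url="https://example.com/post"):
    """Return True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


print("Googlebot:", allowed("Googlebot"))
print("Google-Extended:", allowed("Google-Extended"))
```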

[–] BlueMonday1984@awful.systems 10 points 3 months ago

The proposal itself does still assume that AI scrapers are being run by decent human beings with functioning moral compasses, which is why I feel it's inadequate.

This take might be overly harsh on AI/tech as a whole, but at this point I've run out of patience regarding this bubble and see no reason to believe anyone in the AI space is a decent human being, at least for the time being.

[–] sailor_sega_saturn@awful.systems 4 points 3 months ago (1 children)
[–] BlueMonday1984@awful.systems 4 points 3 months ago

At this point, I wouldn't fault anyone for blanket-blocking all scrapers/robots - sure, doing that will make you unfindable by search engines, but search is basically useless nowadays for finding anything actually interesting, and trying to play whack-a-mole with AI scrapers just means you're gonna get your shit stolen.

Might as well go back to word-of-mouth.