this post was submitted on 20 Sep 2024
Reddthat Community and Support
We enabled Cloudflare's AI bots and crawlers blocking mode around 00:00 UTC (20 Sept).
This was because we had a huge number of AI scrapers that were attempting to scan the whole lemmyverse.
It successfully blocked them... while also blocking federation 😴
I've disabled the block. Within the next hour we should see federation traffic come through.
Sorry for the unfortunate delay in new posts!
Tiff
It happens. Appreciate the effort! I noticed a marked uptick in the lemmit bot mirroring Reddit, so I wonder if it was a coincidence or a sibling effort.
Thank you
Might be too much work, but you can allow a subset of traffic to bypass a CF WAF rule if the federated traffic is identifiable vs. the scrapers.
Edit: I'm reading up. What I said above may not apply to the one click thing: https://blog.cloudflare.com/declaring-your-aindependence-block-ai-bots-scrapers-and-crawlers-with-a-single-click/
I do support turning it on after what I read at that link.
Edit 2: From here: https://developers.cloudflare.com/bots/get-started/free/#limitations
It's like they tried to make that confusing to read.
Possibly, as it's one generic endpoint, but it also blocked a few other things people in the fediverse created, which are mighty helpful in diagnosis of these and other issues.
So using some AI model or whatever CF uses is probably not going to be the best thing for us, as it classified a POST request as a crawler?? 🤷
I'd have to whitelist every regular endpoint as well and then it gets messy as CF only gives you so much control as a free user.
So, for the moment, I've blocked the most annoying ones based on their User-Agent.
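For anyone curious what a User-Agent block boils down to, here is a minimal Python sketch of the matching logic. It is not the rule actually deployed on Reddthat: the token list and the function name `is_blocked_user_agent` are assumptions made up for illustration, and a real Cloudflare rule would be written in its own rule language rather than Python.

```python
import re
from typing import Optional

# Hypothetical deny list: illustrative crawler tokens only,
# not the list that was actually blocked on Reddthat.
BLOCKED_UA_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")

# One case-insensitive pattern that matches any token as a substring.
_BLOCKED_UA_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_UA_TOKENS),
    re.IGNORECASE,
)


def is_blocked_user_agent(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent contains a token from the deny list."""
    if not user_agent:
        # A missing header is left alone in this sketch.
        return False
    return _BLOCKED_UA_RE.search(user_agent) is not None
```

The obvious weakness is that a scraper can send any User-Agent it likes, so a list like this only stops the crawlers that identify themselves honestly.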
That's why I started with "this might be too much work". It seems like there should be a way to do it without the automated bot blocking, just using allow and deny (or challenge, I guess it is here) rules. The list would be a bitch to create by hand, but shouldn't it already exist somewhere in the federation configs? If so, you could broadly allow those while blocking or challenging everything else. I guess it comes down to how you identify bot traffic on the free plan without the tool on.
Full disclosure: I have CF Enterprise experience, but I'm just guessing on the Lemmy/federation part and haven't messed with CF free.
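To make the allow/deny/challenge idea concrete, here is a rough Python sketch of a first-match rule order. Everything in it is an assumption for illustration: the `Request` type, `looks_like_federation`, and `decide` are hypothetical names, and the federation check relies on general ActivityPub conventions (a POST to an inbox path with an ActivityStreams content type and an HTTP `Signature` header) rather than anything confirmed about Lemmy or about what Cloudflare's free plan can express.

```python
from dataclasses import dataclass, field

# Content types ActivityPub servers use for server-to-server delivery.
ACTIVITY_TYPES = ("application/activity+json", "application/ld+json")

# Hypothetical deny tokens, lowercase, for illustration only.
BLOCKED_UA_TOKENS = ("gptbot", "ccbot")


@dataclass
class Request:
    """Minimal stand-in for an HTTP request; header names must be lowercase."""
    method: str
    path: str
    headers: dict = field(default_factory=dict)


def looks_like_federation(req: Request) -> bool:
    """Heuristic only: checks the shape of the request, not the signature itself."""
    content_type = req.headers.get("content-type", "").lower()
    return (
        req.method == "POST"
        and req.path.endswith("/inbox")
        and content_type.startswith(ACTIVITY_TYPES)
        and "signature" in req.headers
    )


def decide(req: Request) -> str:
    """Apply rules in order; the first match wins."""
    if looks_like_federation(req):
        return "allow"
    user_agent = req.headers.get("user-agent", "").lower()
    if any(token in user_agent for token in BLOCKED_UA_TOKENS):
        return "block"
    return "challenge"
```

The ordering is the point: the allow rule has to be evaluated before any block or challenge rule, otherwise federation deliveries get caught the same way they did with the one-click setting. Since this only checks that a signature header is present, a scraper could imitate it, so it is a traffic-shaping heuristic rather than authentication.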