this post was submitted on 13 Jun 2023
22 points (100.0% liked)
Asklemmy
A loosely moderated place to ask open-ended questions
If your post meets the following criteria, it's welcome here!
- Open-ended question
- Not offensive: at this point, we do not have the bandwidth to moderate overtly political discussions. Assume best intent and be excellent to each other.
- Not about using Lemmy or getting support for it: for context, see the list of support communities and tools for finding communities below
- Not ad nauseam inducing: please make sure it is a question that would be new to most members
- An actual topic of discussion
Looking for support?
Looking for a community?
- Lemmyverse: community search
- sub.rehab: maps old subreddits to fediverse options, marks official as such
- !lemmy411@lemmy.ca: a community for finding communities
Icon by @Double_A@discuss.tchncs.de
founded 5 years ago
There are a lot of things that factor into the answer, but overall I think it's going to be pretty random. Some instances are on domains without "Lemmy" in the name, some don't include "Lemmy" in the site name configuration, and in some cases, like my own instance, the admin sets the X-Robots-Tag response header so that search engines that properly honor the header won't crawl or index the instance's content. I've actually taken things a step further with mine and put all public paths except the API endpoints behind authentication (so that Lemmy clients and federation still work with it), so, for extra privacy, you can't browse my instance's content without going through a proper client. But that's off-topic.
Reddit was centralized, so it could be optimized for SEO. Lemmy instances are run individually, with different configuration at both the infrastructure level and the application level. If most people left things fairly vanilla, that should still give pretty good discovery of Lemmy content across those instances, but I would think most people technical enough to host their own instances have deviated from the defaults and (hopefully) implemented some hardening, which would likely mess with SEO.
So yeah, expect it to be pretty random, but not necessarily unworkable.
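For readers unfamiliar with the header mentioned above: X-Robots-Tag carries the same directives as a robots meta tag (for example `noindex`, `nofollow`, or `none`, which means both), but at the HTTP level, so it works for any response type. Below is a minimal standard-library sketch, assuming a toy handler and a hypothetical helper `allows_indexing`, of a server that attaches the header to every response and of how a compliant crawler might read it. It is an illustration only, not how any particular instance is configured.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Directives that tell a compliant crawler not to index a response.
# Simplified: real crawlers also accept per-bot forms like "googlebot: noindex".
BLOCKING_DIRECTIVES = {"noindex", "none"}


def allows_indexing(header_value: Optional[str]) -> bool:
    """Return False if an X-Robots-Tag header value forbids indexing."""
    if not header_value:
        return True  # no header means no restriction from this source
    directives = {part.strip().lower() for part in header_value.split(",")}
    return not (directives & BLOCKING_DIRECTIVES)


class NoIndexHandler(BaseHTTPRequestHandler):
    """Toy handler that attaches X-Robots-Tag to every response it sends."""

    def do_GET(self) -> None:
        body = b"hello from a hypothetical instance\n"
        self.send_response(200)
        self.send_header("X-Robots-Tag", "noindex, nofollow")
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try the handler locally, `HTTPServer(("127.0.0.1", 8080), NoIndexHandler).serve_forever()` (also from `http.server`) will serve it. In a self-hosted setup the reverse proxy would more typically add the header, e.g. with Nginx's `add_header` directive. Keep in mind the header is only a request: crawlers that ignore it will still index the content.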
Easily the best answer here. I think the people who expect it to work "just like Reddit" are still unfamiliar with federation and aren't used to thinking things through in those terms.
Not to mention that Google results in general have been pretty trash for a couple years now. I don't expect fediverse content to be prominent for some time unless there is a dedicated service that indexes everything.
I kind of feel like Kagi will be all over this with its forum "lens" for search, but it's paid. Maybe Boardreader would focus on this too?
Google search isn't as good as it used to be, and using startpage.com to break out of the filter bubble isn't as effective anymore either. So we probably all need to go back to the 1999 habit of using different search engines for different things and seeing what works best.
I mean, why couldn't there be a dedicated service that indexes everything? Whoever builds one and gets it working in a user-friendly way is going to have significant control over what content shows up in the results: if they don't want something, it isn't indexed. I don't have to stretch my imagination to think of parties that have good reason to want to be first to do that across ActivityPub as a whole. Mastodon is already a big frontrunner in that regard.
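To make the "whoever runs the index decides what's in it" point concrete, here is a toy inverted index in Python. It is purely illustrative and not modeled on any existing service: the `Page` and `TinyIndex` names and the example URLs are hypothetical, fetching pages (over ActivityPub or an instance's API) is left out entirely, and the only policy it applies is skipping pages whose X-Robots-Tag value opts out of indexing.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    x_robots_tag: str = ""  # X-Robots-Tag header value seen when fetching


def tokenize(text):
    """Lowercase alphanumeric words; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index (word -> set of URLs) that honors noindex."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, page):
        directives = {d.strip().lower() for d in page.x_robots_tag.split(",")}
        if directives & {"noindex", "none"}:
            return False  # the instance opted out, so nothing is stored
        for word in tokenize(page.text):
            self.postings[word].add(page.url)
        return True

    def search(self, query):
        """Return URLs containing every word of the query (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        results = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            results &= self.postings.get(word, set())
        return results
```

Note that honoring `noindex` here is a choice made in `add`; an indexer that wanted to ignore it could simply drop that check, which is exactly the kind of control the comment above is worried about.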
Your "off-topic" sounded pretty cool to me! I love that anyone hosting a Lemmy instance can do that. You get to choose whether it's searchable on the web! Obviously there are search engines that ignore the no-scraping/no-indexing header, but the rest of what you did should counteract that, noice.
Yeah, if you're running something yourself, you can do pretty much whatever you want to protect it, especially if it's behind a reverse proxy. Firewalls are great for protecting ports, but reverse proxies can be their own form of protection, and I don't think a lot of people associate them with "protection" so much. Why expose paths (unauthenticated) that don't need to be exposed? For instance, with my Lemmy instance, all any other instance needs is access to the /api path, which I leave open. All the other paths are behind basic authentication that I can get through, so I can still use the Lemmy web interface on my own instance if I want to. But if I don't want others browsing my instance to see what communities have been added, or I don't want to give someone an easy glance at the comments and posts my profile has made across all instances (for a little more privacy), I can simply hide that behind the curtain without losing any functionality.
It's easy to think of these things when you have relevant experience with web development, debugging web applications, full-stack development, and related subject matter, especially if you tend to approach things with a security-oriented mindset. I'm not trying to sound arrogant, but honestly my professional experience has a lot to do with how my personal habits around my hobbies have formed. So I tend to take things as far as I can with everything I know, and stuff like this is the result lol. It might be totally unnecessary without much actual value, but it errs on the side of "a little more secure", and why not, if it's fun?
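The gating decision described above (API paths open, everything else behind basic authentication) can be sketched in a few lines of Python. This is only a model of the logic, assuming a single open prefix `/api` and placeholder credentials; in Nginx the same effect is usually achieved with the `auth_basic` and `auth_basic_user_file` directives on the non-API locations. The allow-list is an assumption worth checking against Lemmy's documentation before copying it: federation commonly also relies on paths such as `/.well-known/webfinger` and ActivityPub inbox endpoints.

```python
import base64
import hmac
from typing import Optional

# Assumption: only the API prefix is left open, mirroring the comment above.
OPEN_PREFIX = "/api"
# Placeholder credentials for illustration; never hard-code real ones.
USERNAME, PASSWORD = b"admin", b"change-me"


def path_is_public(path: str) -> bool:
    """True for /api and anything under /api/, but not e.g. /apix."""
    return path == OPEN_PREFIX or path.startswith(OPEN_PREFIX + "/")


def credentials_ok(authorization: Optional[str]) -> bool:
    """Validate an HTTP Basic 'Authorization' header value."""
    if not authorization or not authorization.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(authorization[len("Basic "):], validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    user, sep, password = decoded.partition(b":")
    if not sep:
        return False
    # Constant-time comparisons; both are evaluated before combining.
    user_ok = hmac.compare_digest(user, USERNAME)
    password_ok = hmac.compare_digest(password, PASSWORD)
    return user_ok and password_ok


def gate(path: str, authorization: Optional[str] = None) -> int:
    """Status the proxy would return: 200 to pass through, 401 to challenge."""
    if path_is_public(path) or credentials_ok(authorization):
        return 200
    return 401
```

Checking the exact prefix (`/api` or `/api/...`) rather than a bare `startswith("/api")` avoids accidentally opening unrelated paths like `/apix`.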
I'd be interested in how you did this, this seems like one of the best ways I've seen for securing a lemmy instance.
I have a single Nginx container that handles reverse proxying for all my self-hosted services. I break every service out into its own configuration file and use include directives to share common configuration across them. For anyone out there with Nginx experience, my Lemmy configuration file should make it fairly clear how I handle what I described above:
It's definitely in need of some clean-up (for instance, there's no need for multiple location blocks that do the same thing for caching; a single expression can handle all of the ones with identical configuration and reduce the number of lines required), but I've been a bit too lazy to clean things up. Still, it should serve as a good example and communicate the general idea of what I'm doing.
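The configuration file itself isn't included here, but the clean-up idea (collapsing several caching location blocks with identical bodies into one regular expression) can be checked with a short Python sketch. The file extensions below are hypothetical stand-ins, not the commenter's actual list; the combined pattern corresponds to a single case-insensitive Nginx block of the form `location ~* \.(css|js|png|svg|woff2)$ { ... }`.

```python
import re

# Hypothetical per-extension rules, each of which might have been its own
# location block with the same caching settings.
SEPARATE_RULES = [r"\.css$", r"\.js$", r"\.png$", r"\.svg$", r"\.woff2$"]

# The same rules collapsed into one alternation; IGNORECASE mirrors Nginx's ~*.
COMBINED = re.compile(r"\.(?:css|js|png|svg|woff2)$", re.IGNORECASE)


def matches_any_separate(path):
    """True if any of the individual rules would match the path."""
    return any(re.search(rule, path, re.IGNORECASE) for rule in SEPARATE_RULES)


def matches_combined(path):
    """True if the single combined rule matches the path."""
    return COMBINED.search(path) is not None
```

If both functions agree on a representative set of request paths, the separate blocks can be replaced by the single one without changing which requests get the caching configuration.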