this post was submitted on 04 Jul 2023
552 points (99.5% liked)
Lemmy.World Announcements
29148 readers
2 users here now
This Community is intended for posts about the Lemmy.world server by the admins.
Follow us for server news 🐘
Outages 🔥
https://status.lemmy.world/
For support with issues at Lemmy.world, go to the Lemmy.world Support community.
Support e-mail
Any support requests are best sent to info@lemmy.world e-mail.
Report contact
- DM https://lemmy.world/u/lwreport
- Email report@lemmy.world (PGP Supported)
Donations 💗
If you would like to make a donation to support the cost of running this platform, please do so at the following donation URLs.
If you can, please use / switch to Ko-Fi, it has the lowest fees for us
Join the team
founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
I’m calling it - if there’s actually a memory leak in the Rust code, it’s gonna be the in memory queues because the DB’s iops can’t cope with the number of users.
I think I found what eats the memory. DB iops isn't the cause - looks like the server doesn't reply before all the database operations are done. The problem is the unbounded queue in the activitypub_federation crate, spawned when creating the ActivityQueue struct. The point is, this queue holds all the "activities" - events to be sent to federated instances. If, for whatever reason, the events aren't delivered to all the federated servers, they are retried with an exponential backoff for up to 2.5 days. If even a single federated instance is unreachable, all events remain in memory. For a large instance, this will eat up the memory for every upvote/downvote, post or comment.
Lemmy needs to figure out a scalable eventual consistency algorithm. Most importantly, to store the messages in the DB, not in memory.
You should consider bringing this up at !lemmyperformance@lemmy.ml
You should take this entire comment and paste it in as a issue on the Lemmy Github Issues page.
I think the devs have been aware of the issue, theoretically, for a while. A proper solution requires some significant changes, so it was being postponed because this wasn't considered urgent.