endlesstalk.org

118 readers
2 users here now

This Community is intended for posts about the endlesstalk.org server.

founded 1 year ago
1
 
 

So I had to migrate to a new host again. The response times are a bit slow and spike sometimes. I'm looking into it.

EDIT: Should be a bit better now, but there are still some spikes in response times, so I will keep looking into it.

2
 
 

This should, however, be the last time for a long while, since I have greatly improved the setup.

The pictrs database got corrupted during the process, which is why the images from the last 6 days are lost.

Let me know if there are any issues after moving to the new host.

3
 
 

The database was corrupted, so I had to recover from a backup. About 6 hours of data was lost.

~~There does seem to be a weird network error sometimes. I'm looking into it.~~ FIXED

Lastly I apologize for all the downtime lately.

4
6
submitted 1 week ago* (last edited 1 week ago) by lemmy@endlesstalk.org to c/endlesstalkorg@endlesstalk.org
 
 

Endlesstalk has been migrated to a new host.

There seem to have been some caching issues, causing certificate errors, but it seems to be fixed (at least for me).

Let me know if there are any issues after the migration.

EDIT: The 525 SSL certificate error shows up intermittently. I'm looking into it.

EDIT2: Fixed. It was an issue with the nginx load balancer.
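For anyone who wants to verify the fix on their end, a simple poll of the site and a tally of the status codes will show whether the intermittent 525 still appears. This is only a rough sketch; the URL, interval and sample count are arbitrary.

```python
# Rough sketch: poll the site and tally HTTP status codes to see whether the
# intermittent 525 (SSL handshake failure at the proxy) still shows up.
# URL, interval and sample count are arbitrary choices for illustration.
import time
import urllib.error
import urllib.request

URL = "https://endlesstalk.org/"
counts = {}

for _ in range(60):                      # ~5 minutes at 5-second intervals
    try:
        with urllib.request.urlopen(URL, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as e:
        status = e.code                  # e.g. 525 returned by the proxy/CDN
    except urllib.error.URLError:
        status = None                    # connection-level failure
    counts[status] = counts.get(status, 0) + 1
    time.sleep(5)

print(counts)                            # e.g. {200: 58, 525: 2}
```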

5
 
 

Earlier today one of the servers where endlesstalk is hosted went down. After some time the server came back up again, but there was some unknown issue and the server was unstable, so preparations for migrating endlesstalk to a new host began. However, after setting the new servers up, I managed to get one of the "old" servers up and running again.

Tomorrow at ~~18:00~~ 20:00 UTC the migration to the new host will begin. See local time here. There will be some downtime with this, probably around an hour or less.

EDIT: The server went down again, but it should be back up now.

EDIT2: Moved to 20:00 UTC, since I forgot I have something from 17:00 to 19:00 UTC.

6
 
 

The upgrade went smoothly and everything seems to work.

Let me know if there is anything that doesn't work after the upgrade.

7
 
 

I have found the issue with the database migration, so the upgrade to the latest version of lemmy can proceed.

0.19.5 brings a lot of smaller bugfixes. See release notes here for more information. I will also upgrade the database to a newer version (Postgres 16).

For this upgrade there will be downtime, and I expect it to last around an hour or less. If there are any major issues with the upgrade, you can check the uptime site here or the site status here.

Local time can be seen here

8
 
 

The database migration to 0.19.4 failed because the database schema doesn't align with the state the migrations expect. The reason is probably that it didn't restore correctly from a previous backup, but I don't actually know the cause.

I thought I could create a new database with a correct schema and then import the data from the current database into the new one. This might still be possible, but it simply takes too long, and it has gotten too late for me (03:00 at night).
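For the curious, the rough idea looks something like the sketch below. It is only an outline: the connection strings are placeholders, the new database's schema would come from running Lemmy's own migrations first, and foreign-key ordering can still make a data-only load tricky.

```python
# Sketch of "new database with the correct schema, then copy the data over".
# Placeholders throughout; not the exact commands used on the server.
import subprocess

OLD_DB = "postgresql://lemmy:password@localhost:5432/lemmy"      # placeholder
NEW_DB = "postgresql://lemmy:password@localhost:5432/lemmy_new"  # placeholder

# 1. Dump only the rows (no schema) from the existing, mismatched database.
subprocess.run(
    ["pg_dump", "--data-only", "--dbname", OLD_DB, "--file", "lemmy_data.sql"],
    check=True,
)

# 2. Load the rows into the freshly migrated database. --single-transaction
#    rolls everything back if any statement fails along the way.
subprocess.run(
    ["psql", "--single-transaction", "--dbname", NEW_DB,
     "--file", "lemmy_data.sql"],
    check=True,
)
```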

I will look into a fix for the migration, and when I have one I will announce a new date for the upgrade to 0.19.4.

9
 
 

0.19.4 brings a lot of changes. See release notes here for more information.

There should be no or very minimal downtime. If there are any issues, check the uptime site here or the site status here.

Local time can be seen here

Note: An update to Postgres 16 and pictrs 0.5 is also coming soon, which will bring some downtime. I don't know when yet, but I will post an update when I know.

EDIT: There was an issue with migrating the database while upgrading to 0.19.4, so it will take longer.

EDIT2: The database is in a different state than the migration to 0.19.4 expects. The cause is not clear, but I'm looking into it.

10
4
submitted 3 months ago* (last edited 3 months ago) by lemmy@endlesstalk.org to c/endlesstalkorg@endlesstalk.org
 
 

Hello

I have noticed that the server has been going down a lot, for 10-20 minutes at a time.

Unfortunately, I'm currently on vacation, so I don't think I will have the time to fix it.

I will be back tomorrow evening and will look into it and hopefully fix it then.

EDIT: There was a misconfiguration in the autoscaling setup. It scaled the system up and used all of its CPU, which made the site unresponsive.

This should be fixed now, but I will keep monitoring it.
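As part of the monitoring, one simple thing to watch for is an autoscaler that is stuck at its replica ceiling. The sketch below assumes the scaling is done with Kubernetes HPAs and that kubectl has access to the cluster; the real setup may differ.

```python
# Sketch: flag any HorizontalPodAutoscaler sitting at its maximum replicas.
# Assumes kubectl is configured for the cluster; HPA use is an assumption.
import json
import subprocess

out = subprocess.run(
    ["kubectl", "get", "hpa", "--all-namespaces", "-o", "json"],
    check=True, capture_output=True, text=True,
).stdout

for hpa in json.loads(out)["items"]:
    name = f'{hpa["metadata"]["namespace"]}/{hpa["metadata"]["name"]}'
    current = hpa["status"].get("currentReplicas", 0)
    maximum = hpa["spec"]["maxReplicas"]
    if current >= maximum:
        print(f"{name} is pinned at its max of {maximum} replicas")
```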

11
 
 

While working on a small fix to Lemmy, which was causing some unneeded CPU usage, I made a change that unfortunately caused the storage for the db and pictrs services to be deleted.

Thankfully I have backups of everything, so I went straight to restoring from a backup. However, the restore was very slow, since I had used a suboptimal way to back up the db (a raw SQL dump). After the first restore completed, I found out that it was missing data, so I tried an older backup, but that didn't work either; it was missing data as well. So I tried a backup from another server (since I back up to two different servers), which finally worked.

Restoring from backups hasn't taken too long previously, since my backups are fairly small, but I will need to look into a quicker way to restore backups for Lemmy, since its backup is much bigger.
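The quicker way I have in mind is roughly the following: dump in pg_dump's custom format instead of plain SQL, so pg_restore can run several jobs in parallel. Just a sketch; the connection string, file name and job count are placeholders.

```python
# Sketch: custom-format dump + parallel restore, instead of a raw SQL dump.
# Connection string, file name and job count are placeholders.
import subprocess

DB = "postgresql://lemmy:password@localhost:5432/lemmy"  # placeholder
DUMP_FILE = "lemmy.dump"

# Compressed, custom-format dump (still a logical backup, just not plain SQL).
subprocess.run(
    ["pg_dump", "--format=custom", "--dbname", DB, "--file", DUMP_FILE],
    check=True,
)

# Restore with several parallel jobs; --clean/--if-exists drop existing
# objects first so the restore can run against a non-empty database.
subprocess.run(
    ["pg_restore", "--jobs=4", "--clean", "--if-exists",
     "--dbname", DB, DUMP_FILE],
    check=True,
)
```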

NOTE: Data from ca. 2 hours before the site went down (16:00-18:00 UTC) will be missing, and I'm unable to restore it.

12
 
 

The S3 host that pictrs uses has gone down.

I might have to move to another S3 host, which means it will take a bit before images are working again. This will cause a loss of the images from the last 2-3 hours before pictrs stopped working.
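For reference, when the old host is still reachable, moving the media over is conceptually just copying every object from one bucket to the other. The sketch below uses boto3 with placeholder endpoints, buckets and credentials; it is not the actual migration script.

```python
# Sketch: copy all pictrs objects from the old S3 host to the new one.
# Endpoints, bucket names and credentials are placeholders; requires boto3.
import boto3

old = boto3.client("s3", endpoint_url="https://old-s3.example.com",
                   aws_access_key_id="OLD_KEY", aws_secret_access_key="OLD_SECRET")
new = boto3.client("s3", endpoint_url="https://new-s3.example.com",
                   aws_access_key_id="NEW_KEY", aws_secret_access_key="NEW_SECRET")

OLD_BUCKET = NEW_BUCKET = "pictrs"  # placeholder bucket names

paginator = old.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=OLD_BUCKET):
    for obj in page.get("Contents", []):
        body = old.get_object(Bucket=OLD_BUCKET, Key=obj["Key"])["Body"]
        new.upload_fileobj(body, NEW_BUCKET, obj["Key"])
```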

EDIT: I have moved to a new S3 host. I'm unsure how many images were lost during the outage.

13
 
 

Local time can be seen here

0.19.3 is mostly bugfixes. See release notes here for more information.

There should be no or very minimal downtime. If there are any issues, check the uptime site here or the site status here.

14
 
 

Local time can be seen here

0.19.2 contains fixes for outgoing federation and a few other things. See release notes here for more information.

There should be no or very minimal downtime. If there are any issues, check the uptime site here or the site status here.

EDIT: The server went down / was very very slow to respond. I'm not quite sure why.

15
 
 

Version 0.19 is out, and I expect to update sometime during the next week. Since it is a big release, I will need to spend some more time testing that everything still works fine and ensuring that the migration works without problems as well.

I will follow up with another post when I know when the update will take place.

16
 
 

The update fixes an issue with moderation actions sometimes not federating correctly (see release notes here).

There should be no or very minimal downtime. To convert UTC to your local time, use this

17
 
 

I expect a very minimal downtime of ca. 5-15 mins.

18
 
 

I had made some config changes to the database earlier, in connection with the move to a new server, that caused the storage usage of the database to grow a lot. When it had no more space left, it crashed, which caused the downtime. Unfortunately, it happened at a time when I wasn't available to fix it immediately, which is why it was down for so long.

It is now fixed, and I will keep watch (by setting up an alert for database disk usage) to make sure it doesn't happen again.
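Conceptually the alert boils down to a check like the one below; the actual alerting setup may look different, and the mount path, threshold and notification hook are placeholders.

```python
# Sketch: warn when the database volume crosses a usage threshold.
# Mount path, threshold and the notify() hook are placeholders.
import shutil

DB_VOLUME = "/var/lib/postgresql/data"  # placeholder mount path
THRESHOLD = 0.85                        # alert at 85% used

def notify(message: str) -> None:
    # Placeholder: in practice this would be a webhook, email, etc.
    print(f"ALERT: {message}")

usage = shutil.disk_usage(DB_VOLUME)
used_fraction = usage.used / usage.total
if used_fraction >= THRESHOLD:
    notify(f"database volume at {used_fraction:.0%} of {usage.total // 2**30} GiB")
```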

19
1
submitted 9 months ago* (last edited 9 months ago) by lemmy@endlesstalk.org to c/endlesstalkorg@endlesstalk.org
 
 

It seems to have been caused by a low amount of available storage. This caused k8s to evict/delete pods -> the site went down, then it would fix itself -> the site came up again, but then k8s would evict again -> the site went down. This continued until it stabilized at some point.

This should be fixed, and there should be more space available when I move the server to a new host. I expect to move to a new server sometime in the coming week. I will announce the date when I know when it will happen.

EDIT: Spoke a little bit too soon, should be fixed now though.

EDIT2:

There was something that kept using storage, so I ran into the issue again. Then the volume/storage for the image service (pictrs) stopped working for some unknown reason (thankfully, I have backups), and there shouldn't be any images lost.

The good news is that I have reclaimed a lot of storage, so I shouldn't be in danger of running out of space for a long time.
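For reference, the eviction trigger described above can be spotted by checking which nodes report DiskPressure. A sketch only; it assumes kubectl access to the cluster and is not the monitoring actually in place.

```python
# Sketch: list nodes that report DiskPressure (the condition behind the
# evictions described above). Assumes kubectl is configured for the cluster.
import json
import subprocess

out = subprocess.run(
    ["kubectl", "get", "nodes", "-o", "json"],
    check=True, capture_output=True, text=True,
).stdout

for node in json.loads(out)["items"]:
    for cond in node["status"]["conditions"]:
        if cond["type"] == "DiskPressure" and cond["status"] == "True":
            name = node["metadata"]["name"]
            print(f"{name}: DiskPressure ({cond.get('message', '')})")
```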

20
 
 

While preparing for the migration to a new host, I had to set up the db, and during that I deleted a resource in k8s to force a reload of the db's settings. This caused the db to use a different volume, and it took a bit before I could revert it back to using the old volume.

No data should have been lost. Let me know if anything is missing.

21
 
 

Same thing as yesterday.

22
 
 

Unfortunately, the tool for scanning for CSAM didn't detect the image, so to ensure there is no CSAM on the server, images from the last 8 hours have been deleted.

23
 
 

The filesystem manager (Longhorn) I use reported that multiple volumes were faulted. This caused the site to go down.

I have no idea why the volumes faulted, only that a reboot of the server fixed it. Hopefully this was a strange one-off and it won't occur again.

24
 
 

To make it easier to defederate from unwanted instances, I have switched to using the Fediseer. With this tool I can get censures from other trustworthy instances. A censure is a "completely subjective negative judgement" (see more here), and reasons for the censure can be listed.

Currently I'm using the censures from lemmy.dbzer0.com (they can be seen here) that list any of the following reasons:

  • Lolicon
  • CSAM
  • Fascism
  • Hate speech
  • Bigotry
  • Pedophilia
  • Bestiality
  • MAP

I will still manually defederate from instances when needed, but this makes it easier to defederate from bad instances that I would have missed or didn't know about.

Note: The automated defederation also includes spam instances, which are currently defined by the following (see the sketch after the list):

  • More than 30 registered users per local post + comments
  • More than 500 registered users per active monthly user.
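As a sketch, the spam check boils down to the two ratios above. The function and its inputs are just illustrative; the real numbers would come from an instance's public stats.

```python
# Sketch of the spam heuristic above; function and field names are illustrative.
def looks_like_spam(registered_users: int, local_posts: int,
                    local_comments: int, active_month: int) -> bool:
    activity = local_posts + local_comments
    users_per_activity = registered_users / activity if activity else float("inf")
    users_per_active = registered_users / active_month if active_month else float("inf")
    return users_per_activity > 30 or users_per_active > 500

# Example: 10,000 registered users, 50 posts, 100 comments, 15 active this month
print(looks_like_spam(10_000, 50, 100, 15))  # True - trips both thresholds
```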
25
 
 

The main reason is that they allow/support pedophilia, but they also allow zoophilia and biastophilia. They try to label it as MAP (minor-attracted person), but it is still pedophilia. An example of a MAP post is here.

view more: next ›