this post was submitted on 11 Oct 2023
25 points (100.0% liked)

Announcements


Do not ever push to production without testing things first. I went and moved us to the beta branch because this commit caught my eye and I wanted us to have the emoji picker fixed: https://github.com/LemmyNet/lemmy-ui/commit/ae4c37ed4450b194719859413983d6ec651e9609

The beta branch was on Docker Hub, so I figured it had been at least minimally tested. I was sorely mistaken. Lemmy (the backend) itself would load, but lemmy-ui (the actual website that renders everything) kept crashing on load. I couldn't roll back, either: the database had already been migrated to the beta schema, and the migrations couldn't be reversed when launching the old version.

I had no choice but to restore from backup. We've lost a whole day's worth of posts (anything after 3 AM CST). I'm really, really sorry... blobcatsadpleading

I was just so excited to unveil this that I didn't take the time to actually test it.

top 17 comments
[–] diff@burggit.moe 12 points 11 months ago* (last edited 11 months ago)

It's time to speedrun "repost everything and re-setup new community any% glitchless"! Here's hoping for a PB! ablobcathyper

[–] Burger@burggit.moe 9 points 11 months ago (1 children)

Also, to add: Thank fuck for Borg. It's one of the few backup solutions that hasn't corrupted itself just from doing incremental backups.

[–] Mousepad@burggit.moe 6 points 11 months ago

Their reverse order list empowers them!

[–] CookieJarObserver@burggit.moe 7 points 11 months ago (1 children)

Shit happens. One day is better than eight.

[–] Burger@burggit.moe 4 points 11 months ago (1 children)

Right. I thought I was restoring the most recent backup, but it turned out the archive list was sorted in ascending order, so I picked the top one thinking it was the latest. That's why you saw eight days missing when Burggit was back up for a split second.

My server automatically takes a backup once daily in the early AM and keeps (I think? I'd need to check) seven on hand. Anything older than that gets pruned.
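For what it's worth, you can ask Borg for the newest archive directly instead of eyeballing the list. A quick sketch, assuming a standard Borg setup (the repo path is just an example):

```
# Archives print oldest-first by default, which is the ordering that bit me
borg list /mnt/backup/borg-repo

# Show only the most recent archive instead of guessing from the top of the list
borg list --last 1 /mnt/backup/borg-repo
```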

[–] CookieJarObserver@burggit.moe 4 points 11 months ago (1 children)

You should pick one day a week that's kept longer; if there's a recurring issue, the corruption could span more than a few days of backups.

Man, I thought I did something wrong when the posts from the last week were gone.

[–] Burger@burggit.moe 4 points 11 months ago (1 children)

Added a once-per-month backup retention. Thanks for the suggestion.
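In Borg terms that's roughly the following (repo path is just an example again; newer Borg also wants a compact pass afterwards to actually reclaim the space):

```
# Keep the last 7 daily archives plus 1 monthly archive; everything else is pruned
borg prune --list --keep-daily=7 --keep-monthly=1 /mnt/backup/borg-repo
borg compact /mnt/backup/borg-repo   # free the pruned space (Borg >= 1.2)
```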

[–] CookieJarObserver@burggit.moe 3 points 11 months ago

Yeah, I had a problem once where a backup (made once a day, keeping 14) was overwritten by a corrupted data set and I basically lost most of my data. It'd be sad if that happened here when it's avoidable 👍

[–] Nazrin@burggit.moe 4 points 11 months ago (1 children)

Daily backups aren't too bad, though.

I would recommend a backup right before upgrades, too. Make a sticky note for it if you always forget.

[–] Burger@burggit.moe 1 points 11 months ago (1 children)

See, I deliberately didn't do one before the upgrade because it'd cause 10-15 minutes of downtime: I stop the server so postgres is in a consistent state, and then the backup starts. And of course we all know how that turned out, rofl. I clearly wasn't thinking.

[–] nickwitha_k@lemmy.sdf.org 2 points 11 months ago (1 children)

It's been a while since I've admined DBs, but couldn't you use the WAL and PITR to do an online backup?

https://www.postgresql.org/docs/8.1/backup-online.html
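On a modern Postgres the rough shape of it is a periodic base backup plus continuous WAL archiving; a minimal sketch, where the archive directory is just an example:

```
# postgresql.conf -- enable continuous WAL archiving
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/backup/wal/%f && cp %p /mnt/backup/wal/%f'
```

Then a periodic `pg_basebackup -D /mnt/backup/base -Ft -z -P` gives you the base to replay the archived WAL onto for point-in-time recovery.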

[–] Burger@burggit.moe 3 points 11 months ago (1 children)

There's a whole separate server that's in charge of storing images (pict-rs), and it uses its own database that isn't any flavor of SQL (sled). I just think it's easier to use this solution: everything's intact, and the backup task runs in the wee hours of the morning. Besides, I couldn't get cron to call docker to execute a pg_dump if my life depended on it. Some shell environment fuckery is my guess, and I don't want to mess with troubleshooting it, because testing why something doesn't work under cron is a hassle. This works, and I'd rather not change it. It backs up everything, including all the images.

Back when I was running this on my home box, Proxmox had a nifty backup tool that freezes the filesystem in place, takes a snapshot, and then backs up said snapshot as a compressed tarball. It's deduplicated, too, if you run Proxmox Backup Server. This is a VPS, though, not a dedi. There's LVM for taking snapshots, of course, but I don't want to rip everything up and start over, since this is running raw ext4 with no volume management whatsoever.

[–] nickwitha_k@lemmy.sdf.org 2 points 11 months ago (1 children)

Makes sense. I just don't like folks having to do tedious work if it isn't needed (it's also my job). For the cron issue, I'd suspect PATH or permissions (most often these, in my experience). I find the easiest way to diagnose is to wrap the intended command in a bash script that writes stdout and stderr to files, acting as basic logs.
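Something like this is usually enough to see what cron is actually choking on; the paths, container name, and DB user below are just illustrative guesses:

```
#!/usr/bin/env bash
# Hypothetical wrapper to diagnose cron + docker + pg_dump; adjust names/paths.
set -euo pipefail
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

exec >>/var/log/lemmy-pgdump.log 2>&1   # capture stdout+stderr as a basic log
echo "=== $(date -Is) starting dump ==="

# -T disables TTY allocation, which cron doesn't have
docker compose -f /opt/lemmy/docker-compose.yml exec -T postgres \
  pg_dump -U lemmy lemmy > "/opt/backups/lemmy-$(date +%F).sql"

echo "=== $(date -Is) dump finished ==="
```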

Glad you're back up and running!

[–] Burger@burggit.moe 2 points 11 months ago (1 children)

I guess I didn't clarify: I have a cron job that automatically shuts the server down via docker compose and runs the Borg backup, which seems to work perfectly. There's no manual intervention at all; I'm not turning it off and doing the backup by hand.
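The whole nightly flow is roughly this (paths and repo location are examples, not the real config):

```
#!/usr/bin/env bash
# Nightly cold backup: stop the stack so postgres is consistent on disk,
# snapshot everything with Borg, then bring the site back up.
set -euo pipefail
export BORG_REPO=/mnt/backup/borg-repo

cd /opt/lemmy
docker compose down
borg create --stats ::'burggit-{now}' /opt/lemmy/volumes
docker compose up -d
```

...run from a crontab entry in the early AM.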

I really appreciate your suggestions, though. I just don't want to touch something that already saved my bacon and risk some workflow screwing up without my being aware of it when I need to restore from backup. This will probably all be moot anyway, since moving to a dedi is a possibility in the future; then I'd be able to use a hypervisor and just back up full VM images.

[–] nickwitha_k@lemmy.sdf.org 3 points 11 months ago

Yeah, makes sense: best to have something you absolutely know works. Having the dedi will be really nice; controlling the hypervisor should let you avoid a lot of issues and make testing new updates easier (clone prod, update the clone, test on the clone, swap the LB backend to point at the clone and drain the old one, then hold the old prod VM for a bit so rollback is quick if needed).

[–] neo@lemmy.comfysnug.space 2 points 11 months ago

We have learned nothing, Burger.

[–] marisa1@burggit.moe 2 points 11 months ago

Well... I came back after this was solved. Very convenient! lol