this post was submitted on 16 Apr 2024

Selfhosted


A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.


Fine folks of c/selfhosted, I've got a Docker LXC (Debian) running in Proxmox that loses its local network connection 24 hours after boot. It's remedied with an LXC restart. I can still access the console through Proxmox when this happens, but all running services (`docker ps` still says they're running) are inaccessible on the network. Any recommendations for an inexperienced selfhoster like myself to keep this thing up for more than 24 hours?

Tried:

  • Pruning everything from Docker in case it was a remnant of an old container or something.
  • Confirming the network config on the router wasn't breaking anything.
  • Checking that no cron tasks were doing funky things.
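For anyone retracing those steps, the checks roughly amount to the following (commands are illustrative; note that `docker system prune -a` deletes all unused images and stopped containers):

```
$ docker system prune -a          # clear stopped containers, unused images/networks
$ crontab -l                      # per-user cron jobs
$ ls /etc/cron.d /etc/cron.daily  # system-wide cron jobs
```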

I did have a Watchtower container running on it until recently, but have since removed it. The 24-hour pattern got me thinking: Watchtower was about the only thing that would cause an event at the 24-hour post-start mark, and the problem began around the same time I removed it (I'd intended to switch to manual updates because of Immich).

...and of course, any fix needs 24 hours to confirm it actually worked.

A forum post I found asked for the output of `ip a` and `ip r`, ~~see below.~~ Notable difference: after disconnecting, `ip r` is missing the route to the gateway.
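For illustration (addresses taken from the logs below), a healthy `ip r` on this setup would look something like the following, and the `default via` line is what disappears after the drop:

```
default via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.104
```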

Update: started going through journalctl and found the below abnormal entries when it loses connection, now investigating to see if I can find out why...

Apr 16 14:09:16 docker 922abd47b5c5[376]: [msg] Nameserver 1.1.1.1:53 has failed: request timed out.
Apr 16 14:09:16 docker 922abd47b5c5[376]: [msg] Nameserver 192.168.1.5:53 has failed: request timed out.
Apr 16 14:09:16 docker 922abd47b5c5[376]: [msg] All nameservers have failed
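If you're hunting for similar entries on your own host, `journalctl` can be narrowed to the relevant time window or unit, e.g.:

```
$ journalctl --since "2024-04-16 14:00" --until "2024-04-16 14:15"
$ journalctl -u networking.service -b   # just the networking unit, current boot
```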

Update 2: Using systemctl status networking.service, I found that networking.service was in a failed state (Active: failed (Result: exit-code)). I compared with a separate, stable Docker LXC, where networking.service was active, so I did some searching to remedy that.

x networking.service - Raise network interfaces
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)
     Active: failed (Result: exit-code) since Tue 2024-04-16 17:17:41 CST; 8min ago
       Docs: man:interfaces(5)
    Process: 20892 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
    Process: 21124 ExecStopPost=/usr/bin/touch /run/network/restart-hotplug (code=exited, status=0/SUCCESS)
   Main PID: 20892 (code=exited, status=1/FAILURE)
        CPU: 297ms

Apr 16 17:17:34 docker dhclient[20901]: DHCPACK of 192.168.1.104 from 192.168.1.1
Apr 16 17:17:34 docker ifup[20901]: DHCPACK of 192.168.1.104 from 192.168.1.1
Apr 16 17:17:34 docker ifup[20910]: RTNETLINK answers: File exists
Apr 16 17:17:34 docker dhclient[20901]: bound to 192.168.1.104 -- renewal in 37359 seconds.
Apr 16 17:17:34 docker ifup[20901]: bound to 192.168.1.104 -- renewal in 37359 seconds.
Apr 16 17:17:41 docker ifup[20966]: Could not get a link-local address
Apr 16 17:17:41 docker ifup[20892]: ifup: failed to bring up eth0
Apr 16 17:17:41 docker systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
Apr 16 17:17:41 docker systemd[1]: networking.service: Failed with result 'exit-code'.
Apr 16 17:17:41 docker systemd[1]: Failed to start networking.service - Raise network interfaces.

A reinstall of net-tools and ifupdown seems to have brought networking.service back up: `apt-get install --reinstall net-tools ifupdown`

Looking at the systemctl status output, I suspect everything was fine until dhclient/ifup requested a lease renewal roughly 24 hours after the initial connection at boot, found networking.service down, and couldn't renew, killing the network connection.

We'll see if it's actually fixed in 24 hours or so, but hopefully this little endeavour can help someone else plagued with this issue in the future. I'm still not sure exactly what caused it. I'll confirm tomorrow...

Update 3 - Looks like that was the culprit. The container is still connected 24+ hrs since reboot, networking.service is still active, and dhclient was able to renew.

Update 4 - All was well and good until I started playing with setting up Traefik. Not sure if that brought it to the surface or it just happened coincidentally, but networking.service failed again. Restarting the service also failed. Took a look in /etc/network/interfaces and found an entry for iface eth0 inet6 dhcp, and I don't use IPv6. Removed that line and networking.service restarted successfully. Perhaps that was the issue the whole time.
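For reference, the relevant part of /etc/network/interfaces ended up looking roughly like this (interface name per the logs above; the commented-out line is the one removed):

```
auto eth0
iface eth0 inet dhcp
# iface eth0 inet6 dhcp  <- removed: no IPv6 here, and ifup was failing
#                           with "Could not get a link-local address"
```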

[–] lemmyreader@lemmy.ml 5 points 7 months ago* (last edited 7 months ago) (1 children)

Are you running Docker inside Proxmox (in an LXC) or alongside Proxmox on the host? I'm not familiar with Proxmox these days, but I do know that standalone Docker + LXC can make the network of the LXC containers time out due to Docker's iptables setup.

[–] OminousOrange@lemmy.ca 4 points 7 months ago

Docker is installed in a Debian container with Proxmox as the hypervisor. As far as Docker knows, I believe it's just running on normal Debian. The Debian LXC has its own local IP.

I'll take a look at those resources though, thanks.

[–] emptiestplace@lemmy.ml 2 points 7 months ago (4 children)

Why are you doing this? Unless you have a good reason, you should probably either run Docker in a vm, or use something other than Proxmox where you can just install Docker on the host system.

[–] skittlebrau@lemmy.world 5 points 7 months ago* (last edited 7 months ago) (2 children)

Not OP, but I run Docker in LXC because my Proxmox host is an Intel NUC and I only have one graphics card (integrated).

I don’t want to passthrough the iGPU to a VM because then I lose video output for the host. I also don’t want to use SR-IOV for iGPU because it’s buggy and results in garbled output for HDR content. That’s why, in my case, Docker in LXC makes sense.

Obviously if I had a choice, I would prefer to do Docker in a VM with a dedicated GPU passed through.

I’ve done Docker in LXC for about a year and it’s been fine. Not perfect and not as secure as a VM, but it suits my homelab.

[–] OminousOrange@lemmy.ca 2 points 7 months ago

This is mostly my reasoning too. I've got a bit more juice than a NUC, but I prefer the way resources are managed with an LXC for the certain apps that I run. I still have VMs for other things, like HAOS and a BlueIris NVR. It's only a local homelab with no external users so avoiding additional complexity is often in my best interest.

[–] emptiestplace@lemmy.ml 2 points 7 months ago (1 children)

Why not just use LXC though?

[–] skittlebrau@lemmy.world 1 points 7 months ago

Ideally I would, but I don’t have the time to manage 30 different containers.

When I didn’t have kids, I ran everything in separate LXCs. I decided to just move everything to Docker and move on with my life.

[–] thirdBreakfast@lemmy.world 3 points 7 months ago (1 children)

My 'good reason' is just that it's super convenient - for backups and painlessly moving apps around between nodes with all their data.

I would run plain LXCs if people nicely packaged up their web apps as LXC templates and made them available on LXCHub for me to run with lxc compose up, but they generally don't.

I guess another alternate future would be if Proxmox added docker container supervision to their web interface, but you're still not going to have the self-contained neat snapshot system that includes the data.

In theory you should be able to convert an OCI container layer by layer into an LXC, so I bet there's projects out there that attempt this.

[–] markstos@lemmy.world 2 points 7 months ago

It's convenient when it works, but with three different containerization technologies, it's harder to debug when Proxmox+LXC+Docker fails. Even running Docker in parallel to LXC rather than nested would be simpler.

[–] pyrosis@lemmy.world 2 points 7 months ago

Usually this comes down to resource and energy efficiency. While a VM works perfectly fine, you'll find you can share video and storage resources in more efficient ways with LXC.

For example, you can directly pass a ZFS dataset into an LXC with a simple lxc.mount.entry:
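Something like this in the container's Proxmox config does it (container ID, dataset mountpoint, and target path are placeholders):

```
# /etc/pve/lxc/101.conf
lxc.mount.entry: /tank/media mnt/media none bind,create=dir 0 0
```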

This would allow you to configure options like cluster size, atime, compression algorithm, xattr, etc. on the dataset without much overhead.

It's also nice to know you can share your GPU with multiple LXCs without it being locked to a single VM.

[–] OminousOrange@lemmy.ca 2 points 7 months ago (1 children)

Why would one prefer a VM over an LXC for Docker?

[–] emptiestplace@lemmy.ml 2 points 7 months ago (1 children)
  • Kernel isolation
  • HA
  • Stability
  • It works
[–] markstos@lemmy.world 2 points 7 months ago (2 children)

LXC/LXD can be highly available (HA), stable, work and provide kernel isolation as well (real VMs): https://ubuntu.com/blog/lxd-virtual-machines-an-overview

[–] emptiestplace@lemmy.ml 1 points 7 months ago

In Proxmox?

[–] skittlebrau@lemmy.world 1 points 7 months ago

There are some quirks with docker in LXC. Nothing that can’t be overcome, but docker in a VM is definitely more stable.

[–] Decronym@lemmy.decronym.xyz 1 points 7 months ago* (last edited 7 months ago)

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I've seen in this thread:

HA: Home Assistant automation software / High Availability
LXC: Linux Containers
NUC: Next Unit of Computing, Intel's small-computer brand
NVR: Network Video Recorder (generally for CCTV)

4 acronyms in this thread; the most compressed thread commented on today has 7 acronyms.

[Thread #685 for this sub, first seen 17th Apr 2024, 07:05]

[–] thirdBreakfast@lemmy.world 1 points 7 months ago (1 children)

No answer, but just to say I run most of my services with this setup - Docker in a Debian LXC under Proxmox, and don't have this issue. The containers are 'privileged', and I have 'nesting' ticked on, but apart from that all defaults.
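For anyone comparing settings, that corresponds to roughly this in the container's config file (path and ID illustrative; "privileged" just means the container was created without the unprivileged option):

```
# /etc/pve/lxc/100.conf
features: nesting=1
```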

[–] OminousOrange@lemmy.ca 3 points 7 months ago* (last edited 7 months ago)

I might have found the issue, see updates above. I have a separate Docker LXC that was behaving normally too, so was good to cross-check with that.