submitted 3 months ago* (last edited 3 months ago) by PlutoniumAcid@lemmy.world to c/selfhosted@lemmy.world

I run an old desktop mainboard as my homelab server. It runs Ubuntu smoothly at loads between 0.2 and 3 (whatever unit that is).

Problem:
Occasionally, the CPU load skyrockets above 400 (yes really), making the machine totally unresponsive. The only solution is the reset button.

Solution:

  • I haven't found what the cause might be, but I think that a reboot every few days would prevent it from ever happening. That could be done easily with a crontab line.
  • Alternatively, I would like a dead-simple script running in the background that looks at the CPU load and executes a reboot when the load climbs over a given threshold.

--> How could such a cpu-load-triggered reboot be implemented?


edit: I asked ChatGPT to help me create a script that is started by crontab every X minutes. The script has a kill threshold that does a kill -9 on the top process, and a higher reboot threshold that ... reboots the machine. Before doing either, or neither, of these, it writes a log line. I hope this will keep my system running, and I will review the log file to see how it fares. Or it might inexplicably break my system. Fun!
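For anyone curious what the two-threshold logic might look like, here is a minimal sketch of just the decision and the log line, not the actual script. The threshold values and the names `choose_action` and `log_line` are made up for illustration; the part that finds and kills the top process is left out.

```python
from datetime import datetime

# Hypothetical thresholds; tune them to the machine's core count.
KILL_THRESHOLD = 20.0
REBOOT_THRESHOLD = 50.0


def choose_action(load: float,
                  kill_at: float = KILL_THRESHOLD,
                  reboot_at: float = REBOOT_THRESHOLD) -> str:
    """Return "reboot", "kill", or "none" for a given load average."""
    # Check the higher threshold first so a huge load never maps to "kill".
    if load >= reboot_at:
        return "reboot"
    if load >= kill_at:
        return "kill"
    return "none"


def log_line(load: float, action: str) -> str:
    """Format the line that is written before any action is taken."""
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"{stamp} load={load:.2f} action={action}"
```

Keeping the decision separate from the kill and reboot commands means the thresholds can be tried out against logged load values without touching the running system.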

[-] BestBouclettes@jlai.lu 17 points 3 months ago* (last edited 3 months ago)

Just so you know, the load avg is not actually the CPU load. On Linux it's the average number of processes that are either running, waiting for a CPU, or stuck in uninterruptible sleep (usually waiting on disk I/O), so heavy disk activity can push it up even when the CPU is mostly idle. A good rule of thumb is to keep your load avg under the number of cores your CPU has. If your load avg is equal to your number of cores, your machine is using roughly 100% of its capacity to handle whatever you're throwing at it; if it's twice the number of cores, your machine is overloaded by about 100%.
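The rule of thumb can be checked with a few lines of Python. This is only a sketch: `load_per_core` and `current_load_per_core` are names picked for illustration, while `os.getloadavg()` and `os.cpu_count()` are standard library calls (the former is Unix-only).

```python
import os


def load_per_core(load_avg: float, cores: int) -> float:
    """Load average divided by core count; about 1.0 means fully used."""
    return load_avg / cores


def current_load_per_core() -> tuple:
    """Return the 1, 5 and 15 minute load averages, each divided by core count."""
    cores = os.cpu_count() or 1  # cpu_count() can return None
    one, five, fifteen = os.getloadavg()
    return (load_per_core(one, cores),
            load_per_core(five, cores),
            load_per_core(fifteen, cores))
```

A load of 8 on a 4-core machine gives 2.0, i.e. overloaded by about 100%; a load of 400 on the same machine gives 100.0.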

To answer your question, you can probably run a script that fetches your 5 min load avg and triggers a reboot if it's higher than a certain value. You can run it on a regular basis with a systemd timer or a cron job.
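Something along these lines could work as the script. Treat it as a rough sketch under a few assumptions: the threshold of 50 is arbitrary, the `--for-real` flag is invented here so that the default is a harmless dry run, and the reboot goes through `systemctl reboot`, which needs root and a systemd-based distro.

```python
import os
import subprocess
import sys

REBOOT_THRESHOLD = 50.0  # hypothetical value; pick one well above normal load


def should_reboot(load5: float, threshold: float = REBOOT_THRESHOLD) -> bool:
    """True when the 5 minute load average is strictly above the threshold."""
    return load5 > threshold


def main(argv: list) -> None:
    # Dry run unless the (made-up) --for-real flag is passed.
    dry_run = "--for-real" not in argv
    _, load5, _ = os.getloadavg()  # (1 min, 5 min, 15 min)
    if not should_reboot(load5):
        return
    if dry_run:
        print(f"load5={load5:.2f} is over {REBOOT_THRESHOLD}; would reboot")
        return
    # Assumes systemd and root privileges.
    subprocess.run(["systemctl", "reboot"], check=False)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it from root's crontab or a systemd timer every few minutes, and only add `--for-real` once the dry-run output looks sane. Keep in mind that at a load of 400 the machine may be too unresponsive for the timer to fire at all.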

[-] raldone01@lemmy.world 10 points 3 months ago* (last edited 3 months ago)

Disk I/O can cause ridiculous load averages. The highest one I have seen:

[image: high load]

My HDDs were sweating that day. Turns out running btrfs defrag once in a blue moon is a good idea...

this post was submitted on 15 Mar 2024
22 points (84.4% liked)
