Self Hosting Fail

Padook@feddit.nl · 2 years ago

Self Hosting Fail

Kuinox@lemmy.world · edit-2 2 years ago

When you are bored, back up a VM, then hard-kill it and see if it manages to restart properly.
Software should be able to recover from that.
If it doesn’t, troubleshoot.
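A rough way to script the "see if it restarts properly" part, as a sketch only: after the hard kill, poll the TCP ports the VM is supposed to serve and report whatever never came back. The service names, hosts, ports, and timeouts here are all placeholders, not anything from a real setup.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_services(services, deadline_s=300.0, poll_s=5.0):
    """Poll until every (name, host, port) entry accepts connections.

    Returns the names still unreachable when the deadline passed;
    an empty list means everything came back.
    """
    pending = list(services)
    give_up_at = time.monotonic() + deadline_s
    while pending:
        # Keep only the entries that still refuse connections.
        pending = [s for s in pending if not port_is_open(s[1], s[2])]
        if not pending or time.monotonic() >= give_up_at:
            break
        time.sleep(poll_s)
    return [name for name, _host, _port in pending]
```

Something like `wait_for_services([("web", "192.0.2.10", 443), ("ssh", "192.0.2.10", 22)])` (made-up addresses) returns the list of stragglers to troubleshoot. An open port only shows the process is listening, not that the application is healthy, so treat it as a first pass.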

Deebster@programming.dev · 2 years ago

That reminds me of Netflix’s Chaos Monkey (basically in office hours this tool will randomly kill stuff).

BlackAura@lemmy.world · 2 years ago

When I built my home server this is what I did with all VMs. Learned how to change the startup delay time in ESXi and ensured everything came back online with no issues from a cold boot.

Rip VMware.

pezhore@lemmy.ml · edit-2 2 years ago

While I appreciate the sentiment, most traditional VMs do not like to have their power killed (especially non-journaling file systems).

Even crash consistent applications can be impacted if the underlying host fs is affected by power loss.

I do think that backups are a valid suggestion here, provided that the backup isn’t interrupted by a power surge or loss.

Edit: even journaling file systems aren’t a magic bullet. I’ve had an ext4 fs get corrupted when IO was interrupted by power loss. I get the down votes for mentioning non-journaling FS, but seriously folks, use the swiss cheese method of protecting your stuff… backups, redundant power/UPS, documented/automated installation/configuration.

taladar@sh.itjust.works · 2 years ago

most traditional VMs do not like to have their power killed (especially non-journaling file systems).

Why are you using a non-journaling file system in 2024 when those were common 10+ years ago?

CameronDev@programming.dev · 2 years ago

Did the services fail to come back due to the bad reboot, or would they have failed to come back on a clean reboot? I ugly reboot my stuff all the time, and unless the hardware fails, I can be pretty sure it’s all going to come back. Getting your stuff to survive a reboot is probably a better spend of effort.

Padook@feddit.nl · 2 years ago

I didn’t mean to imply that services actually broke. Only that they didn’t come back after a reboot. A clean reboot may have caused some of the same issues because I’m learning as I go. Some services are restarted by systemctl, some by cron, some…manual. This is certainly a wake-up call that I need to standardize and simplify the way the services are started.
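One possible first step toward standardizing, sketched under assumptions: list the units that are supposed to start at boot and ask systemd which of them are not actually enabled. `systemctl is-enabled` is a real subcommand; the unit names in the usage note are made up, and anything still started from cron or by hand would simply show up as not enabled or unknown.

```python
import subprocess


def systemctl_is_enabled(unit: str) -> str:
    """Return the state printed by `systemctl is-enabled`, e.g. 'enabled' or 'disabled'."""
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit is expected for units that are not enabled
    )
    # Fall back to 'unknown' when systemctl prints nothing on stdout.
    return result.stdout.strip() or "unknown"


def units_not_starting_at_boot(units, query=systemctl_is_enabled):
    """Return {unit: state} for every unit whose state is not exactly 'enabled'.

    Simplification: states such as 'static' or 'enabled-runtime' are also
    reported, so the result is a list of things to look at, not a verdict.
    """
    states = {unit: query(unit) for unit in units}
    return {unit: state for unit, state in states.items() if state != "enabled"}
```

Calling `units_not_starting_at_boot(["myapp.service", "backup.timer"])` (hypothetical names) on the host gives a short to-do list of units to enable or convert. The `query` parameter is there only so the logic can be tried without systemd.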

CameronDev@programming.dev · 2 years ago

We’ve all committed that sin before. It’s better to rely on it surviving the reboot than to try to prevent the reboot.

Also worth looking into some form of uptime monitoring software. When something goes down, you want to know about it asap.

And documenting your setup never hurts :D

nimmo@lem.nimmog.uk · 2 years ago

On the uptime monitoring, I’ve been quite happy with Uptime Kuma, but… if you put it on the same host that’s down… well, that’s not going to work :p (I nearly made that mistake)

elvith@feddit.de · 2 years ago

It’s not the most detailed thing, but I just use a free account on cron-job.org to send a head request every two minutes to a few services that are reachable from the internet (either just their homepage or some ping endpoint in the API) and then used the status page functionality to have a simple second status page on a third party server.

You can do a bit more on their paid tier, but so far I didn’t need that.

On the other hand, you could check whether a free-tier or cheap small VPS from one of the many cloud providers is sufficient for an Uptime Kuma installation. Just don’t use the same cloud provider that all your other services run on.
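For anyone who would rather run the HEAD-request check from their own small VPS, here is a minimal sketch using only the Python standard library. This is not how cron-job.org or Uptime Kuma work internally; the URLs, the timeout, and the "2xx or 3xx means up" rule are all assumptions.

```python
import urllib.error
import urllib.request


def head_status(url: str, timeout: float = 10.0):
    """Send an HTTP HEAD request and return the status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, ...


def find_down(urls, check=head_status):
    """Return the URLs whose check did not come back with a 2xx or 3xx status."""
    down = []
    for url in urls:
        status = check(url)
        if status is None or not 200 <= status < 400:
            down.append(url)
    return down
```

Run `find_down(["https://service.example/ping"])` (placeholder URL) from cron every couple of minutes and send yourself a notification when the list is not empty. As with Uptime Kuma, it only helps if it runs somewhere other than the host being watched.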

lemmyvore@feddit.nl · 2 years ago

IMHO you’re optimizing for the wrong thing. 100% availability is not something that’s attainable for a self-hoster without driving yourself crazy.

Like the other comment suggested, I’d rather invest time into having machines and services come back up smoothly after reboots.

That being said, a UPS may be relevant to your setup in other ways. For example, it can allow a parity RAID array to shut down cleanly and reduce the risk of write holes. But that’s just one example, and a UPS is just one solution for that (others being ZFS, non-parity RAID, or SAS/SATA controller cards with a built-in battery and/or hardware RAID support, etc.).

Static_Rocket@lemmy.world · 2 years ago

I present to you the holy hardware compatibility table:

https://networkupstools.org/stable-hcl.html

Anything not listed there is not worth buying.

catloaf@lemm.ee · edit-2 2 years ago

A lot of stuff on there isn’t worth buying either, like anything from APC. If you want good stuff, just get Eaton.

But also you have to understand that UPSes aren’t set and forget. The batteries need replacement every 3-5 years. And they’re not for extended outages, they’re mostly to bridge the gap between mains power going out and a generator starting up.

Personally I just have everything running from docker-compose, so I run one command and everything not running gets started. I don’t worry about stuff being down for a bit.

calm.like.a.bomb@lemmy.dbzer0.com · 2 years ago

What’s wrong with APC? I’ve had one for 6-7 years. I’ve changed the battery once and I think I’ll have to change it again this year. I haven’t had any problems with it.

Jo Miran@lemmy.ml · 2 years ago

A UPS should always be your first or second purchase if only for power conditioning and brown-out protection.

jkrtn@lemmy.ml · 2 years ago

They will do power conditioning? My modem is such a sensitive baby I cannot plug anything else in next to it or it starts dropping packets. Would a UPS help with that? Unfortunately I cannot replace the modem, that’s the only one the ISP will give me.

catloaf@lemm.ee · edit-2 2 years ago

Yes. An online/double-conversion UPS will be the most effective, because it actually runs off the battery the whole time, so it’s disconnected from any line quality issues.

A line-interactive UPS is cheaper, but doesn’t do full power conditioning.

An offline UPS doesn’t do any at all, only comes online when power drops.

https://community.fs.com/article/line-interactive-vs-online-vs-offline-ups.html