Similar to kubermatic/machine-controller#1587, we were recently hit by two complete outages of Hetzner Cloud worker nodes again. We are on KubeOne 1.9.0 with OSM 1.6.0.
I found out exactly what happens, and it already begins with the deployment of the worker:
1. Hetzner deploys a machine with /etc/netplan/50-cloud-init.yaml present.
2. On first boot, cloud-init invokes netplan generate, which creates the /run/systemd/network/10-netplan-eth0.network and 10-netplan-eth0.link files, and systemd-networkd configures eth0 accordingly.
3. OSM runs the bootstrap script, which deletes /etc/netplan/50-cloud-init.yaml and disables cloud-init. Remember: the file is gone now!
4. The machine is rebooted and Ubuntu 24.04 runs /etc/systemd/system-generators/netplan early in the boot process. This essentially invokes netplan generate again. Since there is no /etc/netplan/50-cloud-init.yaml anymore, it also wipes /run/systemd/network/. I'm not sure whether this also happens on Ubuntu <= 22.04. (A quick way to observe steps 1-4 on a node is shown right after this list.)
5. The node proceeds to join the cluster and everything is fine.
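For anyone who wants to follow along, this is roughly what I'd expect to see on a freshly provisioned worker (paths are the ones from the steps above; exact file names may vary per image):

```bash
# Right after provisioning (steps 1-2): the cloud-init netplan config and the
# generated units are present.
ls /etc/netplan/            # 50-cloud-init.yaml
ls /run/systemd/network/    # 10-netplan-eth0.link  10-netplan-eth0.network

# After the OSM bootstrap and the first reboot (steps 3-4): the netplan config
# is gone and the generator has wiped the generated units.
ls /etc/netplan/            # 50-cloud-init.yaml no longer there
ls /run/systemd/network/    # (empty)
```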
As long as networking is not restarted, systemd-networkd keeps managing eth0, i.e. handling DHCP, the link state and so on.
But since Ubuntu does unattended upgrades by default, over time packages will be upgraded that invoke a systemctl restart systemd-networkd. From that point on, systemd-networkd no longer manages eth0, because the files were wiped in step 4. This is still not an issue as long as eth0 stays up.
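To tell whether a node has already gone through such a restart (and is therefore running without the generated units), comparing the daemon's start time with the boot time should be enough; a small sketch, assuming standard systemd and procps tooling:

```bash
# If systemd-networkd's start time is later than the boot time, the daemon has
# been restarted (e.g. by an unattended upgrade) after /run/systemd/network was wiped.
systemctl show systemd-networkd -p ActiveEnterTimestamp
uptime -s   # boot time, for comparison
```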
But recently Hetzner has also been showing some weird network quirks. Links go down from time to time:
Feb 22 23:15:03 cluster-pool1-68546f7bb8-psz6n systemd-networkd[1800530]: lxc_health: Link DOWN
Feb 22 23:15:03 cluster-pool1-68546f7bb8-psz6n systemd-networkd[1800530]: lxc_health: Lost carrier
That's when everything goes south. The link is not managed anymore and will not be brought up again; the node is gone. Control plane nodes and any other instances survive this because their link is recovered, just not the worker nodes.
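If a node ends up in this state, a rough emergency workaround (my own sketch, not part of the original setup; it assumes the interface is eth0 and uses plain DHCP) would be to hand the interface back to systemd-networkd from the out-of-band console:

```bash
# Drop-in .network unit so systemd-networkd manages eth0 again; /etc/systemd/network/
# survives reboots, unlike the wiped /run/systemd/network/.
cat > /etc/systemd/network/10-eth0.network <<'EOF'
[Match]
Name=eth0

[Network]
DHCP=yes
EOF
systemctl restart systemd-networkd
```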
Possible solutions are:
a) Do not delete /etc/netplan/50-cloud-init.yaml. Since cloud-init is disabled afterwards anyway, I see no problem in leaving it there, though I don't know your reason for removing it in the first place. Keeping the file would prevent any netplan generate run from wiping out the network config (see the sketch after this list).
b) Disable the systemd generator so it does not run netplan generate on boot. I'm not sure whether this causes other issues later on.
c) Disable unattended upgrades to prevent networking restarts. But I'd rather keep them and have a stable network config.
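To illustrate option a): as long as some netplan config for eth0 is present under /etc/netplan, the boot-time generator keeps regenerating the eth0 units instead of wiping them. The actual 50-cloud-init.yaml that cloud-init writes on Hetzner may look different, so treat this minimal stand-in as illustrative only:

```bash
# Minimal illustrative stand-in for /etc/netplan/50-cloud-init.yaml (the real
# cloud-init output may differ): with this present, `netplan generate` keeps
# producing the 10-netplan-eth0.* units on every boot instead of wiping them.
cat > /etc/netplan/50-cloud-init.yaml <<'EOF'
network:
  version: 2
  ethernets:
    eth0:
      dhcp4: true
EOF
netplan generate   # regenerates /run/systemd/network/10-netplan-eth0.*
```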
Please check your worker nodes for the existence of files in /run/systemd/network. If it is empty, you are most likely prone to outages.
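A quick way to check a node (assuming eth0 is the uplink):

```bash
# If this directory is empty, the netplan-generated units are gone and the next
# systemd-networkd restart will leave eth0 unmanaged.
ls /run/systemd/network/

# The setup state should read "configured"; "unmanaged" means a lost carrier
# will not be recovered.
networkctl status eth0
```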
@xrstf Is it a coincidence that you modified the title of the rotten kubermatic/machine-controller#1587 exactly at the time of our first outage? Maybe you noticed something similar?
I saw the ticket while looking through some board, and its typo had irked me for a long time. It's pure coincidence that I happened to edit it recently :) Even if it weren't, I would not publicly admit to my superpowers of remotely removing IPs from other people's servers.
@xrstf I couldn't find the ticket during the outage because I was searching for "loses" at the time. I just noticed you had changed the title shortly after and figured you had experienced something similar and renamed it for that reason. I didn't mean to make you responsible for our outage :)