non deterministic NIC order in multihomed instance with both static and dynamic network #2596

Open
gberche-orange opened this issue Jan 13, 2025 · 2 comments

Comments

@gberche-orange

Describe the bug

We're observing that some bosh deployments with an instance group having two networks end up with a non-deterministic NIC order allocation (between eth0 and eth1) at initial deployment, on the instance with index 1. This order then remains the same after any bosh recreate of the instance group.

This impacts the following use cases, which can't rely on a deterministic interface name (e.g. eth0) for a given network:

  keepalived.interface:
    description: interface keepalived will use to mount the VIP. If set to 'auto', uses the default interface on the VM
    default: auto

  • (We're fetching history and retesting to understand why we have disabled auto)
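
One plausible reading of "the default interface on the VM" is the interface that carries the IPv4 default route. The sketch below illustrates that reading only; it is an assumption, not the keepalived release's actual implementation of `auto`. The function names are hypothetical, and the input is expected to have the layout of Linux's `/proc/net/route`.

```python
from typing import Optional


def default_interface(route_table: str) -> Optional[str]:
    """Return the interface carrying the IPv4 default route, or None.

    `route_table` is text in the layout of /proc/net/route: a header line,
    then whitespace-separated rows whose columns are
    Iface, Destination, Gateway, Flags, RefCnt, Use, Metric, Mask, ...
    """
    best = None  # (metric, interface name) of the best default route so far
    for line in route_table.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 8:
            continue
        iface, destination = fields[0], fields[1]
        flags, metric, mask = fields[3], fields[6], fields[7]
        is_default = destination == "00000000" and mask == "00000000"
        is_up = bool(int(flags, 16) & 0x1)  # RTF_UP
        if is_default and is_up:
            candidate = (int(metric), iface)
            if best is None or candidate < best:
                best = candidate
    return best[1] if best else None


def read_default_interface(path: str = "/proc/net/route") -> Optional[str]:
    """Convenience wrapper that reads the live routing table on Linux."""
    with open(path) as handle:
        return default_interface(handle.read())
```

With the manifest shown further down, the gateway is declared as a default of the priv network, so a lookup of this kind would follow whichever NIC that network landed on rather than a fixed name.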

We have been comparing logs of two bosh deployments with the same manifest network specs such as the following (including cloud-config network ordering):

  name: proxy
  instances: 2
  networks:
    - name: tf-net-osb-data-plane-shared-pub2
      static_ips:
        - 10.xx.yy.189
        - 10.xx.yy.190
    - default:
        - dns
        - gateway
      name: tf-net-osb-data-plane-shared-priv
  stemcell: default 

The differences in the logs during a bosh recreate are limited to the following:

  • DEBUG -- DirectorJobRunner: Fetching existing instance for: #<Bosh::Director::Models::Instance @values= which shows that the current instance networks are fetched from the agent settings and returned in a different order
    • the agent_settings.json files indeed have a different order in the two instances of the instance group
  • Creating instance network reservations from database for instance (See sources), which lists the ip_addresses in a different order
  • the CPI call to create_vm and its response, which have the networks in a different order

Looking into the instances table of the bosh database, the spec_json values have a diverging order of networks for the two instances.
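
A comparison of this kind can be scripted once the spec_json values have been exported from the database. The following is a minimal sketch, assuming each spec_json is a JSON object with a top-level `networks` object keyed by network name; that layout and the helper names are assumptions for illustration, not a documented bosh interface.

```python
import json
from typing import Dict, List, Tuple


def network_order(spec_json: str) -> List[str]:
    """Return the network names in the order they appear in one spec_json."""
    spec = json.loads(spec_json)  # dicts keep the key order of the JSON text
    return list(spec.get("networks", {}).keys())


def compare_network_order(specs: Dict[str, str]) -> Tuple[Dict[str, List[str]], bool]:
    """Map each instance label to its network order and flag any divergence.

    `specs` maps a label such as "proxy/0" to that instance's spec_json text.
    """
    orders = {label: network_order(raw) for label, raw in specs.items()}
    distinct = {tuple(order) for order in orders.values()}
    return orders, len(distinct) > 1
```

For example, passing the two exported values under the labels `proxy/0` and `proxy/1` returns the per-instance order plus `True` when the two orders differ.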

Is there a way to make the network interface assignment (eth0/eth1) deterministic for a new deployment?

Thanks in advance for your help!

To Reproduce

See above manifest fragment that triggered the problem

Steps to reproduce the behavior (example):

  1. Deploy a bosh director on vsphere-cpi
  2. Deploy
  3. Check eth0/eth1 ordering

Expected behavior

Systematically deterministic ordering of eth0/eth1

Versions (please complete the following information):

  • Infrastructure: vsphere 97.0.15
  • BOSH version: 280.1.5
  • Stemcell version '1.631'

/CC @ogrand

@gberche-orange
Author

Related Slack thread, thanks to @ramonskie: https://cloudfoundry.slack.com/archives/C02HPPYQ2/p1736847319592899

Guillaume Berche
Yesterday at 10:35 AM
Hi bosh friends, we're observing at Orange non-deterministic NIC name allocation (eth0 vs eth1) in multi-homed instance groups. We detailed the symptoms and a first analysis in #2596. Has anyone observed the same issue? Although I did not find explicit documentation, we were assuming deterministic behavior by bosh when creating/deleting a deployment multiple times. Would there be interest in diagnosing and fixing this non-deterministic behavior?
10 replies

rmakkelie
Yesterday at 11:00 AM
Not sure if this is relevant,
but we already set a kernel parameter, net.ifnames=0, to keep the old network interface naming in https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/ubuntu-jammy/stemcell_builder/stages/image_install_grub/apply.sh#L92
as the default network interface naming has changed since ubuntu bionic (18.04).
There is a way to keep the same interface name for each MAC address with udev rules, but we don't do this within bosh.

Guillaume Berche
Yesterday at 11:03 AM
Thanks. In this specific case, we properly have eth0 and eth1 naming, but their ordering/assignment between the two networks isn't deterministic.
11:04
This surprisingly reproduces on instance/1 and not on instance/0 (although we don't yet have much statistical data over deletion/creation of this specific instance group/deployment).

rmakkelie
Yesterday at 11:05 AM
Well, the problem here is that without ifnames=0 it's even worse,
as then you actually have no idea what's going to happen 😞
I do not think that default linux networking was intended to have a deterministic naming schema.
11:07
We are also still using ifupdown and not netplan.
11:10
https://wiki.debian.org/NetworkInterfaceNames

Guillaume Berche
Yesterday at 11:13 AM
Thanks for this background. I'll study this in more detail, and we'll consider dropping our assumption about deterministic interface names and instead adapting our bosh release to discover the interface names.

rmakkelie
Yesterday at 11:14 AM
Changing it to the predictable naming scheme is going to be a big task, as it's baked into the stemcell, so we can't be backwards compatible, and it seems that a lot of releases are using the eth0 naming schema.
11:15
also take a look at biosdevname

schmidtsv
Yesterday at 3:01 PM
https://wiki.archlinux.org/title/Network_configuration#Change_interface_name you could probably create files for networkd and then restart it to get the names you desire, or try to match by PCI ID
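
The alternative raised earlier in the thread, having a release discover the interface name instead of assuming eth0 or eth1, could look roughly like the sketch below. It picks the interface whose IPv4 address falls inside a given subnet, using text in the one-line-per-address format printed by `ip -o -4 addr show`. The function names and the example subnets (documentation ranges) are hypothetical, and this is only an illustration of the idea, not something bosh or the stemcell provides.

```python
import ipaddress
import subprocess
from typing import Optional


def interface_for_subnet(ip_addr_output: str, subnet: str) -> Optional[str]:
    """Return the first interface holding an IPv4 address inside `subnet`.

    Each input line is expected to look like
    "<index>: <ifname> inet <address>/<prefix> ...".
    """
    network = ipaddress.ip_network(subnet)
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "inet":
            continue
        address = ipaddress.ip_interface(fields[3]).ip
        if address in network:
            return fields[1]
    return None


def live_interface_for_subnet(subnet: str) -> Optional[str]:
    """Run `ip` on the local machine and resolve the interface for `subnet`."""
    result = subprocess.run(
        ["ip", "-o", "-4", "addr", "show"],
        capture_output=True, text=True, check=True,
    )
    return interface_for_subnet(result.stdout, subnet)
```

A release could then call something like `live_interface_for_subnet("192.0.2.0/24")` with the subnet of the network that carries the VIP, so the result no longer depends on which NIC was attached first.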

@beyhan beyhan moved this from Inbox to Waiting for Changes | Open for Contribution in Foundational Infrastructure Working Group Jan 16, 2025
@beyhan
Member

beyhan commented Jan 16, 2025

We don't plan to invest in this, but we would love to review a PR that makes this deterministic in a backwards-compatible way.
