Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ebsnvme-id creates broken sd* symlinks #37

Open
martinpitt opened this issue May 29, 2024 · 13 comments
Open

ebsnvme-id creates broken sd* symlinks #37

martinpitt opened this issue May 29, 2024 · 13 comments

Comments

@martinpitt
Copy link

martinpitt commented May 29, 2024

We spent quite some time debugging a storage test regression in Fedora rawhide which essentially breaks scsi_debug and other devices, but only on RedHat's/Fedora's Testing Farm infrastructure -- which is essentially AWS EC2 machines with an API.

Latest Fedora rawhide instances now have amazon-ec2-utils-2.2.0-2.fc41.noarch (which got introduced into Fedora very recently), which ships /usr/lib/udev/rules.d/70-ec2-nvme-devices.rules with

KERNEL=="nvme[0-9]*n[0-9]*",        ENV{DEVTYPE}=="disk",      ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/usr/sbin/ebsnvme-id -u /dev/%k", SYMLINK+="%c"
KERNEL=="nvme[0-9]*n[0-9]*p[0-9]*", ENV{DEVTYPE}=="partition", ATTRS{model}=="Amazon Elastic Block Store", PROGRAM="/usr/sbin/ebsnvme-id -u /dev/%k", SYMLINK+="%c%n"

These instances have an NVME block device, and these rules cause the following symlinks to be created:

lrwxrwxrwx. 1 root root 7 May 29 03:52 /dev/sda1 -> nvme0n1
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda11 -> nvme0n1p1
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda12 -> nvme0n1p2
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda13 -> nvme0n1p3
lrwxrwxrwx. 1 root root 9 May 29 03:52 /dev/sda14 -> nvme0n1p4

This is problematic in multiple ways:

  • Pretending that these are SCSI drives tramples on the kernel's namespace. udev symlinks should never create names which the kernel uses.
  • "nvme0n1" is the raw block device, not a partition. So it's very confusing to name it "sda1", it should be "sda". Likewise, the first partition should be "sda1", not "sda11".

If then a real sda comes along (e.g. with modprobe scsi_debug), this will create an actual /dev/sda, but then it's impossible to create/see partitions on that, as the sda1 etc. names are already taken.

This is most easily reproduced with

# /usr/sbin/ebsnvme-id -u /dev/nvme0n1
sda1

Curiously, it also does that for a partition:

# /usr/sbin/ebsnvme-id -u /dev/nvme0n1p2
sda1

that explains how the second udev rule can even work -- but this is really hackish!

My recommendation as former udev co-upstream is to just entirely remove these rules. They are not helpful, confusing, and break stuff. You can of course create symlinks in subdirs of /dev all you like, but please don't collide with kernel names.

Thanks!

@martinpitt
Copy link
Author

@mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

@martinpitt
Copy link
Author

Note: This only affects Fedora rawhide because Testing Farm Fedora 40 instances don't install amazon-ec2-utils by default. When I install it manually, the issue happens there as well.

martinpitt added a commit to martinpitt/cockpit that referenced this issue May 29, 2024
Rawhide Testing Farm machines started to get a set of symlinks like
/dev/sda1 -> nvme0n1 (but *no* /dev/sda), via amazon-ec2-utils
(amazonlinux/amazon-ec2-utils#37).

They break `scsi_debug`, as that creates /dev/sda -- but then trying to
create partitions on it doesn't have any namespace room for /dev/sda1
etc., as that is already taken. This breaks all storage tests which use
a RAM disk.

That package isn't yet installed in Fedora 39/40, only rawhide. We don't
need it and it only causes trouble → kann weg.

Fixes cockpit-project#20520
@mvollmer
Copy link

@martinpitt, thanks for filing this! I have a hard time understanding what problem these symlinks are trying to solve. They only seem to create chaos.

If they are supposed to help with giving stable names to NVMe drives, I think that problem is already solved by ID_SERIAL, ID_WWN, and filesystem UUIDs.

@martinpitt
Copy link
Author

https://gitlab.com/testing-farm/infrastructure doesn't actually install that package -- I figure it's now part of the official Fedora rawhide AMIs?

martinpitt added a commit to cockpit-project/cockpit that referenced this issue May 29, 2024
Rawhide Testing Farm machines started to get a set of symlinks like
/dev/sda1 -> nvme0n1 (but *no* /dev/sda), via amazon-ec2-utils
(amazonlinux/amazon-ec2-utils#37).

They break `scsi_debug`, as that creates /dev/sda -- but then trying to
create partitions on it doesn't have any namespace room for /dev/sda1
etc., as that is already taken. This breaks all storage tests which use
a RAM disk.

That package isn't yet installed in Fedora 39/40, only rawhide. We don't
need it and it only causes trouble → kann weg.

Fixes #20520
@major
Copy link

major commented May 29, 2024

@mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

@martinpitt That would be helpful. Thanks for detailing out the problems you found. I missed these during testing!

@martinpitt
Copy link
Author

@major OK, I filed https://bugzilla.redhat.com/show_bug.cgi?id=2284397 . Thanks!

@tbzatek
Copy link

tbzatek commented Sep 23, 2024

@mvollmer @major FYI -- @major, do you want me to file this as a Fedora bz, too? This could very well affect other rawhide users/tests, and it has already cost us about 10 hours of our lives..

This has cost mine and @vojtechtrefny's an hour or two of our lives as well: https://bugzilla.redhat.com/show_bug.cgi?id=2313526

cowboyox pushed a commit to cowboyox/cockpit that referenced this issue Oct 8, 2024
Rawhide Testing Farm machines started to get a set of symlinks like
/dev/sda1 -> nvme0n1 (but *no* /dev/sda), via amazon-ec2-utils
(amazonlinux/amazon-ec2-utils#37).

They break `scsi_debug`, as that creates /dev/sda -- but then trying to
create partitions on it doesn't have any namespace room for /dev/sda1
etc., as that is already taken. This breaks all storage tests which use
a RAM disk.

That package isn't yet installed in Fedora 39/40, only rawhide. We don't
need it and it only causes trouble → kann weg.

Fixes #20520
@xnox
Copy link

xnox commented Jan 17, 2025

@dlouzan
Copy link

dlouzan commented Feb 14, 2025

@mvollmer @martinpitt The main problem with completely removing the custom amazonlinux udev ebs rules is that they prevent having stable names for the dynamically mounted volumes. In our setup, with terraform we assign /dev/XXX names when creating the image and then instances, which would then be mapped automatically on startup from the dynamic nvmeXXX names to the names we assigned in terraform.

In f40 (before this package was also backported natively), we were just installing the amazonlinux2023 source repository for this specific package, and everything worked.

We've just recently started making tests of our automated setup with f41, and the newly bundled fedora package is causing the setup to fail. The nvme numbering is not guaranteed with multiple volumes, on one boot you might get the root filesystem on nvme0 and on the next on nvme1.

It looks like the project mentioned above by @xnox is modifying the rules to have stable /dev/xvXXX names, which would also work for our case. Unless I'm missing anything obvious, how does this work otherwise? 🙇

/cc @fh1ch

@martinpitt
Copy link
Author

@dlouzan I assume with "/dev/XXX" you mean something more specific like "/dev/sda1". But how is /dev/sda1 any more specific than /dev/nvme0n1? Both are a dynamically numbered namespace. But anyway: the general idea that ebsnvme-id computes stable aliases is ok. Builtin udev rules already do this, with /dev/disk/by-id, by-uuid, by-label, by-path and so on -- i.e. depending on how you want to identify the device (by location, by content, explicity named) one of these is usually right. So if ebsnvme-id can assign symlinks which are derived from some property that you set in the EC2 API, that's completely fine.

But it must not trample on the kernel's namespace and pretend that they are /dev/sdXY (and on top of that do that wrongly). That specific naming is the grave bug, not the general idea.

@dlouzan
Copy link

dlouzan commented Feb 15, 2025

@martinpitt Do not get me wrong, I'm not challenging the fact that the current mapping is wrong and it needed some fixing. But I think the current approach of f41 native package is lacking, as it removes useful functionality when running in AWS, and it should be extended (probably modified here and then adapted in f41's package).

Most probably the approach used above that guarantees a stable name such as /dev/xvdb would be the way to go.

@martinpitt
Copy link
Author

@dlouzan I see -- to be honest I don't know what the F41 package does -- we've carried a workaround in our tests that just removes the udev rule file wholesale ever since.

@xnox
Copy link

xnox commented Feb 17, 2025

@dlouzan are you able to use any of the stable /by-id/ paths?

 ls -l /dev/disk/by-id/ | grep Elastic
lrwxrwxrwx. 1 root root 13 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 13 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185-ns-1 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185-ns-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 17 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185-ns-1-part127 -> ../../nvme0n1p127
lrwxrwxrwx. 1 root root 17 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185-ns-1-part128 -> ../../nvme0n1p128
lrwxrwxrwx. 1 root root 15 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 17 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185-part127 -> ../../nvme0n1p127
lrwxrwxrwx. 1 root root 17 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185-part128 -> ../../nvme0n1p128
lrwxrwxrwx. 1 root root 13 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185_1 -> ../../nvme0n1
lrwxrwxrwx. 1 root root 15 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185_1-part1 -> ../../nvme0n1p1
lrwxrwxrwx. 1 root root 17 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185_1-part127 -> ../../nvme0n1p127
lrwxrwxrwx. 1 root root 17 Feb  8 00:52 nvme-Amazon_Elastic_Block_Store_vol0464842bc21e25185_1-part128 -> ../../nvme0n1p128

Or is that not the right type of name?

Separately I will check what is available in Bottlerocket, as I think it tries to transfer over and create aliases for named volumes in a more human/stable meaningful way.

Note that all disk mappings are ignored for nvme drives by AWS EC2, even when specified in AMI registration as sda1. As mentioned in the documentation.

From https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ec2/register-image.html:

NVMe instance store volumes are automatically enumerated and assigned a device name. Including them in your block device mapping has no effect.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants