ebsnvme-id creates broken sd* symlinks #37
Comments
Note: This only affects Fedora rawhide because Testing Farm Fedora 40 instances don't install amazon-ec2-utils by default. When I install it manually, the issue happens there as well. |
Rawhide Testing Farm machines started to get a set of symlinks like /dev/sda1 -> nvme0n1 (but *no* /dev/sda), via amazon-ec2-utils (amazonlinux/amazon-ec2-utils#37). They break `scsi_debug`, as that creates /dev/sda -- but then trying to create partitions on it doesn't have any namespace room for /dev/sda1 etc., as that is already taken. This breaks all storage tests which use a RAM disk. That package isn't yet installed in Fedora 39/40, only rawhide. We don't need it and it only causes trouble → it can go. Fixes cockpit-project#20520
@martinpitt, thanks for filing this! I have a hard time understanding what problem these symlinks are trying to solve. They only seem to create chaos. If they are supposed to help with giving stable names to NVMe drives, I think that problem is already solved by ID_SERIAL, ID_WWN, and filesystem UUIDs. |
https://gitlab.com/testing-farm/infrastructure doesn't actually install that package -- I figure it's now part of the official Fedora rawhide AMIs? |
@martinpitt That would be helpful. Thanks for detailing out the problems you found. I missed these during testing! |
@major OK, I filed https://bugzilla.redhat.com/show_bug.cgi?id=2284397 . Thanks! |
This has cost me and @vojtechtrefny an hour or two of our lives as well: https://bugzilla.redhat.com/show_bug.cgi?id=2313526 |
I see this was fixed in bottlerocket which uses ghostdog, but not in this repository as used by AL2023 and other distributions. https://github.com/bottlerocket-os/bottlerocket-core-kit/blob/develop/packages/os/ebs-volumes.rules |
@mvollmer @martinpitt The main problem with completely removing the custom amazonlinux udev EBS rules is that doing so leaves us without stable names for the dynamically mounted volumes. In our setup, with terraform we assign In f40 (before this package was also backported natively), we were just installing the amazonlinux2023 source repository for this specific package, and everything worked. We've just recently started testing our automated setup with f41, and the newly bundled Fedora package is causing the setup to fail. The nvme numbering is not guaranteed with multiple volumes: on one boot you might get the root filesystem on nvme0 and on the next on nvme1. It looks like the project mentioned above by @xnox is modifying the rules to have stable /cc @fh1ch |
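Since the nvme numbering can change between boots, one way to find a given volume without relying on any /dev/sd* alias is to match on the device serial instead. The following is only a rough sketch, not anything this package does: it assumes that each NVMe block device exposes a `device/serial` attribute under /sys/block and that, on EC2, this serial carries the EBS volume ID (in a form like `vol0123...`, which is an assumption to verify on your instances). The helper name `device_for_serial` is made up for illustration.

```python
import os

def device_for_serial(serial, sys_block="/sys/block"):
    """Return the block device name (e.g. 'nvme1n1') whose sysfs serial
    attribute equals `serial`, or None if no device matches.

    `sys_block` is a parameter only so the lookup can be tried against a
    fake directory tree; on a real system the default should apply.
    """
    for name in sorted(os.listdir(sys_block)):
        serial_path = os.path.join(sys_block, name, "device", "serial")
        try:
            with open(serial_path) as f:
                # The attribute may be padded with whitespace, so strip it.
                found = f.read().strip()
        except OSError:
            continue  # this device has no readable serial attribute
        if found == serial:
            return name
    return None
```

With something like this, a provisioning script could look up the device by the volume ID it already knows from terraform, regardless of whether the volume came up as nvme0 or nvme1 on this boot.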
@dlouzan I assume with "/dev/XXX" you mean something more specific like "/dev/sda1". But how is /dev/sda1 any more specific than /dev/nvme0n1? Both are a dynamically numbered namespace. But anyway: the general idea that But it must not trample on the kernel's namespace and pretend that they are /dev/sdXY (and on top of that do that wrongly). That specific naming is the grave bug, not the general idea. |
@martinpitt Do not get me wrong, I'm not challenging the fact that the current mapping is wrong and it needed some fixing. But I think the current approach of f41 native package is lacking, as it removes useful functionality when running in AWS, and it should be extended (probably modified here and then adapted in f41's package). Most probably the approach used above that guarantees a stable name such as |
@dlouzan I see -- to be honest I don't know what the F41 package does -- we've carried a workaround in our tests that just removes the udev rule file wholesale ever since. |
@dlouzan are you able to use any of the stable /by-id/ paths?
Or is that not the right type of name? Separately I will check what is available in Bottlerocket, as I think it tries to transfer over and create aliases for named volumes in a more stable, human-meaningful way. Note that all disk mappings are ignored for NVMe drives by AWS EC2, even when specified in AMI registration as sda1, as mentioned in the documentation. From https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ec2/register-image.html:
|
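To see which stable by-id names are actually available on an instance, a small sketch like the one below could help. It simply lists the symlinks in /dev/disk/by-id and where they currently point. The function name `list_by_id` is hypothetical, and the directory is a parameter only so the code can be tried against a scratch directory.

```python
import os

def list_by_id(by_id_dir="/dev/disk/by-id"):
    """Return {link name: resolved target path} for each symlink in by_id_dir.

    Returns an empty dict if the directory does not exist.
    """
    mapping = {}
    if not os.path.isdir(by_id_dir):
        return mapping
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if os.path.islink(path):
            # realpath follows the link, e.g. to /dev/nvme0n1
            mapping[name] = os.path.realpath(path)
    return mapping
```

Printing the result of `list_by_id()` shows which names stay the same across boots even when the nvmeXnY targets change.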
We spent quite some time debugging a storage test regression in Fedora rawhide which essentially breaks `scsi_debug` and other devices, but only on RedHat's/Fedora's Testing Farm infrastructure -- which is essentially AWS EC2 machines with an API.
Latest Fedora rawhide instances now have amazon-ec2-utils-2.2.0-2.fc41.noarch (which got introduced into Fedora very recently), which ships /usr/lib/udev/rules.d/70-ec2-nvme-devices.rules with
These instances have an NVME block device, and these rules cause the following symlinks to be created:
This is problematic in multiple ways:
If then a real sda comes along (e.g. with `modprobe scsi_debug`), this will create an actual `/dev/sda`, but then it's impossible to create/see partitions on that, as the sda1 etc. names are already taken.
This is most easily reproduced with
Curiously, it also does that for a partition:
that explains how the second udev rule can even work -- but this is really hackish!
My recommendation as former udev co-upstream is to just entirely remove these rules. They are unhelpful and confusing, and they break stuff. You can of course create symlinks in subdirs of /dev all you like, but please don't collide with kernel names.
Thanks!
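For anyone who wants to check whether a machine is affected, here is a minimal sketch that looks for top-level symlinks in /dev whose names look like kernel SCSI disk names. It rests on the assumption that real sd* devices are device nodes rather than symlinks, so a symlink with such a name is suspect. The function name and the regular expression are illustrative only, not part of amazon-ec2-utils or udev.

```python
import os
import re

# Kernel-style SCSI disk names: sda, sdb1, sdaa3, ...
SD_NAME = re.compile(r"^sd[a-z]+[0-9]*$")

def find_colliding_symlinks(dev_dir="/dev"):
    """Return sorted (name, link target) pairs for symlinks directly in
    dev_dir whose names match the kernel sd* naming pattern."""
    hits = []
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        if SD_NAME.match(name) and os.path.islink(path):
            # readlink returns the raw link text, e.g. 'nvme0n1'
            hits.append((name, os.readlink(path)))
    return hits
```

An empty result from `find_colliding_symlinks()` suggests no such aliases are present. Any entry it returns is a name that a real SCSI disk or partition could later need.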