"No space left on device" but I have plenty of space left on device #13

Open
samuela opened this issue Feb 9, 2024 · 5 comments

@samuela
Contributor

samuela commented Feb 9, 2024

I was getting this weird error:

[nix-shell:~/nixpkgs]$ nixglhost ipython
Traceback (most recent call last):
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 681, in <module>
    ret = main(args)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 629, in main
    new_env = nvidia_main(cache_dir, host_dsos_paths, args.print_ld_library_path)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 564, in nvidia_main
    cache_paths.append(cache_library_path(p, tmp_cache_dir, cache_dir))
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 449, in cache_library_path
    copy_and_patch_libs(dsos=dsos, dest_dir=d, rpath=rpath_lib_dir)
  File "/nix/store/46qxb828xwxn8rgrhq7crdmanwf59fjk-nix-gl-host-0.1/bin/nixglhost", line 331, in copy_and_patch_libs
    shutil.copyfile(dso.fullpath, newpath)
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 267, in copyfile
    _fastcopy_sendfile(fsrc, fdst)
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 156, in _fastcopy_sendfile
    raise err from None
  File "/nix/store/zdba9frlxj2ba8ca095win3nphsiiqhb-python3-3.10.8/lib/python3.10/shutil.py", line 142, in _fastcopy_sendfile
    sent = os.sendfile(outfd, infd, offset, blocksize)
OSError: [Errno 28] No space left on device: '/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.545.23.08' -> '/run/user/1000/tmpzgwq37ed/nix-gl-host/3076b0246cb1468199a8444860ebaebe5ec5081f85098057a9f4d5b40c3de738/lib/libnvidia-eglcore.so.545.23.08'

[nix-shell:~/nixpkgs]$ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         31G   25G  6.8G  79% /
devtmpfs         7.7G     0  7.7G   0% /dev
tmpfs            7.7G     0  7.7G   0% /dev/shm
tmpfs            3.1G  924K  3.1G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/loop0        25M   25M     0 100% /snap/amazon-ssm-agent/7628
/dev/loop1        56M   56M     0 100% /snap/core18/2812
/dev/loop2        64M   64M     0 100% /snap/core20/2105
/dev/loop3        87M   87M     0 100% /snap/lxd/26881
/dev/loop4        87M   87M     0 100% /snap/lxd/26975
/dev/loop5        41M   41M     0 100% /snap/snapd/20671
/dev/nvme0n1p15  105M  6.1M   99M   6% /boot/efi
tmpfs            1.6G   20K  1.6G   1% /run/user/1000

Oddly, it seems to have gone away now, despite not having changed my configuration. Not sure what might be going on, or how to reproduce consistently, but hoping that creating this issue will help to spur discussion.

@samuela samuela changed the title "No space left on device", even though I have plenty of space left on device "No space left on device" but I have plenty of space left on device Feb 9, 2024
@picnoir
Member

picnoir commented Feb 9, 2024

We're using a tmp directory to build the lib cache before moving it to the definitive cache dir. See https://github.com/numtide/nix-gl-host/blob/main/src/nixglhost.py#L558

In your case, the tmpdir is probably created in /run/user/1000, a rather small tmpfs (1.6G). Copying the libs there fills it up, hence the error. The tmpdir is deleted when nix-gl-host exits, which empties the tmpfs again and explains why df shows plenty of free space afterwards.

A potential fix would be to check that the tmpfs has enough free space and, if it doesn't, fall back to a directory under ~/.cache (XDG_CACHE_HOME).
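
The fallback idea could look roughly like the sketch below. It is not code from nix-gl-host: pick_tmp_parent and total_size are hypothetical helper names, and the candidate order (the default temp dir first, then XDG_CACHE_HOME or ~/.cache) is an assumption based on the suggestion above.

```python
import errno
import os
import shutil
import tempfile
from pathlib import Path


def total_size(paths):
    """Sum the on-disk size, in bytes, of the files to be copied."""
    return sum(os.path.getsize(p) for p in paths)


def pick_tmp_parent(required_bytes, candidates=None):
    """Return the first candidate directory with at least required_bytes free.

    Hypothetical helper: by default it tries the regular temp dir first,
    then the user's cache directory.
    """
    if candidates is None:
        cache_home = os.environ.get("XDG_CACHE_HOME") or os.path.join(
            os.path.expanduser("~"), ".cache"
        )
        candidates = [tempfile.gettempdir(), cache_home]
    for candidate in candidates:
        path = Path(candidate)
        try:
            path.mkdir(parents=True, exist_ok=True)
            free = shutil.disk_usage(path).free
        except OSError:
            # Unusable candidate (missing, not writable, ...): try the next one.
            continue
        if free >= required_bytes:
            return path
    raise OSError(errno.ENOSPC, "no candidate directory has enough free space")
```

The result would then be passed as the parent of the temporary directory, for example tempfile.TemporaryDirectory(dir=pick_tmp_parent(total_size(libs))). The check is only a best-effort guard, since free space can still change between the check and the copy.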

@samuela
Contributor Author

samuela commented Feb 9, 2024

Ah, gotcha. Thanks for explaining, @picnoir! Is there any particular reason to use /run/user/1000 instead of /tmp? Also why copy libs instead of symlinking them? Copying seems slower and more error-prone?

@picnoir
Member

picnoir commented Feb 9, 2024

> Is there any particular reason to use /run/user/1000 instead of /tmp?

We're using Python's tempfile.TemporaryDirectory, which creates the directory through tempfile.mkdtemp. That function reads your $TMPDIR variable (falling back to $TEMP, $TMP, and then directories such as /tmp) to decide where to put temporary directories.

I assume you could try to set $TMPDIR to /tmp to get this behavior.
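
To see which directory Python's tempfile module would pick for a given $TMPDIR, something like the following sketch can help. tmp_parent_for is a hypothetical helper written for illustration. It temporarily sets TMPDIR and clears the module's cached choice (tempfile.tempdir), because gettempdir() only reads the environment the first time it is called.

```python
import os
import tempfile


def tmp_parent_for(tmpdir_value):
    """Report where tempfile would create directories if TMPDIR were tmpdir_value."""
    saved_env = os.environ.get("TMPDIR")
    saved_cache = tempfile.tempdir
    try:
        os.environ["TMPDIR"] = tmpdir_value
        tempfile.tempdir = None  # drop the cached choice so TMPDIR is re-read
        return tempfile.gettempdir()
    finally:
        # Restore the previous environment and cached value.
        if saved_env is None:
            os.environ.pop("TMPDIR", None)
        else:
            os.environ["TMPDIR"] = saved_env
        tempfile.tempdir = saved_cache


if __name__ == "__main__":
    print(tmp_parent_for("/tmp"))
```

If the directory named by TMPDIR does not exist or is not writable, tempfile silently moves on to the next candidate, so the printed path may differ from the value passed in.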

> Also why copy libs instead of symlinking them? Copying seems slower and more error-prone?

We need to patch their rpath. More details there: https://github.com/numtide/nix-gl-host/blob/main/INTERNALS.md#a-hard-problem-to-solve-and-a-partial-fix
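
Patching the rpath rewrites the library file itself, so each library needs a private, writable copy; a symlink would still point at the host's original file under /usr/lib. A rough sketch of the copy-then-patch step follows. It assumes the patchelf CLI is on PATH, and copy_and_set_rpath is a hypothetical helper for illustration, not the actual nix-gl-host function.

```python
import shutil
import subprocess
from pathlib import Path


def copy_and_set_rpath(src, dest_dir, rpath, run=subprocess.run):
    """Copy one shared library into dest_dir and rewrite the copy's rpath.

    `run` is injectable so the patchelf call can be replaced in tests.
    """
    dest = Path(dest_dir) / Path(src).name
    # Copy first: the host's original file must stay untouched.
    shutil.copyfile(src, dest)
    # Assumption: patchelf is installed; --set-rpath edits the file in place.
    run(["patchelf", "--set-rpath", str(rpath), str(dest)], check=True)
    return dest
```

The copy is what consumes space in the temporary directory, which is why a small tmpfs can fill up during this step.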

@samuela
Contributor Author

samuela commented Feb 9, 2024

ahhh gotcha, ok thanks!

@leonardschneider

export TMPDIR=/tmp worked for me.


3 participants