[BUG] RuntimeError: Error building extension 'utils' (ninja related?) #2187

Closed
josephrocca opened this issue Aug 5, 2022 · 7 comments
Labels
bug Something isn't working

Comments

@josephrocca

Describe the bug
As shown in this notebook (linked under To Reproduce), I ran these commands:

pip install deepspeed --upgrade
git clone https://github.com/microsoft/DeepSpeedExamples
cd DeepSpeedExamples/model_compression/gpt2
pip install -r requirements.txt
sudo apt-get install ninja-build # I don't think this line is actually needed, but I'm not sure
pip install ninja
bash ./bash_script/run_zero_quant.sh

This follows the instructions in the README of DeepSpeedExamples/tree/master/model_compression/gpt2 exactly, except that I had to install ninja because the machine didn't have it yet.
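Because DeepSpeed JIT-compiles ops such as `utils` at runtime, and ds_report notes that JIT-compiled ops require ninja, it can help to confirm that the `ninja` binary is actually visible to the Python process before running the script. This is a minimal standard-library sketch; `find_ninja` and `ninja_version` are hypothetical helper names, not DeepSpeed APIs (PyTorch ships its own check as `torch.utils.cpp_extension.is_ninja_available()`).

```python
import shutil
import subprocess
from typing import Optional


def find_ninja(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the `ninja` executable, or None if it is not found.

    `path` defaults to the PATH environment variable, like a shell lookup.
    """
    return shutil.which("ninja", path=path)


def ninja_version(path: Optional[str] = None) -> Optional[str]:
    """Run `ninja --version` and return its output, or None if ninja is missing or fails."""
    exe = find_ninja(path)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=False)
    return result.stdout.strip() if result.returncode == 0 else None


if __name__ == "__main__":
    exe = find_ninja()
    if exe is None:
        print("ninja not found on PATH; JIT-compiled ops cannot be built")
    else:
        print(f"ninja found at {exe}, version {ninja_version()}")
```

Installing the `ninja` pip package (as above) puts the binary in the environment's `bin` directory, so this lookup also shows whether that directory is actually on PATH.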

After some progress, the run_zero_quant.sh script throws RuntimeError: Error building extension 'utils' (please see the notebook for the full logs).

To Reproduce
Steps to reproduce the behavior:

  1. Run this notebook: https://gist.github.com/josephrocca/9ec65e8e5804286a475b5b6da85f7a28

Expected behavior
There is a related issue here:

The apparent solution there was to ensure that the DeepSpeed wheel was built with the same CUDA version as the one installed on the machine. However, ds_report shows that the versions match (torch CUDA 11.6, nvcc 11.6, wheel compiled with CUDA 11.6). So the "expected behavior" here is simply that the script shouldn't throw the error I'm seeing.

ds_report output
As seen in the above-linked notebook:

--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
fused_lamb ............. [NO] ....... [OKAY]
 [WARNING]  please install triton==1.0.0 if you want to use sparse attention
sparse_attn ............ [NO] ....... [NO]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-dev package with apt
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
async_io ............... [NO] ....... [NO]
utils .................. [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/usr/lib/python3/dist-packages/torch']
torch version .................... 1.11.0
torch cuda version ............... 11.6
torch hip version ................ None
nvcc version ..................... 11.6
deepspeed install path ........... ['/home/ubuntu/.local/lib/python3.8/site-packages/deepspeed']
deepspeed info ................... 0.7.0, unknown, unknown
deepspeed wheel compiled w. ...... torch 1.11, cuda 11.6
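To make the CUDA version comparison repeatable, the environment section of the report can be parsed and the torch, nvcc and wheel CUDA versions compared on major.minor. This is a sketch that assumes the `key ..... value` layout shown above (including the literal key `deepspeed wheel compiled w.`); the function names are hypothetical, and matching versions do not by themselves rule out other build failures.

```python
import re
from typing import Dict, Optional

# Color codes that may appear if the report was captured from a terminal.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")
# A "key ....... value" line: a name, a run of two or more dots, then the value.
INFO_LINE = re.compile(r"^(?P<key>[A-Za-z][\w .]*?)\s*\.{2,}\s*(?P<value>.+)$")


def parse_ds_report(text: str) -> Dict[str, str]:
    """Map each 'key ..... value' line of ds_report output to its value."""
    info: Dict[str, str] = {}
    for line in ANSI_ESCAPE.sub("", text).splitlines():
        match = INFO_LINE.match(line.strip())
        if match:
            info[match.group("key").strip()] = match.group("value").strip()
    return info


def major_minor(version: str) -> Optional[str]:
    """Reduce '11.6' or '11.6.124' to '11.6'; return None if no version is present."""
    match = re.search(r"(\d+)\.(\d+)", version)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def cuda_versions(info: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Collect the CUDA version reported for torch, nvcc, and the deepspeed wheel."""
    wheel = re.search(r"cuda\s+(\d+\.\d+)", info.get("deepspeed wheel compiled w.", ""))
    return {
        "torch": major_minor(info.get("torch cuda version", "")),
        "nvcc": major_minor(info.get("nvcc version", "")),
        "wheel": wheel.group(1) if wheel else None,
    }


def cuda_mismatch(info: Dict[str, str]) -> bool:
    """True if two or more of the reported CUDA versions disagree (missing ones are ignored)."""
    found = {v for v in cuda_versions(info).values() if v is not None}
    return len(found) > 1
```

For example, the report text could be captured with `subprocess.run(["ds_report"], capture_output=True, text=True).stdout` and passed to `parse_ds_report`; for the output above, all three versions come out as 11.6.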

System info:

  • OS: Ubuntu 20.04
  • GPU count and types: 1x RTX 6000 (24GB)
  • Python version: 3.8
  • Any other relevant info about your setup: I used https://lambdalabs.com/ GPU cloud (using their Cloud IDE)
@josephrocca josephrocca added the bug Something isn't working label Aug 5, 2022
@mrwyattii
Contributor

mrwyattii commented Aug 5, 2022

Hi @josephrocca, thanks for using DeepSpeed. Could you try pre-compiling and let me know the outcome? To do so:

@josephrocca
Author

Hi @mrwyattii, I tried both the DS_BUILD_OPS option and the DS_BUILD_UTILS option on a fresh Lambda Cloud machine, and both gave errors. Please see here for the full error logs of both attempts: https://gist.github.com/josephrocca/8417c4665cbfef89ba85e439c17500da
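DS_BUILD_OPS and DS_BUILD_UTILS are environment variables that DeepSpeed's setup reads at install time, so compilation errors surface during `pip install` rather than at JIT time. The sketch below only assembles the command and environment for such an install; it assumes a fresh environment (an already-installed deepspeed of the same version would not be rebuilt), and the helper names are hypothetical.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional, Tuple

# Install-time flags discussed in this thread: DS_BUILD_OPS builds every compatible op,
# DS_BUILD_UTILS builds only the 'utils' op.
KNOWN_FLAGS = ("DS_BUILD_OPS", "DS_BUILD_UTILS")


def precompile_install_command(
    flag: str, base_env: Optional[Mapping[str, str]] = None
) -> Tuple[List[str], Dict[str, str]]:
    """Return (command, environment) for a pip install of deepspeed with `flag` set to 1."""
    if flag not in KNOWN_FLAGS:
        raise ValueError(f"unexpected build flag: {flag!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[flag] = "1"
    # --no-cache-dir stops pip from reusing a cached wheel that was built without the ops.
    cmd = [sys.executable, "-m", "pip", "install", "--no-cache-dir", "deepspeed"]
    return cmd, env


def run_precompile_install(flag: str) -> int:
    """Run the install; any compiler errors are printed by pip during this step."""
    cmd, env = precompile_install_command(flag)
    return subprocess.run(cmd, env=env, check=False).returncode
```

For instance, `run_precompile_install("DS_BUILD_UTILS")` attempts the narrower build that only compiles the op failing in this issue.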

@5hadytru

Solution?

@tjruwase
Contributor

I see this error message in the gist log. Can you confirm that pybind11 is installed?
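One way to answer this is to look for `pybind11/pybind11.h` in the include directories a build is likely to use. The sketch below checks the pip `pybind11` package (via `pybind11.get_include()`), PyTorch's extension include paths (via `torch.utils.cpp_extension.include_paths()`), and `/usr/include`; treating `/usr/include` as where system packages such as python3-pybind11 place the headers is an assumption, and the helper names are hypothetical.

```python
import os
from typing import Iterable, List, Optional


def find_header(header: str, include_dirs: Iterable[str]) -> Optional[str]:
    """Return the first existing path for `header` under include_dirs, else None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, header)
        if os.path.isfile(candidate):
            return candidate
    return None


def candidate_include_dirs() -> List[str]:
    """Include directories that might provide pybind11 headers."""
    dirs: List[str] = []
    try:
        import pybind11  # the pip package, if installed
        dirs.append(pybind11.get_include())
    except ImportError:
        pass
    try:
        from torch.utils.cpp_extension import include_paths  # PyTorch's extension includes
        dirs.extend(include_paths())
    except (ImportError, OSError):
        pass
    # Assumption: system packages (e.g. python3-pybind11) install headers under /usr/include.
    dirs.append("/usr/include")
    return dirs


if __name__ == "__main__":
    found = find_header("pybind11/pybind11.h", candidate_include_dirs())
    print(found or "pybind11/pybind11.h not found in any candidate include directory")
```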


@loadams
Collaborator

loadams commented Aug 14, 2023

This looks to have been related to pybind11. If you are still having issues with this, please re-open.

@loadams loadams closed this as completed Aug 14, 2023
@Robert11092002

sudo apt install python3-pybind11

@CattleHome

On Windows, I downloaded the matching package from https://pypi.org/project/deepspeed/#files, extracted it, and placed it directly in my virtual environment, and that worked.

7 participants