
Int8 pipeline parallelism #1482

Open
psinger opened this issue Jan 22, 2025 · 0 comments

psinger commented Jan 22, 2025

I am trying to use CUDA streams for pipeline parallelism, i.e. executing different parts of a model at the same time on different GPUs.
With int4, float16, and bfloat16, everything seems to work as expected.
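The issue doesn't name the framework, so the setup described above can only be sketched under assumptions. Assuming PyTorch, a minimal version of "different model parts on different GPUs, overlapped via streams" might look like the following; `staged_forward` and the per-device stage layout are hypothetical names, not from the original report:

```python
def staged_forward(stages, micro_batches):
    """Sketch of pipelined execution (assumption: PyTorch, stages[i]
    placed on device cuda:i). Each device gets its own stream, so
    kernels for different micro-batches can overlap across GPUs."""
    import torch  # imported lazily so the sketch parses without a CUDA box

    streams = [torch.cuda.Stream(device=i) for i in range(len(stages))]
    outputs = []
    for mb in micro_batches:
        x = mb
        for i, (stage, stream) in enumerate(zip(stages, streams)):
            with torch.cuda.stream(stream):
                if i > 0:
                    # order the hand-off: stage i must see stage i-1's result
                    stream.wait_stream(streams[i - 1])
                # copies and kernel launches are queued asynchronously, so the
                # host loop can move on and start the next micro-batch on GPU 0
                x = stage(x.to(f"cuda:{i}", non_blocking=True))
        outputs.append(x)
    torch.cuda.synchronize()  # wait for all streams before returning
    return outputs
```

In a sketch like this, any op inside a stage that forces a host-device synchronization would serialize the GPUs exactly as described below for int8.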

With int8, however, something appears to block, and the GPUs execute sequentially.
Since int4 works, does anyone know whether there is a blocking operation in the int8 path?
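
One way to hunt for such a blocking operation — again assuming PyTorch, which the issue doesn't confirm — is the CUDA sync debug mode, which warns whenever an op triggers an implicit host-device synchronization. The wrapper name `warn_on_implicit_sync` is hypothetical:

```python
def warn_on_implicit_sync(fn, *args, **kwargs):
    """Run fn with PyTorch's sync debug mode set to 'warn' (requires
    PyTorch >= 1.10), so any op that forces a host-device sync -- a
    candidate for the int8 blocking behaviour -- emits a warning."""
    import torch  # lazy import: the diagnostic only matters on a CUDA machine

    torch.cuda.set_sync_debug_mode("warn")
    try:
        return fn(*args, **kwargs)
    finally:
        torch.cuda.set_sync_debug_mode("default")
```

Running the int8 forward pass under this wrapper and comparing the warnings against an int4 run could narrow down which op serializes the streams.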

Thanks!
