
GUIDELLM__MAX_CONCURRENCY is off by 1 #70

Open
sjmonson opened this issue Feb 11, 2025 · 4 comments · May be fixed by #72
@sjmonson

The number of concurrent requests in throughput mode is always one less than Settings.max_concurrency.

Example

```shell
export GUIDELLM__MAX_CONCURRENCY="2"

guidellm --target http://localhost:8000/v1 \
         --model meta-llama/Llama-3.2-3B \
         --data-type emulated \
         --data prompt_tokens=512,generated_tokens=2048 \
         --rate-type throughput \
         --max-seconds 300
```

Observe on the server side that the number of requests in the queue is 1.
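One plausible source of such an off-by-one (a hypothetical sketch, not the actual guidellm code): if the task-generation loop counts the request being prepared as already in flight before comparing against the limit, a strict `<` check stops one short of `max_concurrency`. A minimal simulation:

```python
# Hypothetical sketch of a buggy task-generation loop; this is NOT
# the actual guidellm scheduler, just an illustration of the pattern.

def spawn_tasks(max_concurrency: int, total_requests: int) -> int:
    """Simulate a scheduler loop and return the peak concurrency reached."""
    in_flight = 0
    peak = 0
    submitted = 0
    while submitted < total_requests:
        # Buggy guard: counts the request being prepared as already
        # in flight, so it caps in-flight requests at max_concurrency - 1.
        if in_flight + 1 < max_concurrency:
            in_flight += 1
            submitted += 1
            peak = max(peak, in_flight)
        elif in_flight:
            in_flight -= 1  # simulate one request completing
        else:
            break
    return peak

print(spawn_tasks(max_concurrency=2, total_requests=10))  # prints 1
```

With the limit set to 2, the simulated peak concurrency is 1, matching the server-side observation above.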

@rgreenberg1
Contributor

@parfeniukink is this something that you can quickly fix and push out? Thanks!

@parfeniukink parfeniukink linked a pull request Feb 17, 2025 that will close this issue
@parfeniukink
Contributor

Hey @sjmonson,
Could you please check out the branch 70-guidellm__max_concurrency-is-off-by-1 for testing?
In the old version, we had this max_concurrency config, which does not directly control concurrency. It appears only once in the code, where it limits task generation in a while loop.

I created another CLI parameter --workers in #72.
Could you please tell me whether this is what you expected to see? Otherwise, please provide more details about the expected behavior.

@sjmonson
Author

sjmonson commented Feb 19, 2025

My assumption would be that GUIDELLM__MAX_CONCURRENCY="2" limits the number of concurrent async tasks to 2. Currently in main that limit seems to be off by 1; this seems like a simple bug?
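For reference, the expected behavior (a hard cap of exactly N concurrent async tasks) is what an `asyncio.Semaphore` provides; a minimal sketch, independent of guidellm's internals:

```python
import asyncio

async def run_with_cap(n_requests: int, max_concurrency: int) -> int:
    """Run n_requests dummy tasks capped at max_concurrency; return peak concurrency."""
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def one_request() -> None:
        nonlocal in_flight, peak
        async with sem:  # held by at most max_concurrency tasks at once
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the actual HTTP call
            in_flight -= 1

    await asyncio.gather(*(one_request() for _ in range(n_requests)))
    return peak

print(asyncio.run(run_with_cap(n_requests=10, max_concurrency=2)))  # prints 2
```

With a semaphore of 2, the peak number of simultaneously in-flight tasks is exactly 2, not 1.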

I'm not sure what the --workers flag in #72 is for; is it intended to be a direct replacement for Settings.max_concurrency? At the moment it does not seem to do anything.

Edit: Apologies, I understand what --workers is doing now: it duplicates the main scheduling loop. I don't think this is something we want to do unless it's spawned separately from the main thread (user-space threads or multiprocessing). But I don't think #72 is really related to this issue.

@parfeniukink
Contributor

> My assumption would be that GUIDELLM__MAX_CONCURRENCY="2" limits the number of concurrent async tasks to 2. Currently in main that limit seems to be off by 1; this seems like a simple bug?
>
> I'm not sure what the --workers flag in #72 is for; is it intended to be a direct replacement for Settings.max_concurrency? At the moment it does not seem to do anything.
>
> Edit: Apologies, I understand what --workers is doing now: it duplicates the main scheduling loop. I don't think this is something we want to do unless it's spawned separately from the main thread (user-space threads or multiprocessing). But I don't think #72 is really related to this issue.

Okay, now I understand you. Thanks for the details. I'll get back to you soon.
