Helm Chart Lacks Clear Support for Multi-Node vLLM Deployment #50

shohamyamin · 2025-01-31T13:58:49Z

The current Helm chart does not explicitly support deploying vLLM across multiple nodes on Kubernetes, or it is unclear how to configure it. Either multi-node support or improved documentation is needed for deploying LLMs that require multiple nodes.

ahg-g · 2025-01-31T15:46:41Z

For reference, and if someone would like to add that to helm charts, the LeaderWorkerSet API can be used to deploy multi-node vllm on k8s: https://docs.vllm.ai/en/latest/deployment/frameworks/lws.html

Also check examples on the LWS repo: https://github.com/kubernetes-sigs/lws/tree/main/docs/examples/vllm

YuhanLiu11 · 2025-01-31T18:04:24Z

Thanks for submitting the issue!

Multi-node deployment should be supported. If you encounter any issues running a multi-node deployment, feel free to let us know.

We will improve the documentation to clarify how to configure multi-node deployment on Kubernetes.

ApostaC · 2025-02-09T04:16:43Z

@shohamyamin we recently did some local tests and will update the helm charts and the docs soon!

shohamyamin · 2025-02-09T05:25:03Z

@ApostaC great to hear that. Do you know whether this configuration can run in a rootless environment (like OpenShift or other rootless k8s environments)?

ApostaC · 2025-02-09T05:28:10Z

Yeah, you should be able to do that.

I recently set up a rootless k8s environment with kubeadm and 2 physical nodes and got it running successfully.

shohamyamin · 2025-02-09T11:13:08Z

Great, I will try it as soon as the chart is updated.

ApostaC · 2025-02-11T05:11:22Z

@shohamyamin Hey, tensor parallelism support was added in PR #105.
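
For readers wondering what tensor parallelism typically needs at the pod level: GPU workers in one pod usually communicate through shared memory, so a memory-backed `emptyDir` mounted at `/dev/shm` is a common requirement. The sketch below builds such a pod spec fragment. It is only an illustration, not the chart's actual template; the function name `shm_volume_patch`, the volume name `shm`, and the `20Gi` size are assumptions.

```python
import json


def shm_volume_patch(container_name, size_limit="20Gi"):
    """Build a pod spec fragment that mounts a memory-backed volume at /dev/shm.

    The volume name "shm" and the default size are illustrative assumptions.
    """
    volume = {
        "name": "shm",
        # emptyDir with medium "Memory" is backed by tmpfs on the node.
        "emptyDir": {"medium": "Memory", "sizeLimit": size_limit},
    }
    mount = {"name": "shm", "mountPath": "/dev/shm"}
    return {
        "spec": {
            "volumes": [volume],
            "containers": [{"name": container_name, "volumeMounts": [mount]}],
        }
    }


if __name__ == "__main__":
    # "vllm" is a hypothetical container name.
    print(json.dumps(shm_volume_patch("vllm"), indent=2))
```

The printed JSON can be compared against the rendered chart output to check whether an equivalent volume and mount are present.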

simon-mo · 2025-02-18T20:46:15Z

LWS doesn't work out of the box with the vLLM container image. Please submit a PR to the vLLM repo so that the vLLM container can start a Ray cluster.
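
To make "start a Ray cluster" concrete, here is a minimal sketch of the two commands a leader pod and a worker pod would need to run. It is not the script from any existing PR or image; the function name `ray_start_command`, the port `6379`, and the leader hostname in the usage example are assumptions for illustration.

```python
import shlex

RAY_PORT = 6379  # assumed port for the Ray head node


def ray_start_command(role, leader_address=None, port=RAY_PORT):
    """Return the `ray start` argv for a leader (head) or worker pod."""
    if role == "leader":
        # The head node listens on the given port for workers to join.
        return ["ray", "start", "--head", f"--port={port}"]
    if role == "worker":
        if not leader_address:
            raise ValueError("a worker needs the leader's address")
        # --block keeps the process in the foreground so the container stays up.
        return ["ray", "start", f"--address={leader_address}:{port}", "--block"]
    raise ValueError(f"unknown role: {role!r}")


if __name__ == "__main__":
    print(shlex.join(ray_start_command("leader")))
    # "vllm-leader.default.svc" is a hypothetical hostname for the leader pod.
    print(shlex.join(ray_start_command("worker", "vllm-leader.default.svc")))
```

In a leader/worker setup, the leader would run the first command and then launch the vLLM server, while each worker would run the second command with the leader's address supplied by the orchestrator.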

0xThresh · 2025-02-18T21:18:47Z

While this is specific to EKS, it looks like AWS has a Dockerfile that can support it here: https://aws-ia.github.io/terraform-aws-eks-blueprints/patterns/machine-learning/multi-node-vllm/#dockerfile

It also looks like there is a PR already open to help address this, so I'll keep an eye on that: vllm-project/vllm#12913

ahg-g · 2025-02-18T21:37:39Z

I think we can make the multi-node deployment generic if we have the following script as part of the vllm image: vllm-project/vllm#12913

ApostaC added the "help wanted" label Feb 7, 2025

ApostaC mentioned this issue Feb 9, 2025

Feat: add shm configure into helm chart to support tensor parallel #97

Closed
