
Fastsafetensors #298

Open · wants to merge 3 commits into main
Conversation

fialhocoelho

This PR adds support for loading model weights with the fastsafetensors library.

Implementation considerations

  • Dockerfile.ubi:

    • The numactl-devel package is required to install fastsafetensors, but it is not available in the repository (only numactl-libs is). To address this, we compile numa (numactl, numactl-devel, and numactl-libs) from source and install it in the python-cuda-base layer, immediately before installing the packages from requirements-cuda.txt (where fastsafetensors is listed).

    • In the vllm-openai layer, we add the LD_LIBRARY_PATH environment variable to include the path to the nvidia-cufile library, which is required for using fastsafetensors. Normally, cufile is installed alongside cuda-toolkit (version >12) or NVIDIA GDS. Even though we are not using GDS, the binaries compiled by fastsafetensors require the cufile library to be loadable.

    • In the final layer (vllm-grpc-adapter), we install the numactl and numactl-libs packages compiled in the python-cuda-base layer, so that fastsafetensors runs against the same version used at compilation time.

    • Lastly, we install the nvidia-cufile package, which was already linked in the previous step/layer. While it may not be "aesthetically pleasing" to install the library alongside the RUN uv pip install command that installs the adapter, creating a requirements-fastsafetensors.txt file (or similar) for a single library besides the numa dependencies doesn't seem very functional either. A sketch of these Dockerfile changes follows this list.

  • vllm/model_executor/model_loader/weight_utils.py:

    • The SafeTensorsFileLoader() constructor initializes fastsafetensors and accepts a nogds parameter, which defaults to False. According to the docs, setting it to True disables support for NVIDIA GDS. Since we do not use GDS, I set it to True (see the usage sketch after this list).
    • The documentation also states that if nogds is False and GDS is unavailable, the loader will fail. During testing, however, the function worked with nogds set to both True and False. More tests/benchmarks may be needed to evaluate the functionality and performance impact of using this library.
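
A minimal sketch of the Dockerfile.ubi changes described above, assuming illustrative layer names, a hypothetical numactl version, and a Python 3.12 site-packages path for the cufile wheel; this is not the PR's exact diff:

```dockerfile
# python-cuda-base layer: build numactl from source, since numactl-devel
# is not available in the repository (only numactl-libs is).
FROM base AS python-cuda-base
ARG NUMACTL_VERSION=2.0.16  # assumed version, not taken from the PR
RUN curl -sL "https://github.com/numactl/numactl/releases/download/v${NUMACTL_VERSION}/numactl-${NUMACTL_VERSION}.tar.gz" \
        | tar xz -C /tmp \
    && cd "/tmp/numactl-${NUMACTL_VERSION}" \
    && ./configure --prefix=/usr/local \
    && make -j"$(nproc)" install  # provides the libnuma headers fastsafetensors needs

# vllm-openai layer: make the cufile shared library resolvable at load time,
# since fastsafetensors' compiled extension links against it even without GDS.
FROM python-cuda-base AS vllm-openai
ENV LD_LIBRARY_PATH="/usr/local/lib/python3.12/site-packages/nvidia/cufile/lib:${LD_LIBRARY_PATH}"
```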
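
And a minimal sketch of driving SafeTensorsFileLoader with nogds=True, adapted from the fastsafetensors README; the file and tensor names are placeholders:

```python
import torch
from fastsafetensors import SafeTensorsFileLoader, SingleGroup

device = torch.device("cuda:0")
# nogds=True disables the NVIDIA GDS path; weights still end up in GPU
# memory, just via a bounce-buffer copy instead of GPU Direct Storage.
loader = SafeTensorsFileLoader(SingleGroup(), device, nogds=True)
loader.add_filenames({0: ["model-00001-of-00002.safetensors"]})  # rank -> files
fb = loader.copy_files_to_device()  # parse headers and copy weights to the device
tensor = fb.get_tensor("model.embed_tokens.weight")  # placeholder tensor name
fb.close()
loader.close()
```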


openshift-ci bot commented Jan 20, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fialhocoelho
Once this PR has been reviewed and has the lgtm label, please assign terrytangyuan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


openshift-ci bot commented Jan 20, 2025

@fialhocoelho: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                     Commit   Details  Required  Rerun command
ci/prow/smoke-test            a9c6aa0  link     true      /test smoke-test
ci/prow/images                a9c6aa0  link     true      /test images
ci/prow/rocm-pr-image-mirror  a9c6aa0  link     true      /test rocm-pr-image-mirror

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

```diff
@@ -307,7 +308,15 @@ def _get_weights_iterator(
                 hf_weights_files,
             )
         elif use_safetensors:
-            weights_iterator = safetensors_weights_iterator(hf_weights_files)
+            use_fastsafe_tensor = os.getenv('USE_FASTSAFETENSOR',
```


My suggestion here would be to enable loader extra config for the DefaultLoader: https://github.com/vllm-project/vllm/blob/e784c6b9984e8f8116f74000b863d941495acb0b/vllm/model_executor/model_loader/loader.py#L190

So that you can enable fastsafetensors and GDS with
--model-loader-extra-config='{"use_fastsafetensors": true, "enable_gds": true}'
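
A hedged sketch of what that could look like inside the DefaultLoader; the key names follow the suggestion above, and fastsafetensors_weights_iterator is a hypothetical helper, not an existing vLLM function:

```python
# Sketch only: assumes the suggested config keys; not the PR's implementation.
extra_config = self.load_config.model_loader_extra_config or {}
use_fastsafetensors = bool(extra_config.get("use_fastsafetensors", False))
enable_gds = bool(extra_config.get("enable_gds", False))

if use_safetensors and use_fastsafetensors:
    # hypothetical helper wrapping fastsafetensors' SafeTensorsFileLoader
    weights_iterator = fastsafetensors_weights_iterator(
        hf_weights_files, nogds=not enable_gds)
elif use_safetensors:
    weights_iterator = safetensors_weights_iterator(hf_weights_files)
```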

```diff
 RUN --mount=type=cache,target=/root/.cache/pip \
     --mount=type=cache,target=/root/.cache/uv \
     --mount=type=bind,from=build,src=/workspace/dist,target=/workspace/dist \
-    HOME=/root uv pip install "$(echo /workspace/dist/*.whl)[tensorizer]" vllm-tgis-adapter==0.6.0
+    HOME=/root uv pip install "$(echo /workspace/dist/*.whl)[tensorizer]" vllm-tgis-adapter==0.6.0 nvidia-cufile-cu12
```
@dtrifiro Jan 31, 2025

Looking at the fastsafetensors repo's pyproject.toml, nvidia-cufile-cu12 is not listed as a dependency. Should we open a PR there to add this as a dep instead of manually adding it here?

Author

The cufile lib isn’t listed as a direct installation requirement, but it’s necessary because when importing the main object, it relies on a piece of code (fastcpp) that depends on cufile. The key point is that cufile is required for using the library with GDS support, which isn’t our use case. However, due to how the lib is structured, we still need this dependency even if we’re not using GDS. According to the docs, setting nogds=True disables GDS support, but the cufile requirement is triggered immediately upon import. I believe the best course of action is to open an issue in the repo to address this condition (I can take care of that).
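
For reference, the import-time failure can be reproduced without touching GDS at all; the exact error text depends on the environment, but roughly:

```bash
# If libcufile is not resolvable, the import itself fails, even though
# nogds=True would only take effect later, at loader-construction time.
python -c "import fastsafetensors"
```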


Sorry @fialhocoelho, I missed this reply.

Thanks for the explanation. Are we sure we're not interested in GDS, though? I don't know much about it, but it seems to be one of the performance improvements that fastsafetensors brings.

Author

@dtrifiro I don’t have much experience with how GDS would be installed in an OpenShift env, since it operates at a low level. I’ve only used it in a DGX environment. I can look into it a bit.


Looking at https://docs.nvidia.com/gpudirect-storage/overview-guide/index.html#software-architecture, it doesn't seem like much is needed apart from cufile and the NVIDIA driver.

===================================================================

Using the fastsafetensors library enables loading model weights to GPU memory by leveraging GPU Direct Storage. See https://github.com/foundation-model-stack/fastsafetensors for more details.
To enable this feature, set the environment variable ``USE_FASTSAFETENSOR`` to ``true``.
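
For example, with the environment-variable approach from this PR (facebook/opt-125m below is just a placeholder model):

```bash
# Enable fastsafetensors-based weight loading, then start vLLM as usual
export USE_FASTSAFETENSOR=true
python -m vllm.entrypoints.openai.api_server --model facebook/opt-125m
```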


Is this still relevant?
