[AMD] [Model] DeepSeek tunings #13199
What is the difference between MI300XHF_OAM and MI300X_OAM? Also, it seems "_OAM" doesn't exist for MI300X in the currently existing configs, so why is it included now?
This filename still has the space
Fixed
Fixed
Still have OAM in it, and XHF too.
vllm/model_executor/layers/fused_moe/configs/E=256,N=256,device_name=AMD_Instinct_MI300XHF_OAM,dtype=fp8_w8a8,block_shape=[128,128].json
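For context, the device segment in these config filenames is generally derived from the GPU name the runtime reports, with spaces replaced by underscores, which is also how a stray space can leak into a filename. A minimal sketch of that convention (the function name and exact format string here are assumptions modeled on vLLM's fused-MoE config loader, not taken from this PR's branch):

```python
# Hedged sketch of how fused-MoE config filenames are typically assembled.
# The function name and format string are assumptions, not this PR's code.
import torch

def get_config_file_name(E: int, N: int, dtype: str,
                         block_shape: list[int] | None = None) -> str:
    # e.g. "AMD Instinct MI300X OAM" -> "AMD_Instinct_MI300X_OAM";
    # a device name that keeps a space would produce the bad filename noted above.
    device_name = torch.cuda.get_device_name().replace(" ", "_")
    block = f",block_shape={block_shape}".replace(" ", "") if block_shape else ""
    return f"E={E},N={N},device_name={device_name},dtype={dtype}{block}.json"

# get_config_file_name(256, 256, "fp8_w8a8", [128, 128]) would yield:
# "E=256,N=256,device_name=AMD_Instinct_MI300X_OAM,dtype=fp8_w8a8,block_shape=[128,128].json"
```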
They don't have Divakar's changes yet.
benchmarks/kernels/benchmark_moe.py
Outdated
     save_configs(best_configs, E, shard_intermediate_size, hidden_size,
-                 topk, dtype, use_fp8_w8a8, use_int8_w8a16)
+                 topk, dtype, use_fp8_w8a8, use_int8_w8a16,
+                 block_quant_shape)
It looks like you only save the block_quant_shape in the config but don't consider it for the kernel tuning - I would think this is important for the tuner to have?
This was added by mistake. I meant to just add the tunings that worked for us.
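For reference, the concern above is that block_shape selects a different fused-MoE kernel variant, so a tuner that only records it would save configs measured against the wrong kernel. A minimal sketch of what threading it through the tuning loop could look like (all names here are hypothetical, not from benchmark_moe.py):

```python
# Hypothetical sketch: block_quant_shape should influence the benchmarked
# kernel, not just the saved config file. Names are illustrative only.
from typing import Callable, Optional

def tune_moe_kernel(
    search_space: list[dict],
    benchmark_config: Callable[[dict, Optional[list[int]]], float],
    block_quant_shape: Optional[list[int]] = None,
) -> dict:
    best_config, best_latency = {}, float("inf")
    for config in search_space:
        # Measure the same block-quantized kernel variant that the saved
        # config (...block_shape=[128,128].json) will later be loaded for.
        latency = benchmark_config(config, block_quant_shape)
        if latency < best_latency:
            best_config, best_latency = config, latency
    return best_config
```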
Turns out we don't need the XHF files, so I removed those.
This PR adds some tunings to improve ROCm performance for DeepSeek.
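The tunings are JSON files like the one named earlier: each maps a token count M to Triton launch parameters for the fused-MoE kernel. An illustrative entry (the values are invented for illustration; the keys follow vLLM's fused_moe config convention):

```python
# Illustrative fused-MoE tuning entry; values invented, keys per the
# fused_moe config convention.
example_entry = {
    "1": {                    # M: number of tokens in this batch bucket
        "BLOCK_SIZE_M": 16,
        "BLOCK_SIZE_N": 64,
        "BLOCK_SIZE_K": 128,
        "GROUP_SIZE_M": 1,
        "num_warps": 4,
        "num_stages": 2,
    },
}
```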
latency command: VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON=1 VLLM_FP8_PADDING=0 python3 benchmarks/benchmark_latency.py --model "deepseek-ai/DeepSeek-V3" -tp 8 --trust-remote-code --max-model-len 32768 --load-format "dummy" --input-len 128 --output-len 32 --batch-size 32 --num-iters 5 --num-iters-warmup 2
Before:
PPL=3.090357290962561
Avg latency: 3.0797035929979755 seconds
After:
PPL=3.0888253522260
Avg latency: 2.5471135329920798 seconds
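Net effect: roughly a 1.21x speedup (about 17% lower average latency) at essentially unchanged perplexity.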