[Bugfix] Massage MLA's usage of flash attn for RoCM #13310
Merged
This PR massages some `vllm_flash_attn` vs `flash_attn` interface differences that appeared between a couple of PRs: #12662 added a fallback to `flash_attn` in order to support MLA on ROCm (see `vllm/vllm/attention/backends/mla/utils.py`, lines 33 to 36 in 5e5c8e0). Subsequently, #12807 updated the code to use the `vllm_flash_attn`-specific arguments that control whether we use FA2 or FA3.
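To illustrate the kind of interface difference being massaged, here is a minimal sketch (not the exact vLLM code): the import falls back to upstream `flash_attn` on ROCm, and the FA2/FA3 selection argument is only passed when the vendored `vllm_flash_attn` is in use. The helper name `_call_varlen_attn` and the exact keyword used for version selection are assumptions for illustration.

```python
# Sketch only: prefer the vendored vllm_flash_attn; fall back to upstream
# flash_attn on ROCm, where the vendored package is unavailable.
try:
    from vllm.vllm_flash_attn import flash_attn_varlen_func
    is_vllm_fa = True
except ImportError:
    # Upstream flash_attn does not know about the vllm_flash_attn-specific
    # argument that selects between FA2 and FA3.
    from flash_attn import flash_attn_varlen_func
    is_vllm_fa = False


def _call_varlen_attn(q, k, v, **kwargs):
    # Hypothetical helper: only pass the version-selection kwarg when the
    # vendored vllm_flash_attn is in use, since upstream flash_attn would
    # raise a TypeError on an unexpected keyword argument.
    if is_vllm_fa:
        kwargs["fa_version"] = 2  # assumed kwarg name; 2 or 3 per hardware
    return flash_attn_varlen_func(q, k, v, **kwargs)
```

The point of the sketch is the gating: version-selection arguments are kept out of the call path whenever the upstream `flash_attn` fallback is active.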