[ROCm EP] Fix transpose helper via removing default trivial constructor #82

Merged
merged 1 commit into from
Jan 29, 2025

TedThemistokleous
Description

Remove the inline default transposeHelper and ensure we use the proper check via CanUse_hipBlasTransposeHelper_MLFloat16.

Motivation and Context

Required because some gfx targets need the transpose grid size to stay under the 65535 limit; otherwise the launch errors out.

libamdhip64.so errors out in newer ROCm releases to warn about this, but previously a grid size larger than anticipated resulted in undefined behavior.
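
For illustration, the kind of guard described above can be sketched as a host-side check. This is a minimal sketch under stated assumptions, not the code in this PR: `TransposeGridFits`, `CeilDiv`, `kTileDim`, and `kMaxGridDim` are hypothetical names, the 32-element tile is an assumed value, and a real check would likely read the limit from the device properties rather than hard-coding 65535.

```cpp
#include <cassert>  // for the assert-based checks in the usage note
#include <cstdint>

// Hypothetical values for illustration only: the tile size is an assumption,
// and the limit is the 65535 figure mentioned in this PR.
constexpr std::int64_t kTileDim = 32;
constexpr std::int64_t kMaxGridDim = 65535;

// Number of tiles needed to cover `extent` elements (ceiling division).
constexpr std::int64_t CeilDiv(std::int64_t extent, std::int64_t tile) {
  return (extent + tile - 1) / tile;
}

// Returns true when a tiled transpose of an m x n matrix would launch a grid
// whose x and y dimensions both stay within `max_grid_dim`.
constexpr bool TransposeGridFits(std::int64_t m, std::int64_t n,
                                 std::int64_t max_grid_dim = kMaxGridDim) {
  if (m <= 0 || n <= 0) {
    return false;  // nothing sensible to launch
  }
  return CeilDiv(n, kTileDim) <= max_grid_dim &&
         CeilDiv(m, kTileDim) <= max_grid_dim;
}
```

With these assumed values, `assert(TransposeGridFits(1024, 1024))` holds, while `assert(!TransposeGridFits(65535 * 32 + 1, 1))` shows a shape that needs 65536 tiles along one dimension and so must take a different code path. An unconditional "true" default would skip exactly this check.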

…eck via CanUse_hipBlasTransposeHelper_MLFloat16
@TedThemistokleous TedThemistokleous self-assigned this Jan 29, 2025
@TedThemistokleous TedThemistokleous merged commit bb933d4 into rocm6.4_internal_testing Jan 29, 2025
5 of 15 checks passed
@TedThemistokleous TedThemistokleous deleted the fix_transpose_helper branch January 29, 2025 02:36
tianleiwu pushed a commit to microsoft/onnxruntime that referenced this pull request Jan 29, 2025
Remove inline default transposeHelper and ensure we use the proper check
via CanUse_hipBlasTransposeHelper_MLFloat16

Related to change in ROCm Onnxruntime repo:
ROCm#82

### Description

Required to correctly limit grid size of transpose helper kernel

### Motivation and Context
The compiler was defaulting to the inline constructor, now removed,
instead of using the overloaded case with proper checks.
Removed the inline default "true" case, as it is incorrect for newer
AMD cards/targets.

Co-authored-by: Ted Themistokleous <[email protected]>