use default 128 cores per sm instead of report error #3884

Merged
liqiangxl merged 3 commits into main from llu/fix_unknown_arch
Feb 17, 2025

Conversation

liqiangxl
Collaborator

Use a default of 128 cores per SM instead of reporting an error.

@liqiangxl liqiangxl requested a review from jjsjann123 February 13, 2025 22:17
@liqiangxl
Collaborator Author

!build


github-actions bot commented Feb 13, 2025

Review updated until commit 4eb748c

Description

  • Updated CUDA cores per SM map to include new architecture.

  • Set default cores to 128 for unknown architectures.

  • Added support for architecture 0xc0.
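The behavior summarized above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of csrc/scheduler/utils.cpp: the map holds only a few illustrative entries, and the key encoding `(major << 4) + minor` is an assumption (it is the convention under which compute capability 12.0 becomes 0xc0). The `example` function is hypothetical and only demonstrates usage.

```cpp
#include <cassert>
#include <unordered_map>

// Sketch of a lookup with a default fallback.
// ASSUMPTION: keys are encoded as (major << 4) + minor; this thread does not
// show the encoding, but it is consistent with 0xc0 meaning 12.0.
int getCoresPerSM(int major, int minor) {
  const int sm_version = (major << 4) + minor;
  // Illustrative subset only; the real cores_per_sm_map has more entries.
  static const std::unordered_map<int, int> cores_per_sm_map = {
      {0x80, 64}, {0x86, 128}, {0x90, 128}, {0xc0, 128}};
  const auto it = cores_per_sm_map.find(sm_version);
  if (it != cores_per_sm_map.end()) {
    return it->second;
  }
  // Use the default value of 128 for any architecture not listed,
  // rather than throwing an error.
  return 128;
}

// Hypothetical usage: listed architectures hit the map, others get 128.
void example() {
  assert(getCoresPerSM(8, 0) == 64);    // listed entry
  assert(getCoresPerSM(12, 0) == 128);  // listed entry (0xc0)
  assert(getCoresPerSM(15, 3) == 128);  // not listed, falls back to default
}
```

The trade-off is that an unlisted architecture no longer stops execution, at the cost of possibly using an inexact core count.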


Changes walkthrough 📝

Relevant files

Enhancement: utils.cpp (csrc/scheduler/utils.cpp, +6/-20)
Updated CUDA cores map and default handling

  • Updated cores_per_sm_map to include architecture 0xc0.
  • Changed error handling to return default 128 cores for unknown architectures.

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Default Value

    The PR introduces a default value of 128 cores per SM for unknown architectures. This could lead to incorrect performance estimates or inefficiencies for unsupported GPUs.

    // Use the default value of 128 for any architecture not listed,
    // applicable to all current Blackwell GPUs.
    return 128;
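If the concern about silent mis-estimates matters to a caller, one possible variation is to report whether the default was used. The sketch below is not part of this PR: `lookupCoresPerSM`, `CoresPerSM`, and `coresPerSMOrDefault` are hypothetical names, the map is an illustrative subset, and it assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Hypothetical helper: returns std::nullopt for an unlisted architecture.
std::optional<int> lookupCoresPerSM(int sm_version) {
  // Illustrative subset of entries, not the real map.
  static const std::unordered_map<int, int> known = {
      {0x80, 64}, {0x90, 128}, {0xc0, 128}};
  const auto it = known.find(sm_version);
  if (it == known.end()) {
    return std::nullopt;
  }
  return it->second;
}

// Hypothetical result type that records whether the fallback was taken.
struct CoresPerSM {
  int cores;
  bool is_default;
};

CoresPerSM coresPerSMOrDefault(int sm_version, int default_cores = 128) {
  const std::optional<int> found = lookupCoresPerSM(sm_version);
  return {found.value_or(default_cores), !found.has_value()};
}

// Hypothetical usage: the caller can warn or adjust when is_default is set.
void example() {
  assert(coresPerSMOrDefault(0x80).cores == 64);
  assert(!coresPerSMOrDefault(0x80).is_default);
  assert(coresPerSMOrDefault(0xff).cores == 128);
  assert(coresPerSMOrDefault(0xff).is_default);
}
```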

    @liqiangxl
    Collaborator Author

    !build

    @@ -2845,7 +2845,8 @@ int getCoresPerSM(int major, int minor) {
       if (it != cores_per_sm_map.end()) {
         return it->second;
       }
    -  NVF_THROW("Unknown GPU architecture: ", major, ".", minor);
    +  // Use the default value of 128 for any architecture not listed,
    Collaborator

    Could you also add a {0xc0, 128}}; line for Blackwell?
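For readers unfamiliar with the hexadecimal key, the sketch below shows how 0xc0 would relate to a compute capability, assuming the map is keyed by `(major << 4) + minor`. That encoding is an assumption, since the key computation is not shown in this thread, and the helper names are hypothetical.

```cpp
#include <cassert>

// ASSUMPTION: major version in the high bits, minor in the low 4 bits.
constexpr int encodeSmVersion(int major, int minor) {
  return (major << 4) + minor;
}
constexpr int smMajor(int sm_version) { return sm_version >> 4; }
constexpr int smMinor(int sm_version) { return sm_version & 0xf; }

// Under this encoding, 0xc0 is compute capability 12.0.
static_assert(encodeSmVersion(12, 0) == 0xc0, "12.0 encodes to 0xc0");
static_assert(encodeSmVersion(9, 0) == 0x90, "9.0 encodes to 0x90");

// Hypothetical usage: decoding a key back into major and minor.
void example() {
  assert(smMajor(0xc0) == 12);
  assert(smMinor(0xc0) == 0);
  assert(smMinor(0x86) == 6);
}
```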

    Collaborator Author

    added

    @liqiangxl
    Collaborator Author

    !test

    Collaborator

    @jjsjann123 jjsjann123 left a comment


    LGTM

    @liqiangxl liqiangxl merged commit 2e755b7 into main Feb 17, 2025
    54 checks passed
    @liqiangxl liqiangxl deleted the llu/fix_unknown_arch branch February 17, 2025 01:02