
[WIP] Set allocation as loop in fusion segmentor #3880

Draft
wants to merge 1 commit into base: main
Conversation

Priya2698
Collaborator

No description provided.

@Priya2698
Collaborator Author

!test


Description

  • Added loop allocation for sharded tensors

  • Updated sharded tensor check logic

  • Commented out unnecessary pass in pre-segmenter


Changes walkthrough 📝

Relevant files

Enhancement

csrc/fusion_segmenter.cpp (+17/-0): Added loop allocation for sharded tensors

  • Added setAllocationAsLoopForShardedTvs method to finalize
  • Implemented logic to set allocation domain as loop domain for sharded
    tensors

csrc/multidevice/utils.cpp (+23/-12): Updated sharded tensor check logic

  • Updated isSharded function to check both allocation and loop domains
  • Added error handling for multiple sharded axes

csrc/fusion_segmenter.h (+3/-0): Added method declaration

  • Added declaration for setAllocationAsLoopForShardedTvs method

Miscellaneous

csrc/preseg_passes/pre_segmenter.cpp (+1/-1): Commented out unnecessary pass

  • Commented out MakeReshardingContiguousPass as it may be unnecessary

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
⚡ Recommended focus areas for review

Performance Impact

The new function setAllocationAsLoopForShardedTvs modifies the allocation domain of sharded tensors. It is important to evaluate the performance impact of this change and ensure that it meets the expected performance goals.

void SegmentedFusion::setAllocationAsLoopForShardedTvs() {
  auto set_allocation_as_loop = [](std::vector<Val*> vals) {
    auto tvs = ir_utils::filterByType<TensorView>(vals);
    std::for_each(tvs.begin(), tvs.end(), [](TensorView* tv) {
      if (isSharded(tv)) {
        tv->setAllocationDomain(tv->getLoopDomain(), true);
      }
    });
  };

  for (auto group : groups()) {
    set_allocation_as_loop(group->inputs());
    set_allocation_as_loop(group->outputs());
  }
}
Logic Verification

The logic for determining if a TensorView is sharded has been updated. It is crucial to verify that this new logic correctly identifies sharded tensors and does not introduce any false positives or negatives.

bool isSharded(const TensorView* tv) {
  // First check allocation domain if available, or the logical domain.
  auto num_sharded_axes = std::count_if(
      tv->getMaybeAllocationDomain().begin(),
      tv->getMaybeAllocationDomain().end(),
      [](IterDomain* id) { return id->isDeviceDim(); });

  if (num_sharded_axes == 1) {
    return true;
  }

  // Check if only the loop domain is sharded.
  // It is possible if the allocation domain has not been set yet.
  if (num_sharded_axes == 0) {
    num_sharded_axes = std::count_if(
        tv->getLoopDomain().begin(),
        tv->getLoopDomain().end(),
        [](IterDomain* id) { return id->isDeviceDim(); });
  }

  NVF_ERROR(
      num_sharded_axes <= 1,
      "Multiple IterDomains parallelized on DIDx in TensorView ",
      tv);

  return num_sharded_axes == 1;
}
Commented Code

The line // OptimizationPass<MakeReshardingContiguousPass>::runPass(fusion); is commented out. It is important to understand why this pass is not being run and whether it is necessary for the overall functionality or performance of the code.

// OptimizationPass<MakeReshardingContiguousPass>::runPass(fusion);
