
[Pytorch] Nvidia-DLFramework-Inspect support #1441

Open

pggPL wants to merge 9 commits into main from nvdlfw_inspect_support
Conversation

pggPL
Collaborator

@pggPL pggPL commented Jan 30, 2025

Description

Nvidia-DLFramework-Inspect will be the common debug/logging API for NVIDIA frameworks. Integrating it with Transformer Engine has three aims:

  • allow disabling/enabling FP8 in particular GEMMs, running current scaling in some GEMMs, etc.,
  • allow easy logging of statistics for every tensor in every GEMM,
  • make it easier to test new precisions/recipes integrated with TE.

Link to the nvidia-dlframework-inspect. IMPORTANT: to run this PR, one needs to use the branch from that PR.
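A minimal usage sketch of what the integration might look like from PyTorch, assuming nvdlfw_inspect exposes an initialize/end_debug style entry point driven by a YAML config that selects layers and debug features; the module path, function names, config path, feature directory, and the name keyword on the TE layer are illustrative assumptions, not confirmed API of this PR:

    # Illustrative sketch only -- the names below are assumptions about the debug API.
    import torch
    import transformer_engine.pytorch as te
    import nvdlfw_inspect.api as debug_api  # assumed entry point of Nvidia-DLFramework-Inspect

    # Assumed: a YAML config selects layers by name and enables debug features,
    # e.g. disabling FP8 for chosen GEMMs or logging per-tensor statistics.
    debug_api.initialize(
        config_file="./debug_config.yaml",                   # assumed config path
        feature_dirs=["transformer_engine/debug/features"],  # assumed TE feature directory
        log_dir="./debug_logs",
    )

    # Assumed: layers are given names so the config can target them.
    layer = te.Linear(1024, 1024, name="fc1").cuda()
    x = torch.randn(32, 1024, device="cuda")
    y = layer(x)

    debug_api.end_debug()  # assumed teardown, mirroring the "end debug" commit below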

Type of change

  • Documentation change (change only to the documentation, either a fix or new content)
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring
  • Testing

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@pggPL pggPL force-pushed the nvdlfw_inspect_support branch from 8f6dbd5 to f940ba3 on January 30, 2025 21:31
@ptrendx
Member

ptrendx commented Feb 7, 2025

Please move this PR to be against main.

@pggPL pggPL changed the base branch from release_v2.0 to main February 7, 2025 23:16
@pggPL pggPL marked this pull request as ready for review February 10, 2025 11:46
@pggPL
Collaborator Author

pggPL commented Feb 12, 2025

/te-ci pytorch

@pggPL pggPL force-pushed the nvdlfw_inspect_support branch from 7380ee1 to 7467f1e on February 12, 2025 17:09
* TE 2.0 code drop

Signed-off-by: Przemek Tredak <[email protected]>

* [PyTorch] Fix linter warnings (NVIDIA#1426)

* Fix linter warnings in basic linear op

Signed-off-by: Tim Moon <[email protected]>

* Fix linter warnings in grouped linear module

Signed-off-by: Tim Moon <[email protected]>

* Disable Userbuffers support in te.Sequential

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>

* Add path to disable cudnn norm for mxfp8 (NVIDIA#1432)

* Add path to disable cudnn norm for mxfp8

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Pad MXFP8 scale inverses at the time of creation (NVIDIA#1431)

* Create scale_inv for block scaling already padded

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* fix

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Remove old file, fix CG test

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Fixes

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Change default value of env

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

---------

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* [PyTorch] Respect existing quantizer usages in functional linear API (NVIDIA#1440)

Respect existing quantizer usages in functional linear API

Signed-off-by: Tim Moon <[email protected]>

* Nvidia-DLFramework-Inspect support

* Update FE from 1.10-rc to 1.10 (NVIDIA#1438)

Update FE 1.10-rc to 1.10

Signed-off-by: Charlene Yang <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <[email protected]>

* removed unnecessary files

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <[email protected]>

* removed unnecessary files

Signed-off-by: Pawel Gadzinski <[email protected]>

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <[email protected]>

* lint fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

Signed-off-by: Pawel Gadzinski <[email protected]>

* license fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [PyTorch] Debug NeMo distributed optimizer (NVIDIA#1444)

Debug errors with NeMo distributed optimizer

Avoid internal quantized tensor class in params and when setting data attr. Debug view function in MXFP8Tensor.

Signed-off-by: Tim Moon <[email protected]>

* Rename block scaling recipe (NVIDIA#1442)

Rename MXFP8 recipe

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* [common] Generalized MXFP8 fused kernels w.r.t. input tensor dimensions (NVIDIA#1437)

* Generalized MXFP8 fused kernels w.r.t. input tensor dimensions

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/common/common.cu

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Removed unnecessary test scenarios

Signed-off-by: Oleg Goncharov <[email protected]>

* Reverted the previous commit as it generated a compilation error (caused by to string conversion)

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update transformer_engine/common/common.cu

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Update test_cast_mxfp8.cu

Signed-off-by: Oleg Goncharov <[email protected]>

* Fixed the bug with partial dbias writes in trimmed chunks

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Generalized MXFP8 dequantize kernel

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Oleg Goncharov <[email protected]>
Signed-off-by: Oleg Goncharov <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>

* Add the virtual destructor to the Quantizer class (NVIDIA#1446)

Add the virtual destructor to the Quantizer

Signed-off-by: Przemek Tredak <[email protected]>

* [Core] Debug unaligned MXFP8 dequantize tests (NVIDIA#1450)

* Skip MXFP8 dequantize tests with invalid alignment

Signed-off-by: Tim Moon <[email protected]>

* Remove test case with unaligned rows

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>

* Generalization of the FP8 dgated activations kernel (NVIDIA#1448)

* Relax FP8 gated activations requirements
Expanded MXFP8 and FP8 tests coverage

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix scale_inv check in test

Signed-off-by: Przemek Tredak <[email protected]>

* Update tests/cpp/operator/test_cast_mxfp8.cu

Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Przemyslaw Tredak <[email protected]>

* Changes from review

Signed-off-by: Przemek Tredak <[email protected]>

* Lift the 2D restriction on MXFP8 scales

Signed-off-by: Przemek Tredak <[email protected]>

* Fix the scale_inv dimension check for MXFP8

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Skip columnwise MXFP8 tests for 1D tensors

Signed-off-by: Przemek Tredak <[email protected]>

* Skip 2x MXFP8 tests with 1D tensors

Signed-off-by: Przemek Tredak <[email protected]>

* Adjusting tolerances for dbias

Signed-off-by: Przemek Tredak <[email protected]>

* Smaller test cases

Signed-off-by: Przemek Tredak <[email protected]>

---------

Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Przemyslaw Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <[email protected]>

* one test api fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* [PyTorch/C++] Comm+GEMM overlap compatibility with QuantizedTensor (NVIDIA#1427)

* C++ code and TE/PyTorch general_gemm updated to support TP overlap with cppqtensor

Signed-off-by: Alp Dener <[email protected]>

CommOverlap objects can now return overlap buffers to PyTorch as QuantizedTensors

Signed-off-by: Alp Dener <[email protected]>

updated comm+GEMM overlap test for pure GEMM, both BF16 and FP8 working with QuantizedTensor

Signed-off-by: Alp Dener <[email protected]>

te.Linear and te.LayerNormMLP updated for TP overlap w/ QuantizedTensor. All overlaps work in BF16. All overlaps except bulk WGRAD work in FP8.

Signed-off-by: Alp Dener <[email protected]>

completed TP overlap QuantizedTensor updates for LayerNormLinear, but issues with quantized normalization

Signed-off-by: Alp Dener <[email protected]>

all overlaps working with bf16, all but bulk WGRAD working with FP8

Signed-off-by: Alp Dener <[email protected]>

all overlaps work with Float8Tensor, except bulk wgrad in LayerNormMLP (works in other modules)

Signed-off-by: Alp Dener <[email protected]>

all overlaps working with QuantizedTensor in BF16 and FP8

Signed-off-by: Alp Dener <[email protected]>

cleaned up pytest formatting

Signed-off-by: Alp Dener <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* removed atomic GEMM tests for comm+GEMM overlap (deprecated in CUDA) and updated test sizing

Signed-off-by: Alp Dener <[email protected]>

* all TP overlap tests fixed on H100, a few failures remain in sanity tests

Signed-off-by: Alp Dener <[email protected]>

* Minor fix, lint, format

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Fix mxfp8

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Minor changes/cleanup

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Populate column-wise data in FP8 LayerNorm/RMSNorm funcs if provided

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix linter warnings

Signed-off-by: Tim Moon <[email protected]>

* Fix fused attn tests

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Initialize LN output with correct device

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Fix UB distributed tests

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* Fix for non-fp8 cases

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

---------

Signed-off-by: Alp Dener <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: Tim Moon <[email protected]>

* [PyTorch] Remove MXFP8 scale-inv padding in MXFP8 all-gather (NVIDIA#1455)

* Remove MXFP8 scale-inv padding in MXFP8 all-gather

Signed-off-by: Tim Moon <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Zero out padding in MXFP8 scale-inverses

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [common] Generalized MXFP8 gated kernels w.r.t. input tensor dimensions (NVIDIA#1449)

* Fixed scaling tensor alignment/padding

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Changes from review

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed alignment and padding in scaled tensors. Refactoring.

Signed-off-by: Oleg Goncharov <[email protected]>

* Skipped scenarios for non-mod(32) tensors

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixes

Signed-off-by: Przemek Tredak <[email protected]>

* More fixes

Signed-off-by: Przemek Tredak <[email protected]>

* Some fixes to the CPU reference

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fixed typo in the kernel. Restricted the last dim to multiples of 32

Signed-off-by: Oleg Goncharov <[email protected]>

* Fixed TMA writes overlap

Signed-off-by: Oleg Goncharov <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove the largest test cases for numerical stability

Signed-off-by: Przemek Tredak <[email protected]>

---------

Signed-off-by: Oleg Goncharov <[email protected]>
Signed-off-by: Przemek Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Przemek Tredak <[email protected]>
Co-authored-by: Tim Moon <[email protected]>

* Fix MXFP8 normalization (NVIDIA#1457)

* Fix MXFP8 normalization

Signed-off-by: Przemek Tredak <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: Przemek Tredak <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* [PyTorch] Reduce tensor dimensions in MXFP8 tests (NVIDIA#1435)

* Relax dim constraint in MXFP8 tests

Dims are multiples of 32 instead of 128.

Signed-off-by: Tim Moon <[email protected]>

* Make tensor dims multiples of 32

Signed-off-by: Tim Moon <[email protected]>

* Avoid MXFP8 GEMM with MXFP8 output

Signed-off-by: Tim Moon <[email protected]>

* Reduce tensor sizes in non-quantized TP test

Signed-off-by: Tim Moon <[email protected]>

* Increase GEMM sizes in distributed te.Sequential tests

Signed-off-by: Tim Moon <[email protected]>

---------

Signed-off-by: Tim Moon <[email protected]>

* Expand sanity tests to include MXFP8

Signed-off-by: Kirthi Shankar Sivamani <[email protected]>

* polishing

Signed-off-by: Pawel Gadzinski <[email protected]>

* polishing

Signed-off-by: Pawel Gadzinski <[email protected]>

* polishing

Signed-off-by: Pawel Gadzinski <[email protected]>

* polishing

Signed-off-by: Pawel Gadzinski <[email protected]>

* refactor

Signed-off-by: Pawel Gadzinski <[email protected]>

* refactor

Signed-off-by: Pawel Gadzinski <[email protected]>

* lint fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* lint fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint fixed

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* lint and license fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* nvinspect_api to debug_api

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* end debug

Signed-off-by: Pawel Gadzinski <[email protected]>

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* one gpu tests passing

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes all tests

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixes

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* lint fix

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* new small test

Signed-off-by: Pawel Gadzinski <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix

Signed-off-by: Pawel Gadzinski <[email protected]>

---------

Signed-off-by: Przemek Tredak <[email protected]>
Signed-off-by: Tim Moon <[email protected]>
Signed-off-by: Kirthi Shankar Sivamani <[email protected]>
Signed-off-by: Charlene Yang <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Oleg Goncharov <[email protected]>
Signed-off-by: Oleg Goncharov <[email protected]>
Signed-off-by: Przemyslaw Tredak <[email protected]>
Signed-off-by: Alp Dener <[email protected]>
Co-authored-by: Przemek Tredak <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Co-authored-by: Kirthi Shankar Sivamani <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Charlene Yang <[email protected]>
Co-authored-by: Oleg Goncharov <[email protected]>
Co-authored-by: Alp Dener <[email protected]>
Co-authored-by: Tim Moon <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
@pggPL pggPL force-pushed the nvdlfw_inspect_support branch from 7467f1e to c90f5ac on February 12, 2025 17:09
pggPL and others added 3 commits February 12, 2025 09:21
Signed-off-by: Pawel Gadzinski <[email protected]>
Signed-off-by: Pawel Gadzinski <[email protected]>
@pggPL
Collaborator Author

pggPL commented Feb 12, 2025

/te-ci pytorch L1

@timmoon10 timmoon10 self-requested a review February 13, 2025 19:37
docs/debug.rst Outdated
==============================================

.. toctree::
   :caption: Debug
Member

Could you make the title more descriptive?

Collaborator Author

Done

Signed-off-by: Pawel Gadzinski <[email protected]>
if os.environ.get("DEBUG", False):
    # The numerics of all the layers should work the same,
    # when debug=True. I fed them with dummy feature
    # to prevent switching off debug, what can happend if
Member
Suggested change
    # to prevent switching off debug, what can happend if
    # to prevent switching off debug, which can happen if

@@ -37,6 +37,7 @@
 def _run_test(quantization):
     test_path = TEST_ROOT / "run_numerics.py"
     test_cmd = LAUNCH_CMD + [str(test_path)]
+    print(" ".join(test_cmd))
Member

Left from debug?

if os.environ.get("DEBUG", False):
    # The numerics of all the layers should work the same,
    # when debug=True. I fed them with dummy feature
    # to prevent switching off debug, what can happend if
Member
Suggested change
    # to prevent switching off debug, what can happend if
    # to prevent switching off debug, which can happen if

except ImportError as e:
    pass

from .pytorch.debug_state import set_weight_tensor_tp_group_reduce
Member

Shouldn't this be inside that try?
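A minimal sketch of what the reviewer seems to be suggesting, assuming the try/except above guards an optional debug dependency (the guarded module name is an assumption for illustration):

    try:
        import nvdlfw_inspect  # assumed optional dependency guarded by the existing try block
        from .pytorch.debug_state import set_weight_tensor_tp_group_reduce
    except ImportError:
        pass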

#
# See LICENSE for license information.

"""Kernels written with OpenAI Triton."""
Member
I don't think this comment is accurate :-)
