[Pytorch] Nvidia-DLFramework-Inspect support #1441
base: main
Conversation
Force-pushed from 8f6dbd5 to f940ba3.
Please move this PR to be against main.
/te-ci pytorch
Force-pushed from 7380ee1 to 7467f1e.
Force-pushed from 7467f1e to c90f5ac.
/te-ci pytorch L1
docs/debug.rst (outdated):

    ==============================================

    .. toctree::
      :caption: Debug
Could you make the title more descriptive?
Done
    if os.environ.get("DEBUG", False):
        # The numerics of all the layers should work the same,
        # when debug=True. I fed them with dummy feature
        # to prevent switching off debug, what can happend if
Suggested change:
    - # to prevent switching off debug, what can happend if
    + # to prevent switching off debug, which can happen if
    @@ -37,6 +37,7 @@
     def _run_test(quantization):
         test_path = TEST_ROOT / "run_numerics.py"
         test_cmd = LAUNCH_CMD + [str(test_path)]
    +    print(" ".join(test_cmd))
Leftover from debugging?
    if os.environ.get("DEBUG", False):
        # The numerics of all the layers should work the same,
        # when debug=True. I fed them with dummy feature
        # to prevent switching off debug, what can happend if
Suggested change:
    - # to prevent switching off debug, what can happend if
    + # to prevent switching off debug, which can happen if
    except ImportError as e:
        pass

    from .pytorch.debug_state import set_weight_tensor_tp_group_reduce
Shouldn't this be inside that try?
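For context, a minimal sketch of the structure the reviewer appears to be suggesting, assuming the try/except is meant to guard optional debug imports (module-level code inside the package, so the relative import from the diff above resolves):

    try:
        # If debug support is optional, the debug_state import belongs inside
        # the guarded block, so a missing dependency disables the feature
        # cleanly instead of raising at package import time.
        from .pytorch.debug_state import set_weight_tensor_tp_group_reduce
    except ImportError:
        pass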
    #
    # See LICENSE for license information.

    """Kernels written with OpenAI Triton."""
I don't think this comment is accurate :-)
Description
Nvidia-DLFramework-Inspect will be the common debug/logging API for NVIDIA frameworks. Integrating it into Transformer Engine has 3 aims:
Link to nvidia-dlframework-inspect. IMPORTANT: to run this PR, you need to use the branch from that PR.
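A hypothetical end-to-end sketch of the intended workflow; the nvdlfw_inspect import path, the initialize()/end_debug() calls, and the config/feature-directory paths are assumptions based on the linked nvidia-dlframework-inspect project, not code taken from this PR:

    import torch
    import transformer_engine.pytorch as te
    import nvdlfw_inspect.api as debug_api  # assumed module path

    # Assumed entry point: point the inspector at a YAML config selecting which
    # layers and debug features (logging, statistics, ...) are active, and at
    # the directory holding TE's debug feature definitions.
    debug_api.initialize(
        config_file="debug_config.yaml",                      # hypothetical config
        feature_dirs=["transformer_engine/debug/features"],   # assumed location
    )

    layer = te.Linear(1024, 1024).cuda()
    out = layer(torch.randn(32, 1024, device="cuda"))

    debug_api.end_debug()  # assumed teardown call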
Type of change
Checklist: