Support tensors with only column-wise data #1505

timmoon10 · 2025-02-25T02:13:52Z

Description

The quantized tensor infrastructure in TE 2.0 assumes that tensors have row-wise data available, both in the core C++ library and in the PyTorch extensions. This PR relaxes that assumption to support tensors with only column-wise data. This lets us avoid caching unnecessary data after the linear forward pass (the wgrad GEMM only needs the column-wise input) and reduces communication volume in MXFP8 tensor-parallel all-gathers. It is not perfectly optimized yet (the tensor-parallel Linear module still caches the BF16 input tensor instead of the column-wise MXFP8 tensor), but it is a reasonable step.
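The caching argument can be illustrated with a framework-free sketch. In a linear layer computing y = x Wᵀ, the fprop GEMM reduces over the hidden dimension of x (so it reads x in row-wise layout), while the wgrad GEMM dW = dyᵀ x reduces over the token dimension (so it reads x in column-wise layout). Once fprop is done, the row-wise copy is dead weight. `QuantizedSketch`, `update_usage`, `linear_forward`, and `linear_wgrad` below are hypothetical names, not Transformer Engine's classes; quantization is omitted, and storing the column-wise copy transposed is an assumption made only for illustration (the real buffer layout depends on the recipe).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain (M x K) @ (K x N) product.
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


@dataclass
class QuantizedSketch:
    """Hypothetical tensor holding row-wise data, column-wise data, or both.
    Quantization is omitted: both copies hold the raw values, and the
    column-wise copy is stored transposed purely for illustration."""
    rowwise_data: Optional[Matrix] = None
    columnwise_data: Optional[Matrix] = None

    @classmethod
    def from_matrix(cls, x: Matrix) -> "QuantizedSketch":
        return cls(rowwise_data=[r[:] for r in x], columnwise_data=transpose(x))

    def update_usage(self, rowwise: bool, columnwise: bool) -> None:
        # Free the copies that later GEMMs will not read.
        if not rowwise:
            self.rowwise_data = None
        if not columnwise:
            self.columnwise_data = None


def linear_forward(x: QuantizedSketch, w: Matrix) -> Tuple[Matrix, QuantizedSketch]:
    # fprop: y = x @ W^T reduces over the hidden dim, so it reads row-wise x.
    y = matmul(x.rowwise_data, transpose(w))
    # Only the wgrad GEMM touches x again, so keep just the column-wise copy.
    x.update_usage(rowwise=False, columnwise=True)
    return y, x  # x is the tensor saved for backward


def linear_wgrad(x_saved: QuantizedSketch, dy: Matrix) -> Matrix:
    # wgrad: dW = dy^T @ x reduces over the token dim, so it reads column-wise x.
    # With x^T stored, dW = (x^T @ dy)^T.
    return transpose(matmul(x_saved.columnwise_data, dy))
```

After `linear_forward` returns, the saved tensor holds only the column-wise copy, which is exactly what `linear_wgrad` consumes; nothing in the backward path ever needs the row-wise copy again.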

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Support tensors with only column-wise data in the core C++ library
Support tensors with only column-wise data in the PyTorch framework
Only cache column-wise data for the input tensor in the single-GPU Linear module, LayerNormLinear module, and BasicLinear op
Support MXFP8 all-gather with column-wise data
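One way to see why an all-gather can operate directly on column-wise MXFP8 data: MXFP8 shares one scale per block of 32 consecutive elements, and for column-wise usage those blocks run along dim 0. Assuming the all-gather also runs along dim 0 (the sequence/token dimension in sequence-parallel layers) and each rank's local row count is a multiple of the block size, no scaling block straddles two ranks, so concatenating quantized shards gives the same result as quantizing the gathered tensor. The toy sketch below checks that property. `quantize_columnwise` and `all_gather_columnwise` are hypothetical helpers, and the scale is a plain max-abs value with no E8M0 rounding or FP8 cast.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def quantize_columnwise(x: Matrix, block: int = 32) -> Tuple[Matrix, Matrix]:
    """Toy column-wise block scaling: one scale per `block` consecutive rows
    of each column. Real MXFP8 uses power-of-two (E8M0) scales and FP8 casts;
    here the scale is just the block's max |value| and no rounding is done."""
    rows, cols = len(x), len(x[0])
    if rows % block != 0:
        raise ValueError("row count must be a multiple of the block size")
    data = [[0.0] * cols for _ in range(rows)]
    scales = [[0.0] * cols for _ in range(rows // block)]
    for b in range(rows // block):
        for c in range(cols):
            chunk = [x[r][c] for r in range(b * block, (b + 1) * block)]
            s = max(abs(v) for v in chunk) or 1.0  # avoid dividing by zero
            scales[b][c] = s
            for r in range(b * block, (b + 1) * block):
                data[r][c] = x[r][c] / s
    return data, scales


def all_gather_columnwise(shards: Sequence[Tuple[Matrix, Matrix]]) -> Tuple[Matrix, Matrix]:
    """Concatenate per-rank (data, scales) pairs along dim 0, as an all-gather
    over the sequence dimension would. Valid only because every shard's row
    count is a multiple of the block size."""
    data: Matrix = []
    scales: Matrix = []
    for d, s in shards:
        data.extend(d)
        scales.extend(s)
    return data, scales
```

If a rank's row count were not a multiple of the block size, a block would span two ranks and its scale could not be computed locally. In that case the quantized all-gather would not be equivalent, and the high-precision tensor would have to be gathered first.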

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

Signed-off-by: Tim Moon <[email protected]>

for more information, see https://pre-commit.ci

timmoon10 · 2025-02-25T02:17:02Z

/te-ci core pytorch L0 L1

timmoon10 added 6 commits February 21, 2025 23:57

Delete row-wise data in single-GPU linear forward

fcfd118

Signed-off-by: Tim Moon <[email protected]>

Debug Python->C++ parsing of transpose-only Float8Tensors

90002d9

Signed-off-by: Tim Moon <[email protected]>

Debug tensor shape calculation without row-wise data

f917002

Signed-off-by: Tim Moon <[email protected]>

Debug correctness issues with only column-wise data

03d95e5

Signed-off-by: Tim Moon <[email protected]>

Only cache column-wise input in LayerNormLinear

2099726

Signed-off-by: Tim Moon <[email protected]>

Support MXFP8 all-gather with only column-wise data

7f4dfdb

Signed-off-by: Tim Moon <[email protected]>

timmoon10 added the enhancement (New feature or request) and performance labels Feb 25, 2025

timmoon10 requested a review from ksivaman February 25, 2025 02:13

Merge branch 'main' into columnwise-only-tensors

04a067b

timmoon10 marked this pull request as draft February 25, 2025 02:14

[pre-commit.ci] auto fixes from pre-commit.com hooks

351aece

for more information, see https://pre-commit.ci

timmoon10 requested a review from ptrendx February 25, 2025 02:26
