
fixing dtype promotion in where #1734

Merged
merged 10 commits into main on Feb 4, 2025

Conversation

@jjsjann123 (Collaborator) commented Feb 1, 2025

Fixes #1723

nvFuser's type promotion rule for where differs from Thunder's, so we could accidentally create the output tensor in double, as noted in #1723 (comment).
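
For context, a minimal sketch of the eager-PyTorch behavior Thunder is expected to match (shapes and values here are illustrative):

import torch

cond = torch.rand(3, 3) > 0.5
x = torch.rand(3, 3, dtype=torch.float32)

# A Python float is weakly typed: the tensor dtype wins under promotion.
out = torch.where(cond, x, 1.0)
print(out.dtype)  # torch.float32 -- the scalar does not promote the output

# Before this fix, the nvFuser executor could treat the Python float as a
# strong double, promoting the fused output to torch.float64 instead.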

@jjsjann123 jjsjann123 marked this pull request as ready for review February 1, 2025 04:59

@beverlylytle (Collaborator) left a comment

Nice! Thanks!

@kshitij12345 (Collaborator) left a comment

IIUC, the current fix works because there is another where later in the trace with tensor inputs, which correctly casts the output.

I think we should also handle the following case (which still fails).

  • Fusion with only a prims.where(cond, 1., 0.) symbol:
import thunder
import thunder.examine
import torch

def fn(c, x, y):
    return torch.where(c, x, y)

x = 1.
y = 0.

print(fn(torch.rand(3, 3, device='cuda') > 0.5, x, y).dtype)  # torch.float

jfn = thunder.jit(fn, nv_store_fusion_inputs=True)
print(jfn(torch.rand(3, 3, device='cuda') > 0.5, x, y).dtype)  # torch.double

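# Optional: uncomment the lines below to inspect the final trace and
# extract a standalone nvFuser repro for the fusion's first symbol.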
# trc = thunder.last_traces(jfn)[-1]

# print(trc)
# fusion_syms = thunder.examine.get_fusion_symbols(trc)

# bsym = fusion_syms[0]

# repro = thunder.examine.get_nvfuser_repro(trc, bsym.sym.name)
# print(repro)

@mruberry (Collaborator) commented Feb 3, 2025

> IIUC, the current fix works because there is another where later in the trace with tensor inputs, which correctly casts the output. I think we should also handle the following case (which still fails). […]

Interesting! I'm surprised this case still fails with this fix. Maybe we need more test cases for nvfuser and where, too.

@jjsjann123 (Collaborator, Author)

Thanks for the repro.

I see what the issue is now: the nvFuser executor maps dtypes for number types differently, so I just need another condition to convert the weak type to a strong type when the output is a tensor proxy.
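
For illustration, a hypothetical sketch of that idea; this is not the actual Thunder/nvFuser executor code, and resolve_scalar_dtype is an invented name:

import torch

def resolve_scalar_dtype(scalar, output_is_tensor_proxy, output_dtype):
    # Weakly-typed Python floats should follow the tensor output's dtype
    # instead of defaulting to double, which is how a bare Python float
    # would otherwise be mapped.
    if isinstance(scalar, float) and output_is_tensor_proxy:
        return output_dtype
    if isinstance(scalar, float):
        return torch.float64  # default mapping for a standalone Python float
    return output_dtype

print(resolve_scalar_dtype(1.0, True, torch.float32))   # torch.float32
print(resolve_scalar_dtype(1.0, False, torch.float32))  # torch.float64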

@jjsjann123 (Collaborator, Author)

But I'm surprised this is not covered by the OpInfo tests. I'll add that.
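
As a rough illustration of the kind of coverage this implies (a hypothetical pytest-style sketch, not the actual OpInfo change; names and sample values are invented):

import pytest
import torch
import thunder

@pytest.mark.parametrize("x,y", [(1.0, 0.0), (2.0, torch.tensor(3.0))])
def test_where_scalar_dtype(x, y):
    # The jitted function should match eager PyTorch's dtype promotion,
    # even when one or both value arguments are Python scalars.
    def fn(c, x, y):
        return torch.where(c, x, y)

    c = torch.rand(3, 3) > 0.5
    expected = fn(c, x, y)
    actual = thunder.jit(fn)(c, x, y)
    assert actual.dtype == expected.dtype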

@jjsjann123 (Collaborator, Author)

@kshitij12345 I verified the fix on your repro and added OpInfo tests as well.

Thanks a lot for pointing this out. 🙇

@mruberry (Collaborator) commented Feb 3, 2025

@jjsjann123 Looks like the CI issues are related

@kshitij12345 (Collaborator) left a comment

LGTM, thank you @jjsjann123!

@jjsjann123 jjsjann123 enabled auto-merge (squash) February 4, 2025 16:39
@jjsjann123 jjsjann123 requested a review from mruberry February 4, 2025 16:39

@jjsjann123 (Collaborator, Author)

The code is ready for review again. cc @mruberry

@jjsjann123 (Collaborator, Author)

> @jjsjann123 Looks like the CI issues are related

The CI issue was because we tried to run grad on the newly added scalar inputs, which don't have a grad function.
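
For context, a minimal sketch (shapes and values are illustrative) of why a plain scalar input has no grad function:

import torch

c = torch.rand(3) > 0.5
x = torch.rand(3, requires_grad=True)
y = 0.5  # plain Python scalar input: not a tensor, so autograd cannot track it

out = torch.where(c, x, y).sum()
out.backward()
print(x.grad)              # gradient flows to the tensor input
print(hasattr(y, "grad"))  # False -- a Python float has no grad function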

@mruberry (Collaborator) left a comment

Cool! Nice comments

@jjsjann123 jjsjann123 merged commit 40f7972 into main Feb 4, 2025
49 checks passed
@jjsjann123 jjsjann123 deleted the fix_1723 branch February 4, 2025 17:14
Successfully merging this pull request may close these issues.

div with nvfuser returns incorrect dtype