-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Incorrect behavior for out-of-range casts to FP8E4M3FN #260
Comments
I guess a related question is, is the saturation option exposed to the user in any way? see details at https://onnx.ai/onnx/technical/float8.html#cast |
Actually there seems to be an inconsistency in the saturation behavior. The switch happens between 465 and 464:
|
I think this is working as expected: if we compare the requested value to the spacing between representable values, we're rounding down within the first half of the spacing and rounding up in the second half. Consider this: import numpy as np
import ml_dtypes
import warnings
warnings.simplefilter('ignore', RuntimeWarning)
def check(typ):
name = typ.__name__
print(f'type: {name}')
largest = typ(ml_dtypes.finfo(typ).max)
print(f' largest value: {largest}')
prev = np.nextafter(largest, typ(0))
print(f' previous value: {prev}')
spacing = np.spacing(prev)
print(f' spacing: {spacing}')
val1 = largest + spacing // 2 - 1
val2 = val1 + 1
val3 = val1 + 2
print(f' {name}({val1}) = {typ(val1)}')
print(f' {name}({val2}) = {typ(val2)}')
print(f' {name}({val3}) = {typ(val3)}')
check(np.float16)
check(ml_dtypes.float8_e5m2)
check(ml_dtypes.float8_e4m3fn)
Keep in mind that Comparing the two, there is perhaps some kind of off-by-one error that I don't totally understand (in |
@jakevdp The off-by-one might have to do with rounding to even and truncation to target mantissa bit width. It needs some checking. |
Ah, sorry I misunderstood you – so your aim here is that overflowing values for dtypes with no @hawkinsp – what do you think about this? |
I'm not sure. The natural thing to do would be to expose a custom Although I'm not sure what behavior should |
Yea as long as both options are exposed it should suffice. As regards which should be the default for |
The following calculation returns NaN but this is inconsistent with the behavior in ONNXRUNTIME which returns the max representable value of 448.
The text was updated successfully, but these errors were encountered: