
accuracy drops a lot in fp16 mode #879

Open
itachi1232gg opened this issue Aug 22, 2023 · 4 comments


itachi1232gg commented Aug 22, 2023

My model's accuracy drops a lot when I convert it to fp16 mode; even a pretrained ResNet-34 shows an accuracy drop in fp16 mode.

import os
os.environ['CUDA_MODULE_LOADING'] = 'LAZY'

import torch
from torch2trt import torch2trt
from torchvision.models.resnet import resnet34, resnet18

data = torch.ones((1, 3, 224, 224)).cuda()
model = resnet18(pretrained=True)
model.eval()
model.cuda()
model_trt = torch2trt(model, [data], fp16_mode=True)
with torch.no_grad():
    print((model(data) - model_trt(data)).abs().sum())

tensor(5.3151, device='cuda:0')

If I set fp16_mode=False, then the output is

tensor(0.0017, device='cuda:0')


Lu-tju commented Oct 8, 2023

Hi, with your code my machine gives 0.6877 when fp16_mode=False (and 5.3 when fp16_mode=True). Is this amount of error normal?

@itachi1232gg (Author)

> Hi, with your code my machine gives 0.6877 when fp16_mode=False (and 5.3 when fp16_mode=True). Is this amount of error normal?

0.6877 is large for fp32 mode and is likely to give you wrong outputs.
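As an aside, the absolute sum of differences doesn't by itself show whether any individual element moved much, or whether the predicted class changes. A minimal sketch of that distinction, simulating fp16 round-off on CPU with numpy rather than a real TensorRT engine (the logits below are random placeholders, not outputs from this model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder fp32 "logits" for a 1000-class classifier (ImageNet-sized)
logits_fp32 = rng.standard_normal((1, 1000)).astype(np.float32)
# Simulate fp16 precision loss with a round-trip through float16
logits_fp16 = logits_fp32.astype(np.float16).astype(np.float32)

abs_diff = np.abs(logits_fp32 - logits_fp16)
print(abs_diff.sum())  # summed over 1000 elements, this can look alarming
print(abs_diff.max())  # per-element error is tiny
# For classification accuracy, what matters is whether top-1 changes
print(np.array_equal(logits_fp32.argmax(1), logits_fp16.argmax(1)))
```

With a real fp16 engine the per-layer errors compound, so the most reliable accuracy check is top-1/top-5 on a validation set rather than a raw sum of output differences.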


JWLee89 commented Nov 15, 2023

Instead of testing the absolute sum of differences between the two models' outputs, I believe an element-wise check that no difference exceeds a certain threshold might be a more accurate measure.

For example (I did not test the code), we can check whether all elements of the source and target tensors are within a certain absolute tolerance:

import numpy as np
import torch

with torch.no_grad():
    # Move to CPU and convert to numpy, since np.allclose
    # cannot operate directly on CUDA tensors
    output_pt = model(data).cpu().numpy()
    output_trt = model_trt(data).cpu().numpy()

# Set this value to something that seems appropriate to you;
# 1e-5 is generally reasonable for fp32 comparisons
absolute_tolerance = 1e-5
print(np.allclose(output_pt, output_trt, atol=absolute_tolerance))

@itachi1232gg (Author)

> we can check whether all elements inside of source and target tensor are within a certain absolute threshold.

`np.allclose(output_pt, output_trt, atol=absolute_tolerance)` returns False
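That result is expected regardless of whether the engine has a bug: atol=1e-5 is an fp32-level bar, while fp16 carries only 10 mantissa bits (roughly 3 decimal digits), so values of order 1 already have round-off near 5e-4. A rough sketch of the difference, again simulating the fp16 round-trip with numpy rather than a TensorRT engine:

```python
import numpy as np

rng = np.random.default_rng(1)
ref = rng.standard_normal((1, 1000)).astype(np.float32)
half = ref.astype(np.float16).astype(np.float32)  # simulated fp16 round-trip

# fp32-level tolerance: fails even for pure fp16 round-off
print(np.allclose(ref, half, atol=1e-5))             # False
# fp16-scale tolerance: passes for pure round-off error
print(np.allclose(ref, half, rtol=1e-2, atol=1e-3))  # True
```

So a False at atol=1e-5 alone doesn't distinguish normal fp16 round-off from a real problem; a large maximum relative error (say, more than a few percent) on real engine outputs is a better signal that some layers overflow in fp16 and may need to be kept in fp32.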
