Indexing and shape incoherent with module conversion #270

Open
meetps opened this issue Mar 18, 2020 · 4 comments

meetps commented Mar 18, 2020

Minimal Reproducible Example

import torch
import torch2trt

USE_TRT = True


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(32, 4, kernel_size=1)

    def forward(self, inp):
        out = self.conv(inp)
        return out


def main():
    if USE_TRT:
        x = torch.ones((1, 32, 16, 16)).cuda().half()
        model = Model().half().cuda().eval()
        torch2trt.torch2trt(
            model, [x], max_batch_size=1, input_names=["input"], output_names=["output"], fp16_mode=True
        )

    # Test masking.
    mask = torch.tensor([True, False], device="cuda")
    data = torch.zeros([2, 5], device="cuda").half()
    masked_data = data[mask]
    try:
        # The following assert fails with TRT, works without TRT.
        assert list(masked_data.shape) == [1, 5]
    except AssertionError:
        print("========Mask Error=========")
        print("masked_data.shape =", list(masked_data.shape))
        print("ExpectedShape = [1, 5]")

    # Test indexing.
    idx = torch.tensor([1, 2], device="cuda")
    data = torch.zeros(10, device="cuda")
    try:
        # The following indexing fails with TRT, works without TRT.
        data[idx]
    except Exception as e:
        print("========Indexing Error=========")
        print(e)


if __name__ == "__main__":
    main()

Running this prints:

========Mask Error=========
masked_data.shape = [0, 2, 5]
ExpectedShape = [1, 5]
========Indexing Error=========
too many indices for tensor of dimension 1

Environment

PyTorch version: 1.4.0
CUDA used to build PyTorch: 10.1

OS: Ubuntu 18.04.4 LTS
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
CMake version: version 3.10.2

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: Quadro P2000

Nvidia driver version: 430.50

@yutec-nvidia

Hi,

It looks like once we call import torch2trt; torch2trt.torch2trt(...), torch.Tensor.__getitem__ is overridden by torch2trt's getitem converter at that point.
As a result, plain torch.Tensor operations that go through torch.Tensor.__getitem__, such as data[mask], gain an extra dimension because of the torch2trt getitem method: https://github.com/NVIDIA-AI-IOT/torch2trt/blob/master/torch2trt/converters/getitem.py#L32.
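
A quick way to check which implementation is active is to print torch.Tensor.__getitem__ before and after the conversion. A rough sketch, reusing the Model class and input from the example above (the exact repr will depend on your torch/torch2trt versions):

import torch
import torch2trt

x = torch.ones((1, 32, 16, 16)).cuda().half()
model = Model().half().cuda().eval()  # Model as defined in the example above

print(torch.Tensor.__getitem__)  # the original slot wrapper
torch2trt.torch2trt(model, [x], fp16_mode=True)
print(torch.Tensor.__getitem__)  # check what torch2trt left behind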

One workaround: when you index the tensor with a mask, e.g. data[torch.tensor([False, True], device="cuda")], always add one extra True in the argument (data[torch.tensor([True, False, True], device="cuda")]) and then take the first element of the result (data = data[mask][0]).

Another workaround is to compute data[mask] before the torch2trt call, so that at the time torch.Tensor.__getitem__ is invoked it has not yet been overridden by torch2trt.
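
For example, a rough sketch of the second workaround, only illustrating the ordering (Model and x are taken from the example above):

import torch
import torch2trt

# Do the plain-PyTorch masking first, while torch.Tensor.__getitem__
# is still the original implementation.
mask = torch.tensor([True, False], device="cuda")
data = torch.zeros([2, 5], device="cuda").half()
masked_data = data[mask]
assert list(masked_data.shape) == [1, 5]

# Only afterwards run the conversion, which patches __getitem__.
x = torch.ones((1, 32, 16, 16)).cuda().half()
model = Model().half().cuda().eval()
torch2trt.torch2trt(model, [x], fp16_mode=True)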


chebbyChefNEQ commented May 10, 2020

@salexspb I think this is the bug we are seeing

We ran into the same issue. I added debug prints in __enter__ and __exit__ of ConversionHook. It looks like torch.Tensor.__getitem__ is correctly restored to the original method on exit; however, we are still seeing the behavior of the torch2trt getitem converter.

printout code:

    def __enter__(self):
        try:
            self.method_impl = eval(self.method_str)
        except AttributeError:
            self.method_impl = None

        if self.method_impl:
            wrapped = attach_converter(self.ctx, self.method_impl, self.converter, self.method_str)
            print("__enter__(): setting {} to {}, it was {}".format(self.method_str, wrapped, eval(self.method_str)))
            self._set_method(wrapped)

    def __exit__(self, type, val, tb):
        if self.method_impl:
            print("__exit__(): setting {} to {}".format(self.method_str, self.method_impl))
            self._set_method(self.method_impl)

printout for __getitem__ only:

__enter__(): setting torch.Tensor.__getitem__ to <function attach_converter.<locals>.wrapper at 0x7f86ec4c7b70>, it was <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>

__exit__(): setting torch.Tensor.__getitem__ to <slot wrapper '__getitem__' of 'torch._C._TensorBase' objects>

My repro:

import torch
import torch2trt
import tensorrt as trt
import traceback

# works
tensor = torch.rand((7,7))
tensor[tensor != tensor]

print("tensor indexing before conversionCtx passed")

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network()

conversionCtx = torch2trt.ConversionContext(network)
with conversionCtx as cCtx:
    pass

# Fails
tensor = torch.rand((7,7))
tensor[tensor != tensor]

I also added import pdb; pdb.set_trace() in https://github.com/NVIDIA-AI-IOT/torch2trt/blob/master/torch2trt/converters/getitem.py#L32 which doesn't seem to stop the program at all.

@eriche2016

I have the same issue here. Did you solve this problem?

@chaoz-dev

This issue should be addressed by #738.
