
CUDA Error: Invalid Argument when annotating large images #18

@faberno

Description

Hey,

I ran into the same problem described in this issue in the Slicer Plugin repo. Unlike there, however, I am not running nnInteractive in a Docker container, but locally.

The problem
For large images, the following error is thrown when annotating the image.

File ~/PycharmProjects/napari-nninteractive/nnInteractive/nnInteractive/inference/inference_session.py:140, in nnInteractiveInferenceSession._initialize_interactions(self=<nnInteractive.inference.inference_session.nnInteractiveInferenceSession object>, image_torch=tensor(...))
    138     print(f'Initialize interactions. Pinned: {self.use_pinned_memory}')
    139 # Create the interaction tensor based on the target shape.
--> 140 self.interactions = torch.zeros(
    141     (7, *image_torch.shape[1:]),
    142     device='cpu',
    143     dtype=torch.float16,
    144     pin_memory=(self.device.type == 'cuda' and self.use_pinned_memory)
    145 )

Local variables at the failing frame (full tensor printout elided):
    self.interactions = None
    image_torch = tensor([[[[ 0.5962,  0.3944,  0.4676,  ..., -0.3240, -0.3240, -0.3240]]]], dtype=torch.float16)
    self.use_pinned_memory = True
    self.device = device(type='cuda', index=0)

AcceleratorError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I tried many different image sizes, and the problem only occurs for images larger than around 600MB.
But it cannot be a matter of too little memory: I'm using a 5090 with 32GB of VRAM and my system has 120GB of RAM. It also ran without any problems on our other machines with far less RAM and VRAM.
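A back-of-the-envelope estimate (my own, based only on the traceback above, and assuming the quoted ~600MB refers to the fp16 image tensor): the interaction tensor is allocated as 7 channels at the image's spatial shape in fp16, so it is 7× the image size. At ~600MB per image that pinned allocation crosses the 4GiB (2**32 byte) mark, which may be relevant to the "invalid argument" error:

```python
def interactions_bytes(image_bytes_fp16: int) -> int:
    """Size of torch.zeros((7, *image.shape[1:]), dtype=torch.float16):
    7 channels, each the same size as the fp16 image itself."""
    return 7 * image_bytes_fp16

image_mb = 600  # the rough size at which the error starts to appear
alloc = interactions_bytes(image_mb * 1024**2)
print(f"pinned allocation: {alloc / 1024**3:.1f} GiB")   # ~4.1 GiB
print(f"exceeds 4 GiB (2**32 bytes): {alloc > 2**32}")   # True
```

If that arithmetic holds, the failure threshold would line up with the pinned host allocation exceeding 4GiB rather than with total RAM or VRAM.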

Do you guys have any idea what the problem might be? Could it be the CUDA version (you recommend 12.6, while I need 12.8 for the 5090)?

Environment Information
Operating System: Ubuntu 24.04
CUDA Version: 12.8
Python Version: 3.12
GPU: 5090
Memory: 120GB
