cudaDeviceSynchronize used in SDK filters requires all CUDA streams to complete #12680

@m-mead

Required Info
Camera Model: D400
Firmware Version: N/A
Operating System & Version: Linux, Windows
Kernel Version (Linux Only): All
Platform: All
SDK Version: 2.54.2
Language: C and C++
Segment:

Issue Description

The RealSense SDK uses cudaDeviceSynchronize to synchronize GPU operations in the color conversion functions and the alignment filter. The problem is that cudaDeviceSynchronize blocks the host until all operations on all streams have completed (see https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#explicit-synchronization). From my understanding of the code, the RealSense SDK does not need to wait on every stream, only on the one where the filtering operations execute. Please correct me if I am wrong. 🙂

A user's application may be running its own CUDA work on separate streams. If that work executes concurrently with an SDK filter, the cudaDeviceSynchronize call also waits for it to finish. There are two possible fixes: place the RealSense SDK's CUDA operations on a separate stream, or call cudaStreamSynchronize with an argument of 0 so that only the default stream, which the RealSense SDK uses, is synchronized. Either way, SDK CUDA operations would no longer block until other streams complete. The latter is simpler to implement and does not change the stream users expect the RealSense SDK to use.
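To make the two options concrete, here is a minimal sketch. It is not taken from the SDK source: scale_kernel, run_filter_default_stream, and run_filter_own_stream are hypothetical names standing in for an SDK filter and its launch code. One caveat I am aware of: the legacy default stream still implicitly synchronizes with other blocking streams, so the first option only narrows the host-side wait. Application streams created with cudaStreamNonBlocking are not affected by that implicit ordering.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Hypothetical stand-in for an SDK filter kernel (not librealsense code).
__global__ void scale_kernel(float* data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Option A (hypothetical helper): keep launching on the default stream,
// but wait only on that stream instead of the whole device.
cudaError_t run_filter_default_stream(float* d_data, int n)
{
    assert(d_data != nullptr && n > 0);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    scale_kernel<<<blocks, threads>>>(d_data, n, 2.0f);
    cudaError_t err = cudaGetLastError();  // catch launch errors
    if (err != cudaSuccess) return err;

    // Replaces cudaDeviceSynchronize(): waits for the default stream only.
    return cudaStreamSynchronize(0);
}

// Option B (hypothetical helper): launch on a caller-provided stream and
// wait on that stream only. The stream is assumed to have been created
// with cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking).
cudaError_t run_filter_own_stream(float* d_data, int n, cudaStream_t stream)
{
    assert(d_data != nullptr && n > 0);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    scale_kernel<<<blocks, threads, 0, stream>>>(d_data, n, 2.0f);
    cudaError_t err = cudaGetLastError();  // catch launch errors
    if (err != cudaSuccess) return err;

    return cudaStreamSynchronize(stream);
}
```

In both helpers the host blocks only until the filter's own work is done, so kernels the application has queued on unrelated non-blocking streams can keep running. Option B would need the SDK to own (or accept) a stream handle, which is the larger change.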

I am happy to contribute the changes if the RealSense team is interested. I searched for similar issues and could not find any.
