Hitbee-dev (Contributor)

[Feature] Add NVIDIA Blackwell Support & Modernize TensorRT Deployment

Hi @lyuwenyu

It's great to connect with you again! Following my previous contributions in PRs #395 and #480, I'm thrilled to have another opportunity to contribute. Your RT-DETR repository has been an incredible foundation for my work, and it's inspired me to explore its potential on the very latest hardware.

This pull request provides critical updates to enable RT-DETR deployment on NVIDIA's new Blackwell architecture. The existing deployment scripts and environment were incompatible with the newer drivers and libraries required for Blackwell, failing at both the ONNX-to-TRT conversion and the final inference stages. This PR addresses these issues comprehensively, delivering a modernized, robust, and future-proof deployment workflow.


Key Changes by File

  1. Dockerfile & requirements.txt:

    • Switched Base Image to nvcr.io/nvidia/pytorch:25.06-py3 to provide the necessary CUDA toolkit and library support for the Blackwell architecture.
    • Optimized Dependencies to align with the new base image.
  2. references/deploy/rtdetrv2_tensorrt.py:

    • Major Refactoring to resolve API and logic incompatibilities with the newer TensorRT libraries. The new implementation is cleaner, more modular, and uses a robust "lazy" buffer allocation strategy (see the first sketch after this list).
    • Enhanced Standalone Testability with command-line arguments (--engine, --image, etc.) and a built-in visualization function to save inference results as an image file.
  3. tools/onnx2trt.sh (New Addition):

    • A new, flexible trtexec wrapper script that takes the ONNX file path as an argument, replacing hardcoded paths and standardizing the conversion process (see the second sketch after this list).
  4. docker-compose.yml:

    • Aligned with the new Dockerfile to simplify local development setup.
  5. README.md:

    • Completely restructured the usage examples into a clear, step-by-step workflow: Environment Setup -> Training -> Exporting -> Inference. This significantly improves usability for new users and provides explicit instructions for using Docker.
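
For readers curious what the "lazy" buffer allocation refers to: with the newer TensorRT tensor-name API, output shapes are only fully resolved after the input shapes have been set on the execution context, so it is natural to defer allocation until the first inference call. Below is a minimal sketch of that pattern, not the PR's actual code; the class name LazyTRTModule, the torch-backed buffers, and the fixed-input-shape assumption are illustrative choices.

```python
import tensorrt as trt
import torch

# TensorRT dtype -> torch dtype (subset; extend as needed).
_DTYPE_MAP = {
    trt.DataType.FLOAT: torch.float32,
    trt.DataType.HALF: torch.float16,
    trt.DataType.INT32: torch.int32,
}

class LazyTRTModule:
    """Sketch: defer GPU output-buffer allocation to the first call."""

    def __init__(self, engine_path, device="cuda:0"):
        logger = trt.Logger(trt.Logger.INFO)
        with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.device = torch.device(device)
        self.outputs = None  # allocated lazily on the first call

    def _allocate_outputs(self):
        # Output shapes are concrete only after set_input_shape() has run,
        # which is why allocation waits for the first inference.
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
                shape = tuple(self.context.get_tensor_shape(name))
                dtype = _DTYPE_MAP[self.engine.get_tensor_dtype(name)]
                self.outputs[name] = torch.empty(shape, dtype=dtype, device=self.device)

    def __call__(self, inputs):
        # inputs: {tensor_name: contiguous CUDA torch.Tensor}
        for name, tensor in inputs.items():
            self.context.set_input_shape(name, tuple(tensor.shape))
            self.context.set_tensor_address(name, tensor.data_ptr())
        if self.outputs is None:
            self._allocate_outputs()
        for name, buf in self.outputs.items():
            self.context.set_tensor_address(name, buf.data_ptr())
        stream = torch.cuda.current_stream(self.device)
        self.context.execute_async_v3(stream.cuda_stream)
        stream.synchronize()
        return self.outputs
```

A production version would also re-allocate when input shapes change and handle multiple optimization profiles; the sketch assumes a single profile and a fixed input shape across calls.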
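Conceptually, the new tools/onnx2trt.sh boils down to forwarding a user-supplied ONNX path to trtexec instead of hardcoding it. Here is a hypothetical equivalent, written in Python for consistency with the other sketches; the --fp16 toggle and the .trt output naming are assumptions, not necessarily what the script does.

```python
import argparse
import subprocess
from pathlib import Path

def main():
    # Take the ONNX path as an argument, mirroring tools/onnx2trt.sh.
    parser = argparse.ArgumentParser(description="Convert an ONNX model to a TensorRT engine")
    parser.add_argument("onnx", help="path to the ONNX model")
    parser.add_argument("--fp16", action="store_true", help="build an FP16 engine (assumed option)")
    args = parser.parse_args()

    onnx_path = Path(args.onnx)
    engine_path = onnx_path.with_suffix(".trt")  # assumed naming convention

    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if args.fp16:
        cmd.append("--fp16")
    subprocess.run(cmd, check=True)  # raise if the conversion fails

if __name__ == "__main__":
    main()
```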

Extensive Testing & Validation

This entire workflow has been extensively tested and validated across the following NVIDIA GPU architectures to ensure broad, stable compatibility:

  • Ampere: RTX 3070, RTX 3090, A6000
  • Ada Lovelace: RTX 4080 Super, RTX 6000 Ada, L40S
  • Blackwell: RTX 5070

Sample Test Run on Blackwell (RTX 5070)


The updated rtdetrv2_tensorrt.py script provides the following output, demonstrating a successful end-to-end test run:

```
root@93d0e80b0b44:/workspace/references/deploy# python rtdetrv2_tensorrt.py --engine ./rtdetr_pretrained.trt --image ./test1.jpg
--- TRTInference Wrapper Test ---

Initializing TRTInference...
[07/14/2025-16:34:11] [TRT] [I] Loaded engine size: 88 MiB
...
[TRTInference] Initialized successfully. Engine: './rtdetr_pretrained.trt'.
Preprocessing input image...
Original image size: 640x543
Input tensor shape: torch.Size([1, 3, 640, 640])
Running inference...
[TRTInference] First inference call detected. Allocating GPU buffers...
...
[TRTInference] GPU buffers allocated successfully.
Inference complete in 33.43 ms.
Post-processing and saving output image...
Found 21 objects above threshold 0.5.
Output image with detections saved to: /workspace/references/deploy/output.jpg
--- Test finished successfully ---
```
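
The "Found 21 objects above threshold 0.5" and saved-image lines come from the script's built-in visualization step. As a rough illustration of what such a step does, here is a minimal sketch assuming xyxy pixel-space boxes; the PR's actual function name, signature, and box format may differ.

```python
from PIL import Image, ImageDraw

def draw_detections(image_path, boxes, scores, labels, out_path, thr=0.5):
    """Keep detections scoring above `thr` and draw their boxes."""
    im = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(im)
    kept = 0
    for box, score, label in zip(boxes, scores, labels):
        if score < thr:
            continue
        x1, y1, x2, y2 = box  # assumed (x1, y1, x2, y2) in pixels
        draw.rectangle([x1, y1, x2, y2], outline="red", width=2)
        draw.text((x1, y1), f"{int(label)}: {score:.2f}", fill="red")
        kept += 1
    im.save(out_path)
    print(f"Found {kept} objects above threshold {thr}.")
```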

Thank you again for maintaining this fantastic project. I hope this contribution helps the community leverage the power of RT-DETR on the next generation of hardware!

lyuwenyu (Owner) left a comment:
lgtm

lyuwenyu merged commit 9a9f9b8 into lyuwenyu:main on Jul 21, 2025.