Hitbee-dev (Contributor)

[Feature] Add NVIDIA Blackwell Support & Modernize TensorRT Deployment

Hi @lyuwenyu

It's great to connect with you again! Following my previous contributions in PRs #395 and #480, I'm thrilled to have another opportunity to contribute. Your RT-DETR repository has been an incredible foundation for my work, and it's inspired me to explore its potential on the very latest hardware.

This pull request provides critical updates to enable RT-DETR deployment on NVIDIA's new Blackwell architecture. The existing deployment scripts and environment were incompatible with the newer drivers and libraries required for Blackwell, failing at both the ONNX-to-TRT conversion and the final inference stages. This PR addresses these issues comprehensively, delivering a modernized, robust, and future-proof deployment workflow.


Key Changes by File

  1. Dockerfile & requirements.txt:

    • Switched Base Image to nvcr.io/nvidia/pytorch:25.06-py3 to provide the necessary CUDA toolkit and library support for the Blackwell architecture.
    • Optimized Dependencies to align with the new base image.
  2. references/deploy/rtdetrv2_tensorrt.py:

    • Major Refactoring to resolve API and logic incompatibilities with the newer TensorRT libraries. The new implementation is cleaner, more modular, and uses a robust "lazy" buffer allocation strategy (see the first sketch after this list).
    • Enhanced Standalone Testability with command-line arguments (--engine, --image, etc.) and a built-in visualization function to save inference results as an image file.
  3. tools/onnx2trt.sh (New Addition):

    • A new, flexible trtexec wrapper script that takes the ONNX file path as an argument, replacing hardcoded paths and standardizing the conversion process (see the second sketch after this list).
  4. docker-compose.yml:

    • Aligned with the new Dockerfile to simplify local development setup.
  5. README.md:

    • Completely restructured the usage examples into a clear, step-by-step workflow: Environment Setup -> Training -> Exporting -> Inference. This significantly improves usability for new users and provides explicit instructions for using Docker.
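
For readers curious what the "lazy" buffer allocation refers to: with the newer TensorRT tensor-name API, output shapes are only fully resolved after the input shapes have been set on the execution context, so it is natural to defer allocation until the first inference call. Below is a minimal sketch of that pattern, not the PR's actual code; the class name LazyTRTModule, the torch-backed buffers, and the fixed-input-shape assumption are illustrative choices.

```python
import tensorrt as trt
import torch

# TensorRT dtype -> torch dtype (subset; extend as needed).
_DTYPE_MAP = {
    trt.DataType.FLOAT: torch.float32,
    trt.DataType.HALF: torch.float16,
    trt.DataType.INT32: torch.int32,
}

class LazyTRTModule:
    """Sketch: defer GPU output-buffer allocation to the first call."""

    def __init__(self, engine_path, device="cuda:0"):
        logger = trt.Logger(trt.Logger.INFO)
        with open(engine_path, "rb") as f, trt.Runtime(logger) as runtime:
            self.engine = runtime.deserialize_cuda_engine(f.read())
        self.context = self.engine.create_execution_context()
        self.device = torch.device(device)
        self.outputs = None  # allocated lazily on the first call

    def _allocate_outputs(self):
        # Output shapes are concrete only after set_input_shape() has run,
        # which is why allocation waits for the first inference.
        self.outputs = {}
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            if self.engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
                shape = tuple(self.context.get_tensor_shape(name))
                dtype = _DTYPE_MAP[self.engine.get_tensor_dtype(name)]
                self.outputs[name] = torch.empty(shape, dtype=dtype, device=self.device)

    def __call__(self, inputs):
        # inputs: {tensor_name: contiguous CUDA torch.Tensor}
        for name, tensor in inputs.items():
            self.context.set_input_shape(name, tuple(tensor.shape))
            self.context.set_tensor_address(name, tensor.data_ptr())
        if self.outputs is None:
            self._allocate_outputs()
        for name, buf in self.outputs.items():
            self.context.set_tensor_address(name, buf.data_ptr())
        stream = torch.cuda.current_stream(self.device)
        self.context.execute_async_v3(stream.cuda_stream)
        stream.synchronize()
        return self.outputs
```

A production version would also re-allocate when input shapes change and handle multiple optimization profiles; the sketch assumes a single profile and a fixed input shape across calls.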
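Conceptually, the new tools/onnx2trt.sh boils down to forwarding a user-supplied ONNX path to trtexec instead of hardcoding it. Here is a hypothetical equivalent, written in Python for consistency with the other sketches; the --fp16 toggle and the .trt output naming are assumptions, not necessarily what the script does.

```python
import argparse
import subprocess
from pathlib import Path

def main():
    # Take the ONNX path as an argument, mirroring tools/onnx2trt.sh.
    parser = argparse.ArgumentParser(description="Convert an ONNX model to a TensorRT engine")
    parser.add_argument("onnx", help="path to the ONNX model")
    parser.add_argument("--fp16", action="store_true", help="build an FP16 engine (assumed option)")
    args = parser.parse_args()

    onnx_path = Path(args.onnx)
    engine_path = onnx_path.with_suffix(".trt")  # assumed naming convention

    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if args.fp16:
        cmd.append("--fp16")
    subprocess.run(cmd, check=True)  # raise if the conversion fails

if __name__ == "__main__":
    main()
```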

Extensive Testing & Validation

This entire workflow has been extensively tested and validated across the following NVIDIA GPU architectures to ensure broad, stable compatibility:

  • Ampere: RTX 3070, RTX 3090, A6000
  • Ada Lovelace: RTX 4080 Super, RTX 6000 Ada, L40S
  • Blackwell: RTX 5070

Sample Test Run on Blackwell (RTX 5070)


The updated rtdetrv2_tensorrt.py script provides the following output, demonstrating a successful end-to-end test run:

```
root@93d0e80b0b44:/workspace/references/deploy# python rtdetrv2_tensorrt.py --engine ./rtdetr_pretrained.trt --image ./test1.jpg
--- TRTInference Wrapper Test ---

Initializing TRTInference...
[07/14/2025-16:34:11] [TRT] [I] Loaded engine size: 88 MiB
...
[TRTInference] Initialized successfully. Engine: './rtdetr_pretrained.trt'.
Preprocessing input image...
Original image size: 640x543
Input tensor shape: torch.Size([1, 3, 640, 640])
Running inference...
[TRTInference] First inference call detected. Allocating GPU buffers...
...
[TRTInference] GPU buffers allocated successfully.
Inference complete in 33.43 ms.
Post-processing and saving output image...
Found 21 objects above threshold 0.5.
Output image with detections saved to: /workspace/references/deploy/output.jpg
--- Test finished successfully ---
```
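
The "Found 21 objects above threshold 0.5" and saved-image lines come from the script's built-in visualization step. As a rough illustration of what such a step does, here is a minimal sketch assuming xyxy pixel-space boxes; the PR's actual function name, signature, and box format may differ.

```python
from PIL import Image, ImageDraw

def draw_detections(image_path, boxes, scores, labels, out_path, thr=0.5):
    """Keep detections scoring above `thr` and draw their boxes."""
    im = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(im)
    kept = 0
    for box, score, label in zip(boxes, scores, labels):
        if score < thr:
            continue
        x1, y1, x2, y2 = box  # assumed (x1, y1, x2, y2) in pixels
        draw.rectangle([x1, y1, x2, y2], outline="red", width=2)
        draw.text((x1, y1), f"{int(label)}: {score:.2f}", fill="red")
        kept += 1
    im.save(out_path)
    print(f"Found {kept} objects above threshold {thr}.")
```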

Thank you again for maintaining this fantastic project. I hope this contribution helps the community leverage the power of RT-DETR on the next generation of hardware!

lyuwenyu (Owner) left a comment:
lgtm

lyuwenyu merged commit 9a9f9b8 into lyuwenyu:main on Jul 21, 2025.