
Conversation

lgibelli (Contributor)

This PR adds support for using an external vLLM server with the pipeline, enabling significant performance improvements by keeping the model loaded between runs.

Changes

  • vllm_server_manager.py: New standalone script to manage a persistent vLLM server with health monitoring and automatic restart capabilities
  • pipeline.py modifications:
    • Added --vllm-url flag to connect to external vLLM servers
    • Added model verification to ensure the correct model is loaded
    • Added robust retry logic with exponential backoff (see the sketch after this list)
    • Skip GPU check and model download when using external server
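
As a rough sketch of how the verification-plus-retry path might look (the function name, retry counts, and timeouts here are illustrative placeholders, not the actual pipeline.py code; it relies on the /v1/models endpoint that vLLM's OpenAI-compatible server exposes):

    import time

    import requests


    def verify_external_server(vllm_url: str, expected_model: str,
                               max_retries: int = 6, base_delay: float = 1.0) -> None:
        """Confirm the external vLLM server is up and serving the expected
        model, retrying with exponential backoff while it comes online."""
        for attempt in range(max_retries):
            try:
                # vLLM's OpenAI-compatible server lists its loaded models here.
                resp = requests.get(f"{vllm_url}/v1/models", timeout=10)
                resp.raise_for_status()
                served = [m["id"] for m in resp.json().get("data", [])]
                if expected_model in served:
                    return  # Correct model is loaded; safe to start processing.
                raise RuntimeError(f"{vllm_url} is serving {served}, expected {expected_model}")
            except requests.RequestException:
                # Not reachable (or not ready) yet: back off exponentially.
                time.sleep(base_delay * 2 ** attempt)
        raise RuntimeError(f"Could not reach vLLM server at {vllm_url} after {max_retries} attempts")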

Benefits

  • Save 30-60+ seconds per pipeline run by avoiding model reloading
  • Run multiple pipeline instances against the same server
  • Better resource utilization with persistent GPU memory allocation
  • Automatic server restart on crashes (configurable)
  • Clean separation of server infrastructure from processing logic

Usage

Start the server manager

python -m olmocr.vllm_server_manager

Run pipeline with external server

python -m olmocr.pipeline workspace --vllm-url http://localhost:30024 --pdfs *.pdf
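
The server manager started in the first step is what provides the health monitoring and automatic restarts. As a minimal sketch of the kind of supervision loop such a script could run (the model name, port, poll intervals, and startup timeout are illustrative, not taken from vllm_server_manager.py; it assumes vLLM's vllm serve CLI and its /health endpoint):

    import subprocess
    import time

    import requests

    MODEL = "allenai/olmOCR-7B-0225-preview"  # placeholder model name
    PORT = 30024
    HEALTH_URL = f"http://localhost:{PORT}/health"


    def healthy() -> bool:
        try:
            return requests.get(HEALTH_URL, timeout=5).status_code == 200
        except requests.RequestException:
            return False


    while True:
        # Launch vLLM's OpenAI-compatible server as a child process.
        proc = subprocess.Popen(["vllm", "serve", MODEL, "--port", str(PORT)])
        # Give the server time to come up, then monitor it.
        deadline = time.time() + 300
        while time.time() < deadline and not healthy():
            time.sleep(5)
        while proc.poll() is None and healthy():
            time.sleep(10)
        # Server exited or stopped answering health checks: restart it.
        proc.terminate()
        proc.wait()
        time.sleep(5)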

Backward Compatibility

Fully backward compatible - the pipeline works exactly as before if --vllm-url is not provided.

Testing

  • Tested with single and multiple PDFs
  • Verified server restart functionality
  • Confirmed model verification works correctly
  • Backward compatibility verified

@jakep-allenai (Collaborator)

Sounds like a good start, and something we'd like to support. I'd suggest removing the vLLM server manager; the user can just call vllm serve etc. as they usually would. Then try to refactor things so that as little as possible changes in the main code between the local case and the external-server case. I do like the idea of checking that the right model is loaded, but you can do that in both cases (e.g., keep the await-server-ready step in both cases).

@lgibelli (Contributor, Author)

Thanks for the feedback, will update the PR ASAP.

@jakep-allenai (Collaborator)

Closing as we went with @haydn-jones's solution.
