[BugFix] Fix ray import error mem cleanup bug #21381

joerunde · 2025-07-22T14:26:39Z

This is a duplicate of #21149, opened because @tjohnson31415 is now out of office for a while.

Purpose

Fix a bug that prevents an LLM() instance from being garbage collected after deletion. This could result in unexpected memory usage or an EADDRINUSE error when attempting to create a second LLM() in the same process.

Test Plan

This is a simple script to reproduce the problem:

import gc
import weakref

from vllm import LLM

llm = LLM(
    model="ibm-granite/granite-3.3-2b-instruct",
    max_model_len=32000,
    max_num_seqs=2,
    tensor_parallel_size=2,
    enforce_eager=True,
)

# a weakref does not prevent garbage collection
weakllm = weakref.ref(llm)

print("BEFORE refcount", len(gc.get_referrers(weakllm())))
del llm
gc.collect()

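# if the weakref still resolves, something is keeping the LLM alive
if weakllm() is not None:
    print("NOT GARBAGE COLLECTED!")
    print("AFTER refcount", len(gc.get_referrers(weakllm())))
    # show which objects still hold a reference to the LLM
    print(gc.get_referrers(weakllm()))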

If Ray is installed, the script prints BEFORE refcount 1 and the weak reference is cleared after gc.collect(), meaning the LLM was garbage collected. If Ray is not installed, I instead get:

BEFORE refcount 2
NOT GARBAGE COLLECTED!
AFTER refcount 1
[<frame at 0x7fb6861f0040, file '/tmp/home/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/llm.py', line 276, code __init__>]

Test Result

After applying the fix from this PR, the script reports BEFORE refcount 1 even with Ray uninstalled, so the LLM's resources get garbage collected as desired.

Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>

Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

gemini-code-assist

Code Review

This pull request resolves a memory leak caused by holding a reference to an ImportError exception object. The exception's traceback references the stack frames that were active when it was caught, and those frames keep their local variables (including the LLM instance) alive, which prevents garbage collection. The fix, which stores the string representation of the exception instead of the object itself, is a clean and correct way to break this reference chain.

The necessary adjustments to the raise statements that use this error information have been handled correctly by removing the from clause and incorporating the error message into the new exception's text. The changes are well-contained and directly address the reported bug. Overall, this is a solid improvement to the codebase's stability.
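For anyone who wants to see the mechanism without vLLM, Ray, or a GPU, here is a minimal sketch of the same pattern. Every name in it (`Engine`, `_optional_import`, `_import_error`, `require_dependency`, and the module `hypothetical_missing_dependency_xyz`) is a hypothetical stand-in and not vLLM code. It assumes CPython and assumes that no module with that name is installed.

```python
import gc
import weakref

# Module-level slot that mimics a cached import error. All names in this
# sketch are hypothetical; none of them come from vLLM.
_import_error = None


def _optional_import(keep_exception_object: bool) -> None:
    """Try to import a module that is assumed not to be installed."""
    global _import_error
    try:
        import hypothetical_missing_dependency_xyz  # noqa: F401
    except ImportError as e:
        # Leaky variant: the exception keeps e.__traceback__, which references
        # this frame, which in turn references its caller's frame via f_back.
        # Safe variant: a plain string references no frames at all.
        _import_error = e if keep_exception_object else str(e)


class Engine:
    """Stand-in for a heavyweight object such as LLM."""

    def __init__(self, keep_exception_object: bool) -> None:
        # `self` is a local variable of this frame while the import runs.
        _optional_import(keep_exception_object)


def is_collected(keep_exception_object: bool) -> bool:
    """Create an Engine, drop it, and report whether it was freed."""
    engine = Engine(keep_exception_object)
    ref = weakref.ref(engine)
    del engine
    gc.collect()
    return ref() is None


def require_dependency() -> None:
    """Raise without a `from` clause; the saved text goes into the message."""
    if _import_error is not None:
        raise ValueError(f"Failed to import dependency: {_import_error}")


if __name__ == "__main__":
    print("keeping the exception object:", is_collected(True))
    print("keeping only str(e):", is_collected(False))
```

On CPython, the first call should report False (the saved exception pins the `__init__` frame and therefore `self`), and the second should report True. `require_dependency` shows the shape of the adjusted raise: because only the message text is kept, there is no exception object left to chain with `from`.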

njhill

Thanks @joerunde @tjohnson31415!

njhill · 2025-07-22T17:10:17Z

@joerunde could you rebase 🙏