
Conversation

Collaborator

@joerunde joerunde commented Jul 22, 2025

This is a duplicate of #21149 since @tjohnson31415 is now OOO for a while

Purpose

Fix a bug that prevents an LLM() instance from being garbage collected after deletion. This could result in unexpected memory usage or an EADDRINUSE error when attempting to create a second LLM() in the same process.

Test Plan

This is a simple script to reproduce the problem:

import gc
import weakref

from vllm import LLM

llm = LLM(
    model="ibm-granite/granite-3.3-2b-instruct",
    max_model_len=32000,
    max_num_seqs=2,
    tensor_parallel_size=2,
    enforce_eager=True,
)

# a weakref does not prevent garbage collection
weakllm = weakref.ref(llm)

print("BEFORE refcount", len(gc.get_referrers(weakllm())))
del llm
gc.collect()

if weakllm() is not None:
    print("NOT GARBAGE COLLECTED!")
    print("AFTER refcount", len(gc.get_referrers(weakllm())))
    # show what is still holding a reference to the LLM instance
    print(gc.get_referrers(weakllm())[0])

If Ray is installed, then I get BEFORE refcount 1 and the weak reference loses its referent after the del. If Ray is not installed, I instead get:

BEFORE refcount 2
NOT GARBAGE COLLECTED!
AFTER refcount 1
[<frame at 0x7fb6861f0040, file '/tmp/home/my-vllm/lib64/python3.12/site-packages/vllm/entrypoints/llm.py', line 276, code __init__>]
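
The frame in that output is what keeps the LLM alive: a stored exception object retains its __traceback__, and each traceback entry retains the frame where it was handled, including that frame's locals (here, the self inside LLM.__init__). A minimal standalone sketch of the mechanism (all names are purely illustrative, not vLLM code):

import gc


class Resource:
    """Stand-in for the LLM instance."""


_saved_exc = None  # module-level cache, like a stored import error


def init() -> Resource:
    res = Resource()  # local variable, analogous to `self` in LLM.__init__
    try:
        import not_a_real_module  # hypothetical optional dependency
    except ImportError as e:
        global _saved_exc
        # Storing the exception keeps e.__traceback__ alive, the traceback keeps
        # this frame alive, and the frame's locals keep `res` alive.
        _saved_exc = e
    return res


r = init()
del r
gc.collect()
# `res` is still reachable through _saved_exc.__traceback__.tb_frame.f_locals
print(any(isinstance(o, Resource) for o in gc.get_objects()))  # True
_saved_exc = None
gc.collect()
print(any(isinstance(o, Resource) for o in gc.get_objects()))  # False

Dropping the stored exception (or never storing it in the first place) lets the whole chain be freed, which is what this PR's fix does by keeping only the error message.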

Test Result

After applying the fix from this PR, the refcount is only 1 even with Ray uninstalled, so the LLM's resources are garbage collected as desired.

tjohnson31415 and others added 2 commits July 18, 2025 09:21
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
@joerunde joerunde added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jul 22, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request resolves a memory leak caused by holding a reference to an ImportError exception object, which in turn held a reference to a stack frame (and that frame's locals), preventing garbage collection. The fix, storing the string representation of the exception instead of the exception object itself, is a clean and correct way to break this reference chain.

The necessary adjustments to the raise statements that use this error information have been handled correctly by removing the from clause and incorporating the error message into the new exception's text. The changes are well-contained and directly address the reported bug. Overall, this is a solid improvement to the codebase's stability.
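
As a rough sketch of that pattern (illustrative only; the guard variable ray_import_err and helper _require_ray below are assumed names, not necessarily vLLM's actual identifiers):

try:
    import ray
except ImportError as e:
    ray = None
    # Before the fix, the pattern was roughly `ray_import_err = e`, which kept the
    # ImportError alive together with its traceback and every frame on it.
    # Keeping only the message retains no frames, so nothing extra stays pinned in memory.
    ray_import_err = str(e)


def _require_ray() -> None:
    if ray is None:
        # Re-raising with `... from ray_import_err` is no longer possible with a
        # string, so the original message is folded into the new exception's text.
        raise RuntimeError(
            "Ray is required for this feature but could not be imported: "
            f"{ray_import_err}")

Callers that hit the missing-dependency path still see the original import error text, but nothing from the import-time frame stays reachable.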

Member

@njhill njhill left a comment


@njhill njhill changed the title Fix ray import error bug [BugFix] Fix ray import error mem cleanup bug Jul 22, 2025
@njhill njhill enabled auto-merge (squash) July 22, 2025 14:32

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small, essential subset of tests to catch errors quickly. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@njhill
Member

njhill commented Jul 22, 2025

@joerunde could you rebase 🙏

@njhill njhill added this to the v0.10.0 milestone Jul 22, 2025
@njhill njhill added the bug label (Something isn't working) Jul 22, 2025
@simon-mo simon-mo disabled auto-merge July 22, 2025 23:19
@simon-mo simon-mo merged commit b77c7d3 into vllm-project:main Jul 22, 2025
65 of 66 checks passed
yeqcharlotte pushed a commit to yeqcharlotte/vllm that referenced this pull request Jul 23, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
zixi-qi pushed a commit to zixi-qi/vllm that referenced this pull request Jul 23, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: qizixi <qizixi@meta.com>
LyrisZhong pushed a commit to LyrisZhong/vllm that referenced this pull request Jul 23, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
avigny pushed a commit to avigny/vllm that referenced this pull request Jul 31, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
wenscarl pushed a commit to wenscarl/vllm that referenced this pull request Aug 4, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: shuw <shuw@nvidia.com>
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: x22x22 <wadeking@qq.com>
Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
taneem-ibrahim pushed a commit to taneem-ibrahim/vllm that referenced this pull request Aug 14, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
googlercolin pushed a commit to googlercolin/vllm that referenced this pull request Aug 29, 2025
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Labels
bug (Something isn't working), ready (ONLY add when PR is ready to merge/full CI is needed)
Projects
None yet
Development

4 participants