Skip to content

Conversation

snyamathi
Copy link
Contributor

@snyamathi snyamathi commented Aug 15, 2025

repro info here: https://github.com/snyamathi/nextjs-mem-leak

Background

The Fetch Standard allows users to skip consuming the response body by relying on garbage collection to release connection resources.

Undici added support for this in nodejs/undici#3199 and later for the body of cloned responses in nodejs/undici#3458 which first landed in Undici version 6.19.8 in August 2024. In September 2024, this issue #69635 was raised for the error, TypeError: Response.clone: Body has already been consumed

In NextJS's dedupe-fetch the cloned response is returned to userland, while the original response is stored in a react cache

// match was pulled from react cache, a clone is returned to the user
return match.then((response: Response) => response.clone());

Undici Bug

I'm omitting some details in the code snippet below (see full changes at https://github.com/nodejs/undici/pull/3458/files), but the Undici change to use FinalizationRegistry for cloned response body seems to have mixed up the response pointer and stream bodies when registering with the finalization registry, resulting in the wrong stream being cancelled.

The original response, (the match above) stored in the react cache is cloned, and then its stream is registered with the finalization registry when the cloned response newRequest is reclaimed.

I believe this is the true underlying cause of the errors: Body has already been consumed

function cloneBody(instance, body) {
  const [out1, out2] = body.stream.tee();

  // Erroneously registering newRequest + old body with finalization registry
  streamRegistry.register(instance, new WeakRef(out1));

  // Original request + out1 is used in Next's dedupe-request cache
  body.stream = out1;

  // Clone request + out2 is returned to userland
  return {
    stream: out2,
  };
}

// When newRequest is reclaimed, the original request.body is cancelled by mistake!
newRequest.body = cloneBody(newRequest, request.body);

// This is what the registry looks like
streamRegistry = new FinalizationRegistry((weakRef) => {
  const stream = weakRef.deref();
  if (stream && !stream.locked && !isDisturbed(stream) && !isErrored(stream)) {
    stream.cancel("Response object has been garbage collected").catch(noop);
  }
});

https://github.com/snyamathi/nextjs-mem-leak/blob/main/index.mjs has a simple reproduction of this issue which shows that when the clone is reclaimed, the original stream is cancelled, leading to issues

Memory Leak

This lead to #73274 which fixed the problem by adding a custom cloneResponse function.

https://github.com/vercel/next.js/blob/7ef0a2b2767b4159f8db8e1884bac98370528a25/packages/next/src/server/lib/clone-response.ts

However, this in turn has lead to a memory leak because now there is no one to garbage collect the tee'd stream.

https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee

To cancel the stream you then need to cancel both resulting branches. Teeing a stream will generally lock it for the duration, preventing other readers from locking it.

Undici was incorrectly cancelling the stream (leading to a bug), and Next is simply not cancelling the stream (leading to a memory leak). This can be observed with a Docker setup which shows a current version of NextJS, the last version prior to the custom cloneResponse function, as well as the effects of the proposed fix here.

plot

I have a reproduction available here https://github.com/snyamathi/nextjs-mem-leak/tree/main - you'll notice the memory for NextJS climb until it runs out of memory and crashes, whereas the version prior to this change (15.0.4-canary) is unaffected.

Similarly, if we forego the dedupe cache by passing in a signal (

if (options && options.signal) {
) there is no memory leak either.

The PR changes here are applied as a patch and also fix the memory leak :)

docker compose pull
docker compose build
docker compose up -d
docker container stats
CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT    MEM %     NET I/O           BLOCK I/O   PIDS
1b2f217c4d03   mem-next-og-1                 28.50%    124MiB / 1GiB        12.11%    993MB / 7.45MB    0B / 0B     23
7e76622a4a3e   mem-next-15.4.1-1             49.03%    1021MiB / 1GiB       99.69%    993MB / 7.5MB     0B / 0B     23
1bacb1a9b1cc   mem-next-15.0.4-canary.39-1   21.22%    115.6MiB / 1GiB      11.29%    991MB / 7.44MB    0B / 0B     23
df2a8ba5800e   mem-siege-1                   5.59%     12.56MiB / 31.2GiB   0.04%     9.73MB / 10.5MB   0B / 0B     102
46fab210c911   mem-next-patched-1            2.46%     91.34MiB / 1GiB      8.92%     2.38MB / 1.97MB   0B / 0B     23
365987089d03   mem-upstream-1                15.16%    137.2MiB / 31.2GiB   0.43%     16.8MB / 2.97GB   0B / 0B     23

Each container outputs the request number and current memory usage which are then plotted in order to observe the memory leak due to the custom cloneResponse.

Because the fix associates the correct response and stream, the previous regression does not happen again. We can confirm this by making requests to the page for each of the docker containers. The container using the version prior to the custom cloneResponse will error out, while the rest will not

$ curl -s localhost:3002 | htmlq --text '#error'
Response.clone: Body has already been consumed.

cc @wyattjoh @gnoff

Reproduction

https://github.com/snyamathi/nextjs-mem-leak/tree/main contains a number of reproductions

The rest of the files are supporting infrastructure to start docker containers of various versions and various patches showing how versions prior to the cloneResponse do not have the same memory leak, and how the patch (PR'd here) fixes the issue without regressing on the Undici issue.

@ijjk
Copy link
Member

ijjk commented Aug 15, 2025

Allow CI Workflow Run

  • approve CI run for commit: c8be8bc

Note: this should only be enabled once the PR is ready to go and can only be enabled by a maintainer

@eps1lon eps1lon requested review from wyattjoh and gnoff August 15, 2025 13:39
@ijjk
Copy link
Member

ijjk commented Aug 15, 2025

Tests Passed

@wyattjoh
Copy link
Member

Hey @snyamathi! Do you think you can address the failing tests in this PR? Feel free to ask any questions.

Copy link
Member

@eps1lon eps1lon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this code work in edge runtimes?
What about older Node.js versions?

@@ -1,3 +1,22 @@
// @ts-ignore https://nodejs.org/api/stream.html#streamreadableisdisturbedstream
import { isDisturbed, isErrored } from 'node:stream'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read the docs correctly, these are statics on Readable. Can't use node: either since it needs to work in Edge runtimes.

Suggested change
import { isDisturbed, isErrored } from 'node:stream'
import { Readable } from 'stream'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, let me update this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

I'm still not able to get the isDisturbed or isErrored checks working type-wise. However, I'm also not certain they're needed at all.

The stream you are trying to cancel is not a ReadableStream, or it is locked.

https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/cancel#exceptions

Additionally, we're also catching the error, if one were to occur.

Comment on lines 12 to 13
!isDisturbed(stream) &&
!isErrored(stream as any as NodeJS.ReadableStream)
Copy link
Member

@eps1lon eps1lon Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
!isDisturbed(stream) &&
!isErrored(stream as any as NodeJS.ReadableStream)
!Readable.isDisturbed(stream) &&
!Readable.isErrored(stream)

@snyamathi
Copy link
Contributor Author

I've added a check for globalThis.FinalizationRegistry for any runtimes which do not support it.

I'm trying to get tests and to run locally - even the current canary branch is having issues testing locally, though I'll see what I can do to work through this. These tests take a long time to run! 😅

@eps1lon eps1lon added the CI approved Approve running CI for fork label Aug 15, 2025
@wyattjoh wyattjoh merged commit 6eb32cf into vercel:canary Aug 15, 2025
192 of 202 checks passed
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
CI approved Approve running CI for fork locked type: next
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants