# snyamathi/nextjs-mem-leak

The Fetch Standard allows users to skip consuming the response body by relying on garbage collection to release connection resources.

Undici added support for this in nodejs/undici#3199, and later for the body of cloned responses in nodejs/undici#3458, which first landed in Undici version 6.19.8 in August 2024. In September 2024, vercel/next.js#69635 was raised, reporting the error `TypeError: Response.clone: Body has already been consumed`.

In NextJS's dedupe-fetch, the cloned response is returned to userland, while the original response is stored in a React cache:

```ts
// match was pulled from the React cache; a clone is returned to the user
return match.then((response: Response) => response.clone());
```
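A stripped-down sketch of that pattern (hypothetical names, not Next's actual code): the cache holds the promise for the original response, and every caller only ever receives a clone of it.

```javascript
// Minimal sketch of the dedupe pattern (hypothetical names, not Next's code):
// the cache keeps the original Response; each caller gets a clone.
const cache = new Map();

function dedupedFetch(key, doFetch) {
  let match = cache.get(key);
  if (!match) {
    match = doFetch(); // Promise<Response>, stored once in the cache
    cache.set(key, match);
  }
  // The original stays cached; the caller only ever sees a clone.
  return match.then((response) => response.clone());
}
```

Each `clone()` tees the cached response's body stream, which is why the lifetime of the original's stream matters in what follows.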

I'm omitting some details, but the undici change that uses a FinalizationRegistry for cloned response bodies appears to pair the wrong response with the wrong stream when registering, so when one response is reclaimed, the other response's stream gets cancelled.

When the original response (the `match` above, stored in the React cache) is cloned, its new stream is registered with the finalization registry against the cloned response `newRequest`, so the original's stream is cancelled when `newRequest` is reclaimed.

I believe this is the true underlying cause of the `Body has already been consumed` errors:

```javascript
function cloneBody(instance, body) {
  const [out1, out2] = body.stream.tee();

  // Erroneously registering newRequest + old body with finalization registry
  streamRegistry.register(instance, new WeakRef(out1));

  // Original request + out1 is used in Next's dedupe-request cache
  body.stream = out1;

  // Clone request + out2 is returned to userland
  return {
    stream: out2,
  };
}

// When newRequest is reclaimed, the original request.body is cancelled
newRequest.body = cloneBody(newRequest, request.body);

// This is what the registry looks like
streamRegistry = new FinalizationRegistry((weakRef) => {
  const stream = weakRef.deref();
  if (stream && !stream.locked && !isDisturbed(stream) && !isErrored(stream)) {
    stream.cancel("Response object has been garbage collected").catch(noop);
  }
});
```
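A hedged sketch of what a correct pairing could look like (illustrative only, not undici's or Next's actual patch): each half of the tee is registered against the response that actually owns it, so reclaiming the clone cancels only the clone's stream. The real code additionally checks disturbed/errored state via undici internals, which this sketch omits.

```javascript
// Sketch of a corrected pairing (illustrative, not the actual patch):
// each tee branch is registered against the object that owns it.
const noop = () => {};

const streamRegistry = new FinalizationRegistry((weakRef) => {
  const stream = weakRef.deref();
  // The real registry also checks isDisturbed/isErrored (undici internals).
  if (stream && !stream.locked) {
    stream.cancel("Response object has been garbage collected").catch(noop);
  }
});

function cloneBody(originalInstance, cloneInstance, body) {
  const [out1, out2] = body.stream.tee();

  // out1 stays with the original; out2 belongs to the clone.
  streamRegistry.register(originalInstance, new WeakRef(out1));
  streamRegistry.register(cloneInstance, new WeakRef(out2));

  body.stream = out1;
  return { stream: out2 };
}
```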

This led to vercel/next.js#73274, which fixed the problem by adding a custom cloneResponse function.

However, this in turn has led to a memory leak, because now nothing ever cancels the tee'd streams.

From https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/tee:

> To cancel the stream you then need to cancel both resulting branches. Teeing a stream will generally lock it for the duration, preventing other readers from locking it.
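The MDN note above can be demonstrated directly (a small sketch using the WHATWG `ReadableStream` available in Node 18+): cancelling only one branch leaves the underlying source alive; its cancel hook fires only once both branches are cancelled.

```javascript
// Demonstrating the tee() contract: the underlying source is cancelled
// only after BOTH branches have been cancelled.
let sourceCancelled = false;

const source = new ReadableStream({
  cancel() {
    sourceCancelled = true;
  },
});

const [branch1, branch2] = source.tee();

const p1 = branch1.cancel("done with branch 1");
console.log(sourceCancelled); // false — branch2 is still alive

const p2 = branch2.cancel("done with branch 2");
await Promise.all([p1, p2]);
console.log(sourceCancelled); // true — both branches cancelled
```

Note that `branch1.cancel()`'s promise does not settle until the second branch is cancelled too, so cancelling just one branch (or neither, as in Next's custom cloneResponse) keeps the source's resources pinned.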

Undici was cancelling the wrong stream (leading to a bug), and Next is simply not cancelling the stream at all (leading to a memory leak). This can be observed with a Docker setup that runs a current version of NextJS, the last version prior to the custom cloneResponse function, and the fix proposed here, side by side.

```shell
docker compose pull
docker compose build
docker compose up -d
docker container stats
```

```
CONTAINER ID   NAME                          CPU %     MEM USAGE / LIMIT    MEM %     NET I/O           BLOCK I/O   PIDS
1b2f217c4d03   mem-next-og-1                 28.50%    124MiB / 1GiB        12.11%    993MB / 7.45MB    0B / 0B     23
7e76622a4a3e   mem-next-15.4.1-1             49.03%    1021MiB / 1GiB       99.69%    993MB / 7.5MB     0B / 0B     23
1bacb1a9b1cc   mem-next-15.0.4-canary.39-1   21.22%    115.6MiB / 1GiB      11.29%    991MB / 7.44MB    0B / 0B     23
df2a8ba5800e   mem-siege-1                   5.59%     12.56MiB / 31.2GiB   0.04%     9.73MB / 10.5MB   0B / 0B     102
46fab210c911   mem-next-patched-1            2.46%     91.34MiB / 1GiB      8.92%     2.38MB / 1.97MB   0B / 0B     23
365987089d03   mem-upstream-1                15.16%    137.2MiB / 31.2GiB   0.43%     16.8MB / 2.97GB   0B / 0B     23
```

Each container outputs the request number and current memory usage which are then plotted in order to observe the memory leak due to the custom cloneResponse.

(plot: memory usage per request for each container)

Because the fix associates the correct response and stream, the previous regression does not reappear. We can confirm this by making requests to the page for each of the Docker containers: the container running the version prior to the custom cloneResponse will error out, while the rest will not.

```shell
$ curl -s localhost:3002 | htmlq --text '#error'
Response.clone: Body has already been consumed.
```

About

Reproduction of a memory leak in NextJS cloneResponse
