-
Notifications
You must be signed in to change notification settings - Fork 158
Description
Bottom line up front
When the conditions are just right, it's possible for a grpc stream (underlying an onSnapshot query) to emit an error
event while not having any error handlers, killing the node process. It's rare, but our server's heavy traffic causes it to happen multiple times per day.
-
Is this a client library issue or a product issue?
Client library. A transient error isn't properly handled by the library under certain conditions. -
Did someone already solve this?
None of my searches have yielded results. -
Do you have a support contract?
I don't
Environment details
- OS: Debian GNU/Linux 12 (inside Docker)
- Node.js version: 20.18.0
- npm version: 10.8.2
@google-cloud/firestore
version: 7.4.0
Steps to reproduce
While I can't write code that reliably reproduces the problem (it's a low probability race condition), I can point to a code path that is hit and support my hypothesis with logs emitted by our server.
https://github.com/googleapis/nodejs-firestore/blob/main/dev/src/watch.ts#L483-L513
// inside Watch.initStream() method
...
return this.firestore
.requestStream( // (1)
'listen',
/* bidirectional= */ true,
request,
this.requestTag
)
.then(backendStream => {
if (!this.isActive) { // (2)
logger(
'Watch.initStream',
this.requestTag,
'Closing inactive stream'
);
backendStream.emit('end');
return; // (3)
}
logger('Watch.initStream', this.requestTag, 'Opened new stream');
this.currentStream = backendStream;
this.resetIdleTimeout();
this.currentStream!.on('data', (proto: api.IListenResponse) => {
this.resetIdleTimeout();
this.onData(proto);
})
.on('error', err => { // (4)
if (this.currentStream === backendStream) {
this.currentStream = null;
this.maybeReopenStream(err);
}
})
- First thing that must happen is a query being closed very soon after opening. This way, before the async call (1), the
.isActive
field is flipped back to false. Subsequently, the code enters theif
clause (2). Note that, in this path, we exit the function at (3) andbackendStream
doesn't get an error handler like it does in (4). - Secondly, the newly opened
backendStream
must emit an error (it can be anything, like a transient networking issue). The fact that it receives anend
event earlier doesn't preclude it from doing so. - Consequently, that emitted error crashes the node process.
Logs (I have selectively enabled certain firestore's internal logs to be formatted and logged to console):
T21:39:11.502801006Z Sending request: { [redacted] } { requestTag: 'rvKwm', fun: '[Firestore._initializeStream]' }
T21:39:11.503399310Z Marking stream as healthy { requestTag: 'rvKwm', fun: '[Firestore._initializeStream]' }
T21:39:11.503738877Z Stream ready { requestTag: 'rvKwm', fun: '[Firestore._initializeStream]' }
T21:39:11.504049930Z Closing inactive stream { requestTag: 'rvKwm', fun: '[Watch.initStream]' }
T21:39:11.504533921Z Received stream end { requestTag: 'rvKwm', fun: '[Firestore._initializeStream]' }
T22:39:12.017741217Z Received stream error: Error: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)
T22:39:12.017790903Z at callErrorFromStatus (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/call.js:31:19)
T22:39:12.017796528Z at Object.onReceiveStatus (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:420:73)
T22:39:12.017799765Z at Object.onReceiveStatus (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client-interceptors.js:323:181)
T22:39:12.017802736Z at /usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/resolving-call.js:129:78
T22:39:12.017805701Z at process.processTicksAndRejections (node:internal/process/task_queues:77:11)
T22:39:12.017808736Z for call at
T22:39:12.017812711Z at ServiceClientImpl.makeBidiStreamRequest (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/client.js:404:32)
T22:39:12.017815664Z at ServiceClientImpl.<anonymous> (/usr/src/app/node_modules/google-gax/node_modules/@grpc/grpc-js/build/src/make-client.js:105:19)
T22:39:12.017819232Z at /usr/src/app/node_modules/@google-cloud/firestore/build/src/v1/firestore_client.js:239:29
T22:39:12.017839883Z at /usr/src/app/node_modules/google-gax/build/src/streamingCalls/streamingApiCaller.js:46:28
T22:39:12.017842602Z at /usr/src/app/node_modules/google-gax/build/src/normalCalls/timeout.js:44:16
T22:39:12.017845234Z at StreamProxy.setStream (/usr/src/app/node_modules/google-gax/build/src/streamingCalls/streaming.js:419:24)
T22:39:12.017847960Z at StreamingApiCaller.call (/usr/src/app/node_modules/google-gax/build/src/streamingCalls/streamingApiCaller.js:54:16)
T22:39:12.017863772Z at /usr/src/app/node_modules/google-gax/build/src/createApiCall.js:112:30
T22:39:12.017866637Z at runNextTicks (node:internal/process/task_queues:60:5)
T22:39:12.017869664Z at listOnTimeout (node:internal/timers:545:9)
T22:39:12.017872719Z at process.processTimers (node:internal/timers:519:7) {
T22:39:12.017875762Z code: 13,
T22:39:12.017878562Z details: 'Received RST_STREAM with code 2 (Internal server error)',
T22:39:12.017881255Z metadata: Metadata { internalRepr: Map(0) {}, options: {} }
T22:39:12.017883953Z } { requestTag: 'rvKwm', fun: '[Firestore._initializeStream]' }
T22:39:12.017886915Z Received stream end { requestTag: 'rvKwm', fun: '[Firestore._initializeStream]' }
T22:39:12.019839360Z node:events:496
T22:39:12.019872407Z throw er; // Unhandled 'error' event
T22:39:12.019876840Z ^
T22:39:12.019879811Z
T22:39:12.019882613Z Error: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error)