(2.12) Filestore async flush #7018

MauriceVanVeen · 2025-06-30T09:41:17Z

The filestore's AsyncFlush setting can now be enabled for a JetStream stream using the AllowAsyncFlush field. Initial benchmarks showed a 10-15% performance increase when using an R3 stream (3-node cluster with each node in a different availability zone).

Enabling the filestore's AsyncFlush setting on its own, without additional code, would be unsafe, so this PR introduces some new mechanisms to make it safe. This makes the performance improvement essentially "free", with no negative consequences for data safety/consistency.

Before n.InstallSnapshot(..) we now call mset.flushAllPending() to ensure that all state represented in the snapshot has actually made it to disk.
The previous fs.lmb would only sometimes be flushed, for example in fs.checkLastBlock but not when creating a new fs.lmb in fs.newMsgBlockForWrite. Instead, we now always flush and close fs.lmb when creating a new block in fs.newMsgBlockForWrite. This resolves non-monotonic sequence issues that could be reproduced by Antithesis.
The AllowAsyncFlush stream setting can be freely/safely enabled and disabled. It will only be effective when using file storage, and only when the stream is backed by a Raft log, i.e. it's replicated.
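
The rule in the last point can be sketched as a small predicate. This is only an illustration, not the server's code: the `streamConfig` type, its `Storage` and `Replicas` fields, and `asyncFlushEffective` are hypothetical names, and "replicated" is approximated here as having more than one replica.

```go
package main

import "fmt"

// storageType is a hypothetical stand-in for the stream's storage backend.
type storageType int

const (
	fileStorage storageType = iota
	memoryStorage
)

// streamConfig is a hypothetical, trimmed-down stream configuration.
// Only AllowAsyncFlush is named in this PR; the other fields are assumptions.
type streamConfig struct {
	Storage         storageType
	Replicas        int
	AllowAsyncFlush bool
}

// asyncFlushEffective reports whether the opt-in would take effect:
// file storage only, and only when a replicated log backs the stream.
func asyncFlushEffective(cfg streamConfig) bool {
	return cfg.AllowAsyncFlush && cfg.Storage == fileStorage && cfg.Replicas > 1
}

func main() {
	r3 := streamConfig{Storage: fileStorage, Replicas: 3, AllowAsyncFlush: true}
	r1 := streamConfig{Storage: fileStorage, Replicas: 1, AllowAsyncFlush: true}
	mem := streamConfig{Storage: memoryStorage, Replicas: 3, AllowAsyncFlush: true}
	fmt.Println(asyncFlushEffective(r3), asyncFlushEffective(r1), asyncFlushEffective(mem))
}
```

In this sketch the setting is simply ignored for R1 and memory-backed streams, which matches the "freely/safely enabled and disabled" wording above.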

Relates to #6784

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

MauriceVanVeen · 2025-06-30T10:39:55Z

Going to see if this can be simplified a bit more; in draft for now.

alexbozhenko · 2025-06-30T15:19:39Z

How is this related to the sync_interval setting in nats.conf?

MauriceVanVeen · 2025-06-30T15:28:32Z

How is this related to the sync_interval setting in nats.conf?

sync_interval controls when fsync is called on the file: either always, or on a specified interval. Currently writes are always synchronous: the file write is done immediately, and the file is then fsync-ed on an interval, or right after the write if sync_interval: always. This PR makes the writes asynchronous when enabled. With sync_interval: always, the writes may still happen asynchronously, but once they are written, fsync is called right after as well.
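
The difference between writing and syncing can be shown with the Go standard library alone. This is a sketch, not the filestore's implementation: a `bufio.Writer` stands in for writes that are buffered in memory and written later, and `(*os.File).Sync` stands in for the fsync that sync_interval controls. The function name `bufferedWriteDemo` is hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
)

// bufferedWriteDemo writes a small message through a buffer and reports the
// file size before and after the buffer is flushed.
func bufferedWriteDemo() (before, after int64, err error) {
	f, err := os.CreateTemp("", "asyncflush-*.blk")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	// Step 1: the write lands in the in-memory buffer only.
	w := bufio.NewWriter(f)
	if _, err = w.WriteString("hello"); err != nil {
		return 0, 0, err
	}
	st, err := f.Stat()
	if err != nil {
		return 0, 0, err
	}
	before = st.Size()

	// Step 2: flushing performs the actual file write.
	if err = w.Flush(); err != nil {
		return 0, 0, err
	}
	if st, err = f.Stat(); err != nil {
		return 0, 0, err
	}
	after = st.Size()

	// Step 3: fsync asks the OS to persist the written data to stable storage.
	err = f.Sync()
	return before, after, err
}

func main() {
	before, after, err := bufferedWriteDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("size before flush:", before, "size after flush:", after)
}
```

Steps 1 and 2 are what this PR decouples in time; step 3 is what sync_interval governs.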

derekcollison

Let's do a call tomorrow to discuss - mostly interested in the ceIndex new arg and whether or not we can avoid introducing a new arg to public functions.

derekcollison · 2025-07-21T15:31:18Z

server/stream.go

@@ -778,7 +782,8 @@ func (a *Account) addStreamWithAssignment(config *StreamConfig, fsConfig *FileSt
 		}
 	}
 	fsCfg.StoreDir = storeDir
-	fsCfg.AsyncFlush = false
+	// Async flushing is only allowed if the stream has a sync log backing it.


Could we allow this for R1 streams, with the caveat that data loss might be possible?

Would like to discuss that, but (if we agree) allow that in a separate PR.

I'm thinking people would likely jump to opt-in to incredible throughput at the cost of consistency for R1, to then ask why writes are lost during hard kills, etc. So, I'm honestly not sure yet if we should be allowing this at all.

Enabling for replicated is fine, because we have a log where our writes are safely stored. Async applying on JetStream then is just a "free" speedup.
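
The reasoning can be sketched with a toy model. Everything in it is hypothetical (the `node` type, `propose`, `crash`, `recoverFromLog`); it only illustrates why losing unflushed store writes is recoverable when a log already holds the same entries.

```go
package main

import "fmt"

// node is a toy model: a replicated log plus a store whose most recent
// writes may still be in memory (pending) when flushing is asynchronous.
type node struct {
	log     []string // assumed to be safely stored before an entry is applied
	flushed []string // store contents that reached disk
	pending []string // store writes not yet flushed
}

// propose appends to the log first, then applies to the store asynchronously.
func (n *node) propose(msg string) {
	n.log = append(n.log, msg)
	n.pending = append(n.pending, msg)
}

// flush moves pending store writes to disk.
func (n *node) flush() {
	n.flushed = append(n.flushed, n.pending...)
	n.pending = nil
}

// crash simulates a hard kill: unflushed store writes are lost.
func (n *node) crash() { n.pending = nil }

// recoverFromLog re-applies the log entries that the store is missing.
func (n *node) recoverFromLog() {
	n.flushed = append(n.flushed, n.log[len(n.flushed):]...)
}

func main() {
	n := &node{}
	n.propose("a")
	n.flush()
	n.propose("b")
	n.propose("c")
	n.crash()
	fmt.Println("on disk after crash:", len(n.flushed))
	n.recoverFromLog()
	fmt.Println("on disk after replay:", len(n.flushed))
}
```

Without the log (the R1 case), the entries dropped by `crash` in this model would have nowhere to be recovered from.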

MauriceVanVeen · 2025-07-24T08:21:06Z

Let's do a call tomorrow to discuss - mostly interested in the ceIndex new arg and whether or not we can avoid introducing a new arg to public functions.

During the call we discussed a simpler method: force flushing just before calling n.InstallSnapshot(..). I have done several runs through Antithesis to confirm that this also provides the required guarantees for async flushing to be safe, but only after this additional fix: 60c3895

Have updated the PR to contain this simpler approach, without the need for passing ce.Index around.
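
A toy model of the simpler approach, for illustration only. The `block` and `store` types and their methods are hypothetical and loosely named after the functions mentioned in this PR; they are not the filestore's code. The sketch assumes that older blocks are flushed when a new block is created, so flushing the last block before a snapshot is enough.

```go
package main

import "fmt"

// block is a toy message block whose writes may not have reached disk yet.
type block struct {
	pending []uint64 // sequences buffered in memory
	onDisk  []uint64 // sequences written to disk
}

func (b *block) flush() {
	b.onDisk = append(b.onDisk, b.pending...)
	b.pending = nil
}

// store is a toy stand-in for a file store; lmb is the last message block.
type store struct {
	blocks []*block
	lmb    *block
}

// newBlockForWrite always flushes the previous last block before rotating,
// so an older block never holds unwritten data behind a newer one.
func (s *store) newBlockForWrite() {
	if s.lmb != nil {
		s.lmb.flush()
	}
	s.lmb = &block{}
	s.blocks = append(s.blocks, s.lmb)
}

func (s *store) write(seq uint64) {
	if s.lmb == nil {
		s.newBlockForWrite()
	}
	s.lmb.pending = append(s.lmb.pending, seq)
}

// flushAllPending writes out whatever is still buffered in the last block.
func (s *store) flushAllPending() {
	if s.lmb != nil {
		s.lmb.flush()
	}
}

// lastOnDisk returns the highest sequence that reached disk.
func (s *store) lastOnDisk() uint64 {
	var last uint64
	for _, b := range s.blocks {
		for _, seq := range b.onDisk {
			if seq > last {
				last = seq
			}
		}
	}
	return last
}

// installSnapshot flushes first, so the snapshot never covers a sequence
// that is not on disk. It returns the last sequence the snapshot covers.
func (s *store) installSnapshot() uint64 {
	s.flushAllPending()
	return s.lastOnDisk()
}

func main() {
	s := &store{}
	s.write(1)
	s.write(2)
	s.newBlockForWrite() // rotation flushes sequences 1 and 2
	s.write(3)           // still pending
	fmt.Println("on disk before snapshot:", s.lastOnDisk())
	fmt.Println("covered by snapshot:", s.installSnapshot())
}
```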

neilalexander

LGTM

neilalexander

LGTM

derekcollison

LGTM

MauriceVanVeen requested a review from a team as a code owner June 30, 2025 09:41

MauriceVanVeen marked this pull request as draft June 30, 2025 10:31

MauriceVanVeen marked this pull request as ready for review June 30, 2025 20:17

MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from 3096a0c to 87bd91e Compare July 3, 2025 10:59

MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from 87bd91e to 279101d Compare July 11, 2025 09:02

neilalexander reviewed Jul 17, 2025

MauriceVanVeen force-pushed the maurice/replicated-async-flush branch 2 times, most recently from e94193a to 22d5cf0 Compare July 21, 2025 12:07

neilalexander requested a review from derekcollison July 21, 2025 12:51

derekcollison reviewed Jul 21, 2025

MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from 22d5cf0 to 60c3895 Compare July 24, 2025 08:12

neilalexander approved these changes Jul 24, 2025

derekcollison self-requested a review July 24, 2025 11:18

derekcollison reviewed Jul 24, 2025

MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from c3e7532 to bc4655b Compare July 24, 2025 12:06

MauriceVanVeen added 5 commits July 24, 2025 16:09

[FIXED] flushLoop leaks goroutine

71eba82

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

[FIXED] Flush pending writes before truncating

cf3a590

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

(2.12) Filestore async flush

4302e77

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

(2.12) [FIXED] Always flush lmb in newMsgBlockForWrite & close flush loop resources

12552e0

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

(2.12) [IMPROVED] Only flush lmb

ec98ff4

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>

MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from bc4655b to ec98ff4 Compare July 24, 2025 14:09

neilalexander approved these changes Jul 24, 2025

neilalexander merged commit 71912c7 into main Jul 24, 2025
109 of 114 checks passed

neilalexander deleted the maurice/replicated-async-flush branch July 24, 2025 14:40

derekcollison reviewed Jul 24, 2025

neilalexander mentioned this pull request Jul 29, 2025

Cherry-picks for 2.11.7-RC.2 #7115

Merged

neilalexander added a commit that referenced this pull request Jul 29, 2025

Cherry-picks for 2.11.7-RC.2 (#7115)

b3220a6

Includes the following:
- Backported commit 12552e0 from #7018
- #7100
- #7107
- #7109
- #7110
- #7111

Signed-off-by: Neil Twigg <neil@nats.io>

ripienaar mentioned this pull request Aug 8, 2025

Support async flush nats-io/jsm.go#688

Merged
