Skip to content

Conversation

MauriceVanVeen
Copy link
Member

@MauriceVanVeen MauriceVanVeen commented Jun 30, 2025

The filestore's AsyncFlush setting can now be enabled for a JetStream stream using the AllowAsyncFlush field. Some initial benchmarks showed a 10-15% performance increase when using a R3 stream (3-node cluster with each node in a different availability zone).

Only enabling the filestore's AsyncFlush setting without additional code would be unsafe. That's why this PR introduces some new mechanisms to make it safe. This makes the performance improvement essentially "free", with no negative consequences for data safety/consistency.

  • Before n.InstallSnapshot(..) we now do a mset.flushAllPending() to ensure all state represented in the snapshot, actually all made it to disk.
  • The previous fs.lmb would only be sometimes be flushed, for example in fs.checkLastBlock but not when creating a new fs.lmb in fs.newMsgBlockForWrite. Instead, always flush and close fs.lmb when creating a new block in fs.newMsgBlockForWrite. This resolves non-monotonic sequence issues that could be reproduced by Antithesis.
  • The AllowAsyncFlush stream setting can be freely/safely enabled and disabled. It will only be effective when using file storage, and only when the stream is backed by a Raft log, i.e. it's replicated.

Relates to #6784

Signed-off-by: Maurice van Veen github@mauricevanveen.com

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner June 30, 2025 09:41
@MauriceVanVeen MauriceVanVeen marked this pull request as draft June 30, 2025 10:31
@MauriceVanVeen
Copy link
Member Author

Going to try if this can be simplified a bit more, in draft for now.

@alexbozhenko
Copy link
Contributor

How is this related to the sync_interval setting in nats.conf?

@MauriceVanVeen
Copy link
Member Author

How is this related to the sync_interval setting in nats.conf?

That means when fsync is called on the file. Either always or a specified interval. Currently writes are always synchronous, the file write is done. And is fsync-ed on an interval, or always after the write if sync_interval: always. This PR makes the writes asynchronous when enabled. If sync_interval: always, the writes may still happen asynchronously, but when they are written, fsync would be called right after as well.

@MauriceVanVeen MauriceVanVeen marked this pull request as ready for review June 30, 2025 20:17
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from 3096a0c to 87bd91e Compare July 3, 2025 10:59
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from 87bd91e to 279101d Compare July 11, 2025 09:02
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/replicated-async-flush branch 2 times, most recently from e94193a to 22d5cf0 Compare July 21, 2025 12:07
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do a call tomorrow to discuss - mostly interested in the ceIndex new arg and whether or not we can avoid introducing a new arg to public functions.

@@ -778,7 +782,8 @@ func (a *Account) addStreamWithAssignment(config *StreamConfig, fsConfig *FileSt
}
}
fsCfg.StoreDir = storeDir
fsCfg.AsyncFlush = false
// Async flushing is only allowed if the stream has a sync log backing it.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we allow for R1s with caveat data loss might be possible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would like to discuss that, but (if we agree) allow that in a separate PR.

I'm thinking people would likely jump to opt-in to incredible throughput at the cost of consistency for R1, to then ask why writes are lost during hard kills, etc. So, I'm honestly not sure yet if we should be allowing this at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enabling for replicated is fine, because we have a log where our writes are safely stored. Async applying on JetStream then is just a "free" speedup.

@MauriceVanVeen MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from 22d5cf0 to 60c3895 Compare July 24, 2025 08:12
@MauriceVanVeen
Copy link
Member Author

Let's do a call tomorrow to discuss - mostly interested in the ceIndex new arg and whether or not we can avoid introducing a new arg to public functions.

During the call we discussed a simpler method to force flushing just before doing a n.InstallSnapshot(..). Have done several runs through Antithesis to confirm this indeed also provides the required guarantees for async flushing to be safe.
But only after this additional fix: 60c3895

Have updated the PR to contain this simpler approach, without the need for passing ce.Index around.

Copy link
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@derekcollison derekcollison self-requested a review July 24, 2025 11:18
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from c3e7532 to bc4655b Compare July 24, 2025 12:06
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
…loop resources

Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
Signed-off-by: Maurice van Veen <github@mauricevanveen.com>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from bc4655b to ec98ff4 Compare July 24, 2025 14:09
Copy link
Member

@neilalexander neilalexander left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@neilalexander neilalexander merged commit 71912c7 into main Jul 24, 2025
109 of 114 checks passed
@neilalexander neilalexander deleted the maurice/replicated-async-flush branch July 24, 2025 14:40
Copy link
Member

@derekcollison derekcollison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

neilalexander added a commit that referenced this pull request Jul 29, 2025
Includes the following:

- Backported commit 12552e0 from #7018
- #7100 
- #7107
- #7109
- #7110
- #7111

Signed-off-by: Neil Twigg <neil@nats.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants