Skip to content

Conversation

jlevesy
Copy link
Contributor

@jlevesy jlevesy commented Nov 25, 2023

What does this PR do?

  • ebce11d Makes the agent.DB notify the remote storage (via a wlog.Notifier) when a commit happens.
  • eb73a8e Makes the wlog.Watcher.watch method read a segment synchronously when it is not tailing the last segment instead of waiting. It includes a test that makes sure we're able to replay within a single timeout interval a generated set of segments. I'm not exactly sure that's what you expected, but I can't see how a real benchmark would help here.

Those two fixes commit are the outcome of the discussion happening in #13111

Fixes #13111

Signed-off-by: Julien Levesy <jlevesy@gmail.com>
Signed-off-by: Julien Levesy <jlevesy@gmail.com>
@jlevesy
Copy link
Contributor Author

jlevesy commented Nov 25, 2023

I'll roll out that branch on my dev environment on Monday, see if it makes the expected changes :)

@bboreham
Copy link
Member

This feels like two PRs to me: one allowing the agent to participate in Notify calls, and one changing the non-tail behaviour to not need a Notify.

If I'm mistaken, please modify the description to highlight how they are related.

Copy link
Member

@cstyan cstyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll ask someone who's more involved in the agent work to double check, but this looks good to me.

@@ -487,6 +485,8 @@ func (w *Watcher) watch(segmentNum int, tail bool) error {
readTicker.Reset(readTimeout)

case <-w.readNotify:
level.Debug(w.logger).Log("msg", "Watcher is reading the WAL due to notify")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to leave out this log line. It would become way too noisy during normal remote write execution, and we want debug logging to still be useful for finding other issues.

If you're needing insight into WAL reads due to notify vs timeout we could add a metric instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, that's a debugging leftover sorry. Will clean up!

@jlevesy
Copy link
Contributor Author

jlevesy commented Nov 27, 2023

This feels like two PRs to me: one allowing the agent to participate in Notify calls, and one changing the non-tail behaviour to not need a Notify.

It is indeed. It was the outcome of the discussion happening in #13111 so I packed them into a single PR. But I can open two separate PRs if you want!

@jlevesy jlevesy force-pushed the jly/13111/actual-fix branch from 4f4d999 to fcf68b4 Compare November 27, 2023 16:41
Signed-off-by: Julien Levesy <jlevesy@gmail.com>
@jlevesy jlevesy force-pushed the jly/13111/actual-fix branch from fcf68b4 to 9cb9c9e Compare November 27, 2023 16:46
@bboreham
Copy link
Member

I find it easier to think about self-contained changes, plus we get the benefit of seeing that CI passes on each independent change.

@cstyan
Copy link
Member

cstyan commented Nov 27, 2023

I don't have a preference either way; these are relatively small changes and you've already isolated them nicely in separate commits.

@mattdurham
Copy link

Looks reasonable from my standpoint. The only edge case I see is abusing to SetWriteNotified could lead to a nil panic but that seems like an abuse and likely not worth the mutex.

@cstyan
Copy link
Member

cstyan commented Nov 28, 2023

@mattdurham fair point, we should probably at least have a comment on the interface that SetWriteNotified is only meant to be called once/during initialization. We can set up more safety around it's usage if we find a need to, a lock has some cost. Thanks for taking a look 👍

@cstyan cstyan self-assigned this Nov 28, 2023
Signed-off-by: Julien Levesy <jlevesy@gmail.com>
@jlevesy
Copy link
Contributor Author

jlevesy commented Nov 30, 2023

Opened #13223 and #13224 as per asked. Closing this PR.

@jlevesy jlevesy closed this Nov 30, 2023
@jlevesy jlevesy deleted the jly/13111/actual-fix branch November 30, 2023 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WAL watcher replay in agent mode takes significantly more time since 2.45
4 participants