-
-
Notifications
You must be signed in to change notification settings - Fork 4.8k
lib/model, lib/protocol: Index sending/receiving debugging #9657
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't say I fully understand what this does, but tried to take a look anyway, as an outsider. Remember to update the BEP protocol description in the docs if this gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that looks off (first comment), otherwise just optional suggestions to maybe improve things.
And this made me wonder what we do regarding the remote sequence we get when establishing a conn in the cluster-config: Turns out we ignore it. We have safety checks that ensure our own sequence matches what the remote announces about us, but not the other way around. We should probably add at least diagnostics to that too. Would thematically fit in this PR if you want to, otherwise I'll do it in a followup.
lib/model/indexhandler.go
Outdated
return s.conn.Index(ctx, &protocol.Index{ | ||
Folder: s.folder, | ||
Files: fs, | ||
LastSequence: lastSequence, | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't sentPrevSequence
also be updated in this branch?
lib/model/indexhandler.go
Outdated
} | ||
return s.conn.IndexUpdate(ctx, &protocol.IndexUpdate{Folder: s.folder, Files: fs}) | ||
defer func() { sentPrevSequence = lastSequence }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a thought, not entirely sure of the merits myself:
We might not want to update it when sending returns an error. It's fine now, as we abort index sending entirely on any error, so the updated variable wont be used. However it opens a window for an easy to introduce future bug if we ever decide to handle (e.g. retry) on error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's valid, but I think can be left off at the moment. In principle the batcher could/should panic with a bug message if the flush function is called again after an error is returned, as given how we're using it that's clearly unsafe as we don't know the status of the previous flush...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally fully agree that it's not strictly necessary right now.
The batch reuse on error concern seems to be orthogonal to my concern: Even if the batch is reset/otherwise not reused, sentPrevSequence
is not tied to that. If on error one resets the batch and retry, one needs sentPrevSequence
to be at the value before the send attempt that produced the error.
lib/model/indexhandler.go
Outdated
// Verify that the previous sequence number matches what we expected | ||
if prevSequence > 0 && prevSequence != fset.Sequence(deviceID) { | ||
l.Warnf("Bug: device %v folder %s sent index update with sequence %d while we expected %d", deviceID.Short(), s.folder, fset.Sequence(deviceID), s.prevSequence) | ||
s.evLogger.Log(events.Failure, "index update with unexpected sequence") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could log with ur.FailureReport
as data, and attach the mismatching sequences as extra data. Would be nice to see if there's any pattern (off by one, off by multiple of default patch size, ...), while still keeping the "uniform" event name for grouping in sentry.
lib/model/indexhandler.go
Outdated
|
||
// Verify that the previous sequence number matches what we expected | ||
if prevSequence > 0 && prevSequence != fset.Sequence(deviceID) { | ||
l.Warnf("Bug: device %v folder %s sent index update with sequence %d while we expected %d", deviceID.Short(), s.folder, fset.Sequence(deviceID), s.prevSequence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these warnings will become banners in the UI, and there's potential for a lot of them - both if this uncovers a (the! :) ) bug or if we get something about the debugging wrong. I assume the intention is for the user to bring such reports to the forum, so we get someone to debug with in addition to just the reports, that we already get from failure reporting. In that case lets maybe do just a single warning per failure type and device on first occurrence, and then info log to avoid plastering the GUI?
lib/model/indexhandler.go
Outdated
// Verify index in relation to the claimed sequence boundaries | ||
if fs[i].Sequence < prevSequence { | ||
l.Warnf("Bug: device %v folder %s sent file with sequence %d outside of claimed range %d-%d", deviceID.Short(), s.folder, fs[i].Sequence, prevSequence, lastSequence) | ||
s.evLogger.Log(events.Failure, "file with sequence before prevSequence") | ||
} | ||
if lastSequence > 0 && fs[i].Sequence > lastSequence { | ||
l.Warnf("Bug: device %v folder %s sent file with sequence %d outside of claimed range %d-%d", deviceID.Short(), s.folder, fs[i].Sequence, prevSequence, lastSequence) | ||
s.evLogger.Log(events.Failure, "file with sequence after lastSequence") | ||
} | ||
if i > 0 && fs[i].Sequence <= fs[i-1].Sequence { | ||
l.Warnf("Bug: device %v folder %s sent non-increasing sequence %d after %d", deviceID.Short(), s.folder, fs[i].Sequence, fs[i-1].Sequence) | ||
s.evLogger.Log(events.Failure, "non-increasing sequence") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look largely "symmetric" to existing checks in sendIndexTo
. Lets use the same pattern there as well:
That panic behind shouldDebug
could also become an unconditional check with failure-logging intead of panicking. And also sending a failure report when detecting a non-increasing sequence there.
lib/model/indexhandler.go
Outdated
var lastSequence int64 | ||
if len(fs) > 0 { | ||
lastSequence = fs[len(fs)-1].Sequence | ||
} | ||
defer func() { sentPrevSequence = lastSequence }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When len(fs) == 0
we don't want to reset sentPrevSequence to 0. This will never happen, as the FileInfoBatch
already checks and aborts if there's nothing to send. Still would be nicer if this function itself was correct, e.g. by immediately returning when fs is empty.
lib/model/indexhandler.go
Outdated
} | ||
return s.conn.IndexUpdate(ctx, &protocol.IndexUpdate{Folder: s.folder, Files: fs}) | ||
defer func() { sentPrevSequence = lastSequence }() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally fully agree that it's not strictly necessary right now.
The batch reuse on error concern seems to be orthogonal to my concern: Even if the batch is reset/otherwise not reused, sentPrevSequence
is not tied to that. If on error one resets the batch and retry, one needs sentPrevSequence
to be at the value before the send attempt that produced the error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :)
Just one more optional nit, thus already approved.
OK dialling the paranoia to 11, please recheck that logic... |
"seenSeq": fs[i].Sequence, | ||
"atIndex": i, | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this always going to fire? Next is always bigger than last, no? Or is last in this case already adjusted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusion of English terminology, sorry. "last" as "the supposed last sequence number of the batch we're currently processing".
Extra: extraStrs, | ||
}) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could have logged to sentry?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it does, via the failure reporting mechanism 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than the silly questions, I suspect this might not fire, as the issue is related to folderr/app restarts
* main: gui, man, authors: Update docs, translations, and contributors gui: Actually filter scope ID out of IPv6 address when using Remote GUI (ref syncthing#8084) (syncthing#9688) Revert "lib/fs: Put the caseFS as the outermost layer (syncthing#9648)" lib/api: Correct ordering of Accept-Language codes by weight (fixes syncthing#9670) (syncthing#9671) gui, man, authors: Update docs, translations, and contributors gui: Fix incorrect UI language auto detection (fixes syncthing#9668) (syncthing#9669) lib/model, lib/protocol: Index sending/receiving debugging (syncthing#9657) lib/upgrade: Send OS version header to upgrade server (syncthing#9663) lib/upgrade: Send OS version header to upgrade server (syncthing#9663) gui, man, authors: Update docs, translations, and contributors
This adds guardrails to the index sending and receiving, to verify that what we thinks is happening is what actually happens.
(There's a secret bonus as a result of this as well that I haven't implemented yet.)