Skip to content

Conversation

calmh
Copy link
Member

@calmh calmh commented Aug 20, 2024

This adds guardrails to the index sending and receiving, to verify that what we thinks is happening is what actually happens.

(There's a secret bonus as a result of this as well that I haven't implemented yet.)

Copy link
Member

@acolomb acolomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't say I fully understand what this does, but tried to take a look anyway, as an outsider. Remember to update the BEP protocol description in the docs if this gets merged.

@calmh calmh changed the base branch from jb/indexrefactor2 to main August 24, 2024 10:48
@calmh calmh changed the title wip: Index sending/receiving debugging lib/model, lib/protocol: Index sending/receiving debugging Aug 24, 2024
@calmh calmh requested a review from imsodin August 24, 2024 10:57
Copy link
Member

@imsodin imsodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that looks off (first comment), otherwise just optional suggestions to maybe improve things.

And this made me wonder what we do regarding the remote sequence we get when establishing a conn in the cluster-config: Turns out we ignore it. We have safety checks that ensure our own sequence matches what the remote announces about us, but not the other way around. We should probably add at least diagnostics to that too. Would thematically fit in this PR if you want to, otherwise I'll do it in a followup.

Comment on lines 241 to 245
return s.conn.Index(ctx, &protocol.Index{
Folder: s.folder,
Files: fs,
LastSequence: lastSequence,
})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't sentPrevSequence also be updated in this branch?

}
return s.conn.IndexUpdate(ctx, &protocol.IndexUpdate{Folder: s.folder, Files: fs})
defer func() { sentPrevSequence = lastSequence }()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, not entirely sure of the merits myself:
We might not want to update it when sending returns an error. It's fine now, as we abort index sending entirely on any error, so the updated variable wont be used. However it opens a window for an easy to introduce future bug if we ever decide to handle (e.g. retry) on error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's valid, but I think can be left off at the moment. In principle the batcher could/should panic with a bug message if the flush function is called again after an error is returned, as given how we're using it that's clearly unsafe as we don't know the status of the previous flush...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally fully agree that it's not strictly necessary right now.
The batch reuse on error concern seems to be orthogonal to my concern: Even if the batch is reset/otherwise not reused, sentPrevSequence is not tied to that. If on error one resets the batch and retry, one needs sentPrevSequence to be at the value before the send attempt that produced the error.

// Verify that the previous sequence number matches what we expected
if prevSequence > 0 && prevSequence != fset.Sequence(deviceID) {
l.Warnf("Bug: device %v folder %s sent index update with sequence %d while we expected %d", deviceID.Short(), s.folder, fset.Sequence(deviceID), s.prevSequence)
s.evLogger.Log(events.Failure, "index update with unexpected sequence")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could log with ur.FailureReport as data, and attach the mismatching sequences as extra data. Would be nice to see if there's any pattern (off by one, off by multiple of default patch size, ...), while still keeping the "uniform" event name for grouping in sentry.


// Verify that the previous sequence number matches what we expected
if prevSequence > 0 && prevSequence != fset.Sequence(deviceID) {
l.Warnf("Bug: device %v folder %s sent index update with sequence %d while we expected %d", deviceID.Short(), s.folder, fset.Sequence(deviceID), s.prevSequence)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these warnings will become banners in the UI, and there's potential for a lot of them - both if this uncovers a (the! :) ) bug or if we get something about the debugging wrong. I assume the intention is for the user to bring such reports to the forum, so we get someone to debug with in addition to just the reports, that we already get from failure reporting. In that case lets maybe do just a single warning per failure type and device on first occurrence, and then info log to avoid plastering the GUI?

Comment on lines 361 to 373
// Verify index in relation to the claimed sequence boundaries
if fs[i].Sequence < prevSequence {
l.Warnf("Bug: device %v folder %s sent file with sequence %d outside of claimed range %d-%d", deviceID.Short(), s.folder, fs[i].Sequence, prevSequence, lastSequence)
s.evLogger.Log(events.Failure, "file with sequence before prevSequence")
}
if lastSequence > 0 && fs[i].Sequence > lastSequence {
l.Warnf("Bug: device %v folder %s sent file with sequence %d outside of claimed range %d-%d", deviceID.Short(), s.folder, fs[i].Sequence, prevSequence, lastSequence)
s.evLogger.Log(events.Failure, "file with sequence after lastSequence")
}
if i > 0 && fs[i].Sequence <= fs[i-1].Sequence {
l.Warnf("Bug: device %v folder %s sent non-increasing sequence %d after %d", deviceID.Short(), s.folder, fs[i].Sequence, fs[i-1].Sequence)
s.evLogger.Log(events.Failure, "non-increasing sequence")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These look largely "symmetric" to existing checks in sendIndexTo. Lets use the same pattern there as well:
That panic behind shouldDebug could also become an unconditional check with failure-logging intead of panicking. And also sending a failure report when detecting a non-increasing sequence there.

@calmh
Copy link
Member Author

calmh commented Aug 25, 2024

I added some things and extra data, made the warning a once-only thing.

Screenshot 2024-08-25 at 09 37 22

Comment on lines 235 to 239
var lastSequence int64
if len(fs) > 0 {
lastSequence = fs[len(fs)-1].Sequence
}
defer func() { sentPrevSequence = lastSequence }()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When len(fs) == 0 we don't want to reset sentPrevSequence to 0. This will never happen, as the FileInfoBatch already checks and aborts if there's nothing to send. Still would be nicer if this function itself was correct, e.g. by immediately returning when fs is empty.

}
return s.conn.IndexUpdate(ctx, &protocol.IndexUpdate{Folder: s.folder, Files: fs})
defer func() { sentPrevSequence = lastSequence }()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally fully agree that it's not strictly necessary right now.
The batch reuse on error concern seems to be orthogonal to my concern: Even if the batch is reset/otherwise not reused, sentPrevSequence is not tied to that. If on error one resets the batch and retry, one needs sentPrevSequence to be at the value before the send attempt that produced the error.

Copy link
Member

@imsodin imsodin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

Just one more optional nit, thus already approved.

@calmh
Copy link
Member Author

calmh commented Aug 25, 2024

OK dialling the paranoia to 11, please recheck that logic...

@calmh calmh merged commit 42e677c into syncthing:main Aug 28, 2024
20 of 21 checks passed
"seenSeq": fs[i].Sequence,
"atIndex": i,
})
}
Copy link
Member

@AudriusButkevicius AudriusButkevicius Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this always going to fire? Next is always bigger than last, no? Or is last in this case already adjusted?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusion of English terminology, sorry. "last" as "the supposed last sequence number of the batch we're currently processing".

Extra: extraStrs,
})
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could have logged to sentry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it does, via the failure reporting mechanism 👍

Copy link
Member

@AudriusButkevicius AudriusButkevicius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the silly questions, I suspect this might not fire, as the issue is related to folderr/app restarts

calmh added a commit to calmh/syncthing that referenced this pull request Sep 11, 2024
* main:
  gui, man, authors: Update docs, translations, and contributors
  gui: Actually filter scope ID out of IPv6 address when using Remote GUI (ref syncthing#8084) (syncthing#9688)
  Revert "lib/fs: Put the caseFS as the outermost layer (syncthing#9648)"
  lib/api: Correct ordering of Accept-Language codes by weight (fixes syncthing#9670) (syncthing#9671)
  gui, man, authors: Update docs, translations, and contributors
  gui: Fix incorrect UI language auto detection (fixes syncthing#9668) (syncthing#9669)
  lib/model, lib/protocol: Index sending/receiving debugging (syncthing#9657)
  lib/upgrade: Send OS version header to upgrade server (syncthing#9663)
  lib/upgrade: Send OS version header to upgrade server (syncthing#9663)
  gui, man, authors: Update docs, translations, and contributors
@calmh calmh added this to the v1.27.13 milestone Sep 11, 2024
@calmh calmh deleted the jb/indexnos branch May 26, 2025 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants