lib/model, lib/protocol: Index sending/receiving debugging #9657

calmh · 2024-08-20T16:50:16Z

This adds guardrails to the index sending and receiving, to verify that what we thinks is happening is what actually happens.

(There's a secret bonus as a result of this as well that I haven't implemented yet.)

acolomb

I can't say I fully understand what this does, but tried to take a look anyway, as an outsider. Remember to update the BEP protocol description in the docs if this gets merged.

lib/model/indexhandler.go

imsodin

One thing that looks off (first comment), otherwise just optional suggestions to maybe improve things.

And this made me wonder what we do regarding the remote sequence we get when establishing a conn in the cluster-config: Turns out we ignore it. We have safety checks that ensure our own sequence matches what the remote announces about us, but not the other way around. We should probably add at least diagnostics to that too. Would thematically fit in this PR if you want to, otherwise I'll do it in a followup.

imsodin · 2024-08-24T10:47:04Z

lib/model/indexhandler.go

+			return s.conn.Index(ctx, &protocol.Index{
+				Folder:       s.folder,
+				Files:        fs,
+				LastSequence: lastSequence,
+			})


Shouldn't sentPrevSequence also be updated in this branch?

imsodin · 2024-08-24T10:51:11Z

lib/model/indexhandler.go

 		}
-		return s.conn.IndexUpdate(ctx, &protocol.IndexUpdate{Folder: s.folder, Files: fs})
+		defer func() { sentPrevSequence = lastSequence }()


Just a thought, not entirely sure of the merits myself:
We might not want to update it when sending returns an error. It's fine now, as we abort index sending entirely on any error, so the updated variable wont be used. However it opens a window for an easy to introduce future bug if we ever decide to handle (e.g. retry) on error.

That's valid, but I think can be left off at the moment. In principle the batcher could/should panic with a bug message if the flush function is called again after an error is returned, as given how we're using it that's clearly unsafe as we don't know the status of the previous flush...

Generally fully agree that it's not strictly necessary right now.
The batch reuse on error concern seems to be orthogonal to my concern: Even if the batch is reset/otherwise not reused, sentPrevSequence is not tied to that. If on error one resets the batch and retry, one needs sentPrevSequence to be at the value before the send attempt that produced the error.

imsodin · 2024-08-24T11:01:39Z

lib/model/indexhandler.go

+	// Verify that the previous sequence number matches what we expected
+	if prevSequence > 0 && prevSequence != fset.Sequence(deviceID) {
+		l.Warnf("Bug: device %v folder %s sent index update with sequence %d while we expected %d", deviceID.Short(), s.folder, fset.Sequence(deviceID), s.prevSequence)
+		s.evLogger.Log(events.Failure, "index update with unexpected sequence")


We could log with ur.FailureReport as data, and attach the mismatching sequences as extra data. Would be nice to see if there's any pattern (off by one, off by multiple of default patch size, ...), while still keeping the "uniform" event name for grouping in sentry.

imsodin · 2024-08-24T11:11:28Z

lib/model/indexhandler.go

+
+	// Verify that the previous sequence number matches what we expected
+	if prevSequence > 0 && prevSequence != fset.Sequence(deviceID) {
+		l.Warnf("Bug: device %v folder %s sent index update with sequence %d while we expected %d", deviceID.Short(), s.folder, fset.Sequence(deviceID), s.prevSequence)


All these warnings will become banners in the UI, and there's potential for a lot of them - both if this uncovers a (the! :) ) bug or if we get something about the debugging wrong. I assume the intention is for the user to bring such reports to the forum, so we get someone to debug with in addition to just the reports, that we already get from failure reporting. In that case lets maybe do just a single warning per failure type and device on first occurrence, and then info log to avoid plastering the GUI?

imsodin · 2024-08-24T11:20:20Z

lib/model/indexhandler.go

+		// Verify index in relation to the claimed sequence boundaries
+		if fs[i].Sequence < prevSequence {
+			l.Warnf("Bug: device %v folder %s sent file with sequence %d outside of claimed range %d-%d", deviceID.Short(), s.folder, fs[i].Sequence, prevSequence, lastSequence)
+			s.evLogger.Log(events.Failure, "file with sequence before prevSequence")
+		}
+		if lastSequence > 0 && fs[i].Sequence > lastSequence {
+			l.Warnf("Bug: device %v folder %s sent file with sequence %d outside of claimed range %d-%d", deviceID.Short(), s.folder, fs[i].Sequence, prevSequence, lastSequence)
+			s.evLogger.Log(events.Failure, "file with sequence after lastSequence")
+		}
+		if i > 0 && fs[i].Sequence <= fs[i-1].Sequence {
+			l.Warnf("Bug: device %v folder %s sent non-increasing sequence %d after %d", deviceID.Short(), s.folder, fs[i].Sequence, fs[i-1].Sequence)
+			s.evLogger.Log(events.Failure, "non-increasing sequence")
+		}


These look largely "symmetric" to existing checks in sendIndexTo. Lets use the same pattern there as well:
That panic behind shouldDebug could also become an unconditional check with failure-logging intead of panicking. And also sending a failure report when detecting a non-increasing sequence there.

calmh · 2024-08-25T07:37:56Z

I added some things and extra data, made the warning a once-only thing.

imsodin · 2024-08-25T08:55:48Z

lib/model/indexhandler.go

+		var lastSequence int64
+		if len(fs) > 0 {
+			lastSequence = fs[len(fs)-1].Sequence
+		}
+		defer func() { sentPrevSequence = lastSequence }()


When len(fs) == 0 we don't want to reset sentPrevSequence to 0. This will never happen, as the FileInfoBatch already checks and aborts if there's nothing to send. Still would be nicer if this function itself was correct, e.g. by immediately returning when fs is empty.

imsodin · 2024-08-25T09:03:39Z

lib/model/indexhandler.go

 		}
-		return s.conn.IndexUpdate(ctx, &protocol.IndexUpdate{Folder: s.folder, Files: fs})
+		defer func() { sentPrevSequence = lastSequence }()


Generally fully agree that it's not strictly necessary right now.
The batch reuse on error concern seems to be orthogonal to my concern: Even if the batch is reset/otherwise not reused, sentPrevSequence is not tied to that. If on error one resets the batch and retry, one needs sentPrevSequence to be at the value before the send attempt that produced the error.

imsodin

Nice :)

Just one more optional nit, thus already approved.

calmh · 2024-08-25T09:16:57Z

OK dialling the paranoia to 11, please recheck that logic...

AudriusButkevicius · 2024-08-29T21:15:19Z

lib/model/indexhandler.go

+				"seenSeq": fs[i].Sequence,
+				"atIndex": i,
+			})
+		}


Isn't this always going to fire? Next is always bigger than last, no? Or is last in this case already adjusted?

Confusion of English terminology, sorry. "last" as "the supposed last sequence number of the batch we're currently processing".

AudriusButkevicius · 2024-08-29T21:16:25Z

lib/model/indexhandler.go

+		Extra:       extraStrs,
+	})
+}
+


Could have logged to sentry?

Indeed it does, via the failure reporting mechanism 👍

AudriusButkevicius

Other than the silly questions, I suspect this might not fire, as the issue is related to folderr/app restarts

* main: gui, man, authors: Update docs, translations, and contributors gui: Actually filter scope ID out of IPv6 address when using Remote GUI (ref syncthing#8084) (syncthing#9688) Revert "lib/fs: Put the caseFS as the outermost layer (syncthing#9648)" lib/api: Correct ordering of Accept-Language codes by weight (fixes syncthing#9670) (syncthing#9671) gui, man, authors: Update docs, translations, and contributors gui: Fix incorrect UI language auto detection (fixes syncthing#9668) (syncthing#9669) lib/model, lib/protocol: Index sending/receiving debugging (syncthing#9657) lib/upgrade: Send OS version header to upgrade server (syncthing#9663) lib/upgrade: Send OS version header to upgrade server (syncthing#9663) gui, man, authors: Update docs, translations, and contributors

calmh mentioned this pull request Aug 20, 2024

lib/protocol: Further interface refactor #9396

Merged

acolomb reviewed Aug 21, 2024

View reviewed changes

lib/model/indexhandler.go Outdated Show resolved Hide resolved

lib/model/indexhandler.go Outdated Show resolved Hide resolved

lib/model, lib/protocol: Index sending/receiving debugging

f72095e

calmh force-pushed the jb/indexnos branch from 7b91d32 to f72095e Compare August 24, 2024 10:48

calmh changed the base branch from jb/indexrefactor2 to main August 24, 2024 10:48

calmh changed the title ~~wip: Index sending/receiving debugging~~ lib/model, lib/protocol: Index sending/receiving debugging Aug 24, 2024

calmh requested a review from imsodin August 24, 2024 10:57

imsodin reviewed Aug 24, 2024

View reviewed changes

wip

1a5ff08

calmh added 5 commits August 25, 2024 09:40

wip

440e17b

wip

6e462a3

wip

b1f8e30

wip

26c04c2

wip

aea6362

imsodin reviewed Aug 25, 2024

View reviewed changes

imsodin approved these changes Aug 25, 2024

View reviewed changes

wip

2b37c2a

wip

942ab13

calmh merged commit 42e677c into syncthing:main Aug 28, 2024
20 of 21 checks passed

AudriusButkevicius reviewed Aug 29, 2024

View reviewed changes

calmh added this to the v1.27.13 milestone Sep 11, 2024

calmh deleted the jb/indexnos branch May 26, 2025 13:42

Uh oh!

lib/model, lib/protocol: Index sending/receiving debugging #9657

lib/model, lib/protocol: Index sending/receiving debugging #9657

Uh oh!

Conversation

calmh commented Aug 20, 2024

Uh oh!

acolomb left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

imsodin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

calmh commented Aug 25, 2024

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

imsodin left a comment

Choose a reason for hiding this comment

Uh oh!

calmh commented Aug 25, 2024

Uh oh!

Uh oh!

AudriusButkevicius Aug 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

AudriusButkevicius left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

imsodin left a comment •

edited

Loading

AudriusButkevicius Aug 29, 2024 •

edited

Loading