Initial attempt to integrate with Vault with integrated storage #23
Conversation
Nice! This actually looks super close.
I know it's WIP and you have other things today, but I thought I'd get some early comments in, in case they're useful today. Most are just about updating comments!
The biggest actual change is that I don't think the test case is testing quite the right thing (i.e. I'm pretty sure the one here would pass even if you reverted all the other changes - might be worth trying to be sure the test is exercising the problem properly!)
It's only a matter of changing a few indexes to get it right, though!
The atomic part I think is the last thing we need to get right (and test) before this is good to go! Thanks for all the work on this!
segment/writer.go (outdated)

	offsets = append(offsets, 0)
}

atomic.StoreUint64(&w.info.MinIndex, e.Index)
I see this is an atomic store now, but I don't see atomic Loads in the places that read this (e.g. OffsetForFrame above on line 356).
I also didn't see an update to the concurrent test in writer_test.go that would help confirm the racey behaviour is fixed.
Finally, because we are just updating this in place in a struct that is used for other purposes, there is a chance that in the future the struct could be modified so that this field is no longer 64-bit aligned (it happens to be aligned right now, but there's nothing to stop the struct being changed later since the requirement is undocumented). I think we should do one of the following:
- Store a separate field in the writer state for this and use it here and in OffsetForFrame above. We'd probably want to document why it might differ from info.MinIndex. The downside is that it's not obvious that info.MinIndex might no longer be correct.
- Update the docs on the SegmentInfo struct fields to point out that we use them internally and rely on 64-bit alignment, so that must not change. This is kinda gross as they are in a separate package, but at least it reduces the risk of accidentally introducing a breaking change later.
- Make the whole info field an atomic.Value and copy-and-swap it out here. This is probably the cleanest overall? (See the sketch below.)
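For illustration, here is a rough sketch of what the third option could look like. The names (w.info, e.Index, types.SegmentInfo) follow the discussion above, but the exact code is an assumption, not something taken from this PR:

// w.info is an atomic.Value holding a types.SegmentInfo. Only the single
// writer goroutine mutates it, so a plain copy-modify-store is enough here
// (no compare-and-swap loop needed).
cur := w.info.Load().(types.SegmentInfo)
cur.MinIndex = e.Index // mutate the local copy, never the shared value
w.info.Store(cur)      // readers atomically see either the old or the new struct

// Readers (e.g. OffsetForFrame) load the whole struct atomically, which also
// removes the 64-bit alignment concern:
info := w.info.Load().(types.SegmentInfo)
_ = info.MinIndex

The single-writer assumption matters: atomic.Value alone would not make concurrent read-modify-write updates safe.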
please check 483142e
I will work on the test next!
Co-authored-by: Paul Banks <banks@banksco.de>
// SegmentInfo returns the writer's atomic info field
func (w *Writer) SegmentInfo() types.SegmentInfo {
	return w.info.Load().(types.SegmentInfo)
}
I did not check for the assertion error here, as the only value ever stored in the field is a types.SegmentInfo. Please let me know if you feel I should add that error check.
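For reference, a comma-ok assertion would avoid a panic if the value were ever unset or of another type. This is just a sketch of that alternative, not code from the PR:

// SegmentInfo returns the writer's info and reports whether it was present.
func (w *Writer) SegmentInfo() (types.SegmentInfo, bool) {
	info, ok := w.info.Load().(types.SegmentInfo)
	return info, ok // ok is false if nothing was stored yet or the type differs
}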
_, err = w.GetLog(1)
require.ErrorIs(t, err, types.ErrNotFound)

// Append to writer
Suggested change:
- // Append to writer
+ // Append to writer. Note the first Index written is greater than the BaseIndex of 1, simulating a node that starts appending after a snapshot.
err = w.Append([]types.LogEntry{{Index: 5, Data: []byte("one")}})
require.NoError(t, err)

_, err = w.GetLog(0)
Suggested change:
- _, err = w.GetLog(0)
+ _, err = w.GetLog(1)
Zero is never a valid index for raft logs. We would have expected append to end up at 1 if it was for some reason not correctly handling the write of a later index.
This is looking good. I think we could merge it as is, though in reviewing it one last time I think there might be a few cases we are still missing here.
What happens when a tail that has started from non-zero in this way eventually gets sealed? This change should ensure that we end up writing out an index frame that includes N zeros at the start, which should work, but only if the reader correctly checks MinIndex before looking up in that index. (We need to write a test for that.)
Edit: a variant of this that definitely doesn't work: if the writer recovers its state from disk after a shutdown or crash, it currently won't know whether to start building the index from offset 0 or from offset MinIndex - BaseIndex, since the MinIndex from the meta store could indicate either a logical truncation of data that was appended, or that a prefix was skipped and so will not be read back. In theory we could recover that information by looking at the indexes of the entries, but that breaks the current abstractions since the segment layer doesn't currently know how the records are encoded. This needs more thought.
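As a rough illustration of the reader-side check mentioned above (the function name offsetForIndex and the exact fields used are assumptions for the sketch, not this PR's code):

// Entries below MinIndex were skipped (or logically truncated), so their
// slots in the offset index are zero; treat them as not found instead of
// dereferencing a zero offset.
func offsetForIndex(info types.SegmentInfo, offsets []uint32, idx uint64) (uint32, error) {
	if idx < info.MinIndex || idx-info.BaseIndex >= uint64(len(offsets)) {
		return 0, types.ErrNotFound
	}
	return offsets[idx-info.BaseIndex], nil
}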
The other issue is one I feel like we thought about, but I'm now forgetting why it's OK: in a real cluster, the current index at the point a snapshot is restored might be arbitrarily large. If the first Index written to a newly truncated log is, say, 1,000,000, this fix will result in adding 4MiB of zeros to the in-memory index and then serializing them into the final file.
For the raft indexes we typically see, that might not be too bad, but it's certainly wasteful, and at least in theory, if you do 500 writes/second for a year and then restore a snapshot on the cluster, you would end up with an index of ~15 billion, which would mean 58GiB of zeros in the index, in RAM and on disk.
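To sanity-check those figures (this assumes 4-byte offset entries in the index, which is an assumption about the encoding rather than something stated here):

package main

import "fmt"

func main() {
	const entrySize = 4 // assumed bytes per offset entry in the index

	// First index written after a snapshot restore is 1,000,000:
	fmt.Printf("%.1f MiB of zero entries\n", float64(1_000_000*entrySize)/(1<<20)) // ~3.8 MiB

	// ~500 writes/second for a year before the snapshot restore:
	idx := uint64(500) * 60 * 60 * 24 * 365 // ~15.8 billion
	fmt.Printf("%.1f GiB of zero entries\n", float64(idx*entrySize)/(1<<30)) // ~58.7 GiB
}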
This is all making me wonder if we need an alternative approach to this case where we actually roll to a new segment with the correct index instead.
I'll try that out today.
Given that #24 merged, can this PR be closed?
I think the answer to my question is yes, so I'm going to close it, but please reopen if I'm mistaken.
This PR fixes a panic in finding a frame offset where the offset list does not contain the entry index.
In our hackweek project, we attempted to integrate raft-wal with Vault with integrated storage. As part of the integration, we found that snapshot restore is broken: raft with BoltDB as its stable store allows for gaps in the LogStore, whereas raft-wal is a monotonically increasing, gapless store, so the integration test triggered the panic described above.
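A minimal, self-contained illustration of the failure mode (the variable names are made up for the example, not taken from raft-wal):

package main

func main() {
	baseIndex := uint64(1)  // segment's BaseIndex after a snapshot restore
	entryIndex := uint64(5) // first index actually appended afterwards

	offsets := []uint32{}             // nothing recorded for the skipped prefix
	_ = offsets[entryIndex-baseIndex] // panics: index out of range [4] with length 0
}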