Initial attempt to integrate with Vault with integrated storage #23
Conversation
Nice! This actually looks super close.
I know it's WIP and you have other things today, but I thought I'd get some early comments in, in case they're useful today. Most are just about updating comments!
The biggest actual change is that I don't think the test case is testing quite the right thing (i.e. I'm pretty sure the one here would pass even if you reverted all the other changes - might be worth trying to be sure the test is exercising the problem properly!)
It's only a matter of changing a few indexes to get it right, though!
The atomic part I think is the last thing we need to get right (and test) before this is good to go! Thanks for all the work on this!
segment/writer.go (outdated)

	offsets = append(offsets, 0)
}

atomic.StoreUint64(&w.info.MinIndex, e.Index)
I see this is an atomic store now, but I don't see atomic Loads in the places that read this (e.g. OffsetForFrame above on line 356).
I also didn't see an update to the concurrent test in writer_test.go that would help confirm the racey behaviour is fixed.
Finally, because we are just updating this in place in a struct that is used for other purposes, there is a chance that in the future the struct could be modified so that this field is no longer 64-bit aligned (it happens to be aligned right now, but there's nothing to stop the struct being changed later since the requirement is undocumented). I think we should do one of the following:
- Store a separate field in the writer state for this and use it here and in OffsetForFrame above. We'd probably want to document why it might differ from info.MinIndex. The downside is that it's not obvious that info.MinIndex might no longer be correct.
- Update the docs on the SegmentInfo struct fields to point out that we use them internally and rely on 64-bit alignment, so that must not change. This is kinda gross as they are in a separate package, but at least it reduces the risk of accidentally introducing a breaking change later.
- Make the whole info field an atomic.Value and copy-and-swap it out here. This is probably the cleanest overall? (See the sketch below.)
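For illustration, here is a rough sketch of what the third option could look like. The names (w.info, e.Index, types.SegmentInfo) follow the discussion above, but the exact code is an assumption, not something taken from this PR:

// w.info is an atomic.Value holding a types.SegmentInfo. Only the single
// writer goroutine mutates it, so a plain copy-modify-store is enough here
// (no compare-and-swap loop needed).
cur := w.info.Load().(types.SegmentInfo)
cur.MinIndex = e.Index // mutate the local copy, never the shared value
w.info.Store(cur)      // readers atomically see either the old or the new struct

// Readers (e.g. OffsetForFrame) load the whole struct atomically, which also
// removes the 64-bit alignment concern:
info := w.info.Load().(types.SegmentInfo)
_ = info.MinIndex

The single-writer assumption matters: atomic.Value alone would not make concurrent read-modify-write updates safe.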
please check 483142e
I will work on the test next!
Co-authored-by: Paul Banks <banks@banksco.de>
// SegmentInfo returns the writer's atomic info field
func (w *Writer) SegmentInfo() types.SegmentInfo {
	return w.info.Load().(types.SegmentInfo)
}
I did not check for the assertion error here, as the only value ever stored in the field is a types.SegmentInfo. Please let me know if you feel I should add that error check.
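For reference, a comma-ok assertion would avoid a panic if the value were ever unset or of another type. This is just a sketch of that alternative, not code from the PR:

// SegmentInfo returns the writer's info and reports whether it was present.
func (w *Writer) SegmentInfo() (types.SegmentInfo, bool) {
	info, ok := w.info.Load().(types.SegmentInfo)
	return info, ok // ok is false if nothing was stored yet or the type differs
}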
_, err = w.GetLog(1)
require.ErrorIs(t, err, types.ErrNotFound)

// Append to writer
Suggested change:
- // Append to writer
+ // Append to writer. Note the first Index written is greater than the BaseIndex of 1, simulating a node that starts appending after a snapshot.
err = w.Append([]types.LogEntry{{Index: 5, Data: []byte("one")}})
require.NoError(t, err)

_, err = w.GetLog(0)
Suggested change:
- _, err = w.GetLog(0)
+ _, err = w.GetLog(1)
Zero is never a valid index for raft logs. We would have expected append to end up at 1 if it was for some reason not correctly handling the write of a later index.
This is looking good. I think we could merge it as is, though in reviewing it one last time I think there might be a few cases we are still missing here.
What happens when a tail that has started from non-zero in this way eventually gets sealed? This change should ensure that we end up writing out an index frame that includes N zeros at the start, which should work, but only if the reader correctly checks MinIndex before looking up in that index. (We need to write a test for that.)
Edit: a variant of this that definitely doesn't work: if the writer recovers its state from disk after a shutdown or crash, it currently won't know whether to start building the index from offset 0 or from offset MinIndex - BaseIndex, since the MinIndex from the meta store could indicate either a logical truncation of data that was appended, or that a prefix was skipped and so will not be read back. In theory we could recover that information by looking at the indexes of the entries, but that breaks the current abstractions since the segment layer doesn't currently know how the records are encoded. This needs more thought.
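As a rough illustration of the reader-side check mentioned above (the function name offsetForIndex and the exact fields used are assumptions for the sketch, not this PR's code):

// Entries below MinIndex were skipped (or logically truncated), so their
// slots in the offset index are zero; treat them as not found instead of
// dereferencing a zero offset.
func offsetForIndex(info types.SegmentInfo, offsets []uint32, idx uint64) (uint32, error) {
	if idx < info.MinIndex || idx-info.BaseIndex >= uint64(len(offsets)) {
		return 0, types.ErrNotFound
	}
	return offsets[idx-info.BaseIndex], nil
}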
The other issue is one I feel like we thought about, but I'm now forgetting why it's OK: in a real cluster, the current index at the point a snapshot is restored might be arbitrarily large. If the first Index written to a newly truncated log is, say, 1,000,000, this fix will result in adding 4MiB of zeros to the in-memory index and then serializing them into the final file.
For the raft indexes we typically see, that might not be too bad, but it's certainly wasteful, and at least in theory, if you do 500 writes/second for a year and then restore a snapshot on the cluster, you would end up with an index of ~15 billion, which would mean 58GiB of zeros in the index, in RAM and on disk.
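To sanity-check those figures (this assumes 4-byte offset entries in the index, which is an assumption about the encoding rather than something stated here):

package main

import "fmt"

func main() {
	const entrySize = 4 // assumed bytes per offset entry in the index

	// First index written after a snapshot restore is 1,000,000:
	fmt.Printf("%.1f MiB of zero entries\n", float64(1_000_000*entrySize)/(1<<20)) // ~3.8 MiB

	// ~500 writes/second for a year before the snapshot restore:
	idx := uint64(500) * 60 * 60 * 24 * 365 // ~15.8 billion
	fmt.Printf("%.1f GiB of zero entries\n", float64(idx*entrySize)/(1<<30)) // ~58.7 GiB
}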
This is all making me wonder if we need an alternative approach to this case where we actually roll to a new segment with the correct index instead.
I'll try that out today.
Given that #24 merged, can this PR be closed?
I think the answer to my question is yes, so I'm going to close it, but please reopen if I'm mistaken.
This PR fixes a panic in finding a frame offset where the offset list does not contain the entry index.
In our hackweek project, we attempted to integrate raft-wal with Vault with integrated storage. As part of the integration, we found that snapshot restore is broken: raft with BoltDB as its stable store allows for gaps in the LogStore, whereas raft-wal is a monotonically increasing, gapless store, so the integration test triggered the panic described above.
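A minimal, self-contained illustration of the failure mode (the variable names are made up for the example, not taken from raft-wal):

package main

func main() {
	baseIndex := uint64(1)  // segment's BaseIndex after a snapshot restore
	entryIndex := uint64(5) // first index actually appended afterwards

	offsets := []uint32{}             // nothing recorded for the skipped prefix
	_ = offsets[entryIndex-baseIndex] // panics: index out of range [4] with length 0
}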