compressed vector serialization fixes #16648
Conversation
An assert was violated by the new compressed vector serialization (WIP: investigating why this was not detected during testing).

fix:
- use different keys (200, 201, 202, 203) for the properties of compressed vectors
- remove the (optional) vector_type serialization for flat vectors, because it would otherwise repeat for constant and dictionary vectors
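A minimal sketch of the key scheme described above, assuming DuckDB's tagged `WriteProperty(field_id, tag, value)` serializer API (the deserializer counterpart is visible in the diff below); the helper name, the exact placement of the keys and the payload layout are illustrative, not the code in this PR:

```cpp
// Hypothetical sketch (imagined inside src/common/types/vector.cpp, where
// Serializer, Vector and VectorType are already in scope). Idea of the fix:
// give the compressed-vector properties their own key range (200..203) so they
// cannot collide with the keys of the flat-vector layout, and only write
// vector_type on the compressed path, so it is never repeated for the
// constant/dictionary case.
static void SerializeVectorSketch(Serializer &serializer, Vector &vector,
                                  bool compressed_serialization) {
	auto vtype = vector.GetVectorType();
	if (compressed_serialization && vtype != VectorType::FLAT_VECTOR) {
		// compressed path: reserved keys 200..203
		serializer.WriteProperty(200, "vector_type", vtype);
		// ...keys 201/202/203 would carry the encoding-specific payload
		// (e.g. the constant value, or the dictionary plus selection data)
		return;
	}
	// flat path: vector_type is not written at all; the deserializer falls back
	// to VectorType::FLAT_VECTOR via ReadPropertyWithExplicitDefault, and the
	// existing flat-vector properties keep their original (lower) keys
}
```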
Thanks for the PR! I think this is still not quite correct though:
src/common/types/vector.cpp (outdated)

```diff
@@ -1352,7 +1351,7 @@ void Vector::Serialize(Serializer &serializer, idx_t count, bool compressed_seri
 void Vector::Deserialize(Deserializer &deserializer, idx_t count) {
 	auto &logical_type = GetType();
 	const auto vtype = // older versions that only supported flat vectors did not serialize vector_type,
-	    deserializer.ReadPropertyWithExplicitDefault<VectorType>(99, "vector_type", VectorType::FLAT_VECTOR);
+	    deserializer.ReadPropertyWithExplicitDefault<VectorType>(200, "vector_type", VectorType::FLAT_VECTOR);
```
For flat vectors we are now reading `200` followed by reading `100` - this is incorrect (these should always be incremental).
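A schematic contrast of the two read orders (a sketch, not the final code; ids 100 and 200 come from the diff above, 90 is the value adopted later in the thread):

```cpp
// Before the renumbering, a flat vector is deserialized as:
//   key 200 ("vector_type", defaulted to FLAT_VECTOR)  ->  key 100 (first flat property)
// i.e. the field ids go backwards instead of being incremental.
//
// Keeping the optional key below the flat-vector ids fixes the order:
const auto vtype = // read first, with a lower id than every flat-vector property
    deserializer.ReadPropertyWithExplicitDefault<VectorType>(90, "vector_type",
                                                             VectorType::FLAT_VECTOR);
// ...subsequent flat-vector reads (key 100 and up) now keep the sequence
// increasing: 90 -> 100 -> ...
```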
Strange: I had not run into any assert on that, but I can bring the numbering down to e.g. 90.
OK, renumbered from 90 on. It uses `shouldSerialize(5)`, which would be the serialization compatibility version v1.3.0 -- which does not exist yet (therefore my bug did not do damage, phew).
Because I did not consider myself to be in a position to add that serialization mode at the time, I did not do so. If you want me to, I could also make that change.
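For context, a rough sketch of how such a gate might look on the write side. It assumes a `Serializer::ShouldSerialize(version)` style check; the name, signature and the value 5 standing for v1.3.0 are taken from the comment above and not verified against the source:

```cpp
// Hypothetical sketch: only emit the compressed layout when the target
// serialization compatibility level is at least the one that introduces it
// (5, i.e. v1.3.0 per the comment above); older targets get a flattened
// vector written in the flat layout that any reader understands.
void SerializeWithCompatibilityGate(Serializer &serializer, Vector &vector, idx_t count) {
	const bool use_compressed =
	    serializer.ShouldSerialize(5) && vector.GetVectorType() != VectorType::FLAT_VECTOR;
	if (!use_compressed) {
		// fall back to the old behaviour: materialize as a flat vector first
		vector.Flatten(count);
	}
	// ...then write either the compressed or the flat layout as sketched earlier
}
```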
I am also adding compatibility version v1.3.0 here, along with some changes to the benchmark runner and the unit tester to support specifying the storage version for the db - as we default to storage compatibility version 0.10.2, I believe.
Perhaps we should move those changes to a separate PR, as the DICT_FSST stuff is not quite ready to be merged yet.
@peterboncz / @Tishj: #16800 is now on main and added the 1.3.0 version, so that could now be used.
…wer than the other keys
and one indentation level less for the code (which is a win ;-)
Three changes:
* cherry-picking #16799, already PR'd against `v1.2-histrionicus`
* add storage_version 66, alias `v1.3.0`
* change the results of `SELECT tags['storage_version'] FROM duckdb_databases()` so that only the lower bound is returned

This should make the result stable across duckdb versions, and as a side effect tests will also be valid across duckdb versions. I am unsure whether adding a table function with the whole mapping might be of use; that would allow reconstructing the full bounds via SQL.

This PR does not introduce any change that is actually backward incompatible and requires a version bump, but there are in-flight PRs such as #15637 or #16648 that can then make use of this.
Thanks!
compressed vector serialization fixes (duckdb/duckdb#16648)
The binary serializer asserts that tags and keys are unique, and this assert was violated by the new compressed vector serialization.
Many apologies for assuming that this had been tested by `DataChunk::Verify()` -- it was not, because test runs still use v0.10.2 as the standard serialization compatibility.
This came out in integration testing in MotherDuck, where we have now advanced to v1.2.0 (note that this would still not trigger the new code, as that would only happen at v1.3.0 -- but in the MotherDuck cherry-pick of this PR we have switched it off by default and can enable it with an extension SET variable, for testing purposes).
fix:
In addition, I enabled the use of compressed vectors in the composed types array, list and struct.
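As a closing note, a simplified illustration of the uniqueness check mentioned above (not DuckDB's actual implementation; the names are invented): within one serialized object, every field id and tag may be written only once, which is exactly what the reused keys tripped.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Toy stand-in for the debug bookkeeping a binary serializer can do: remember
// the keys and tags already written for the current object and assert that a
// property is never written twice under the same id or tag.
struct FieldUniquenessChecker {
	std::set<uint32_t> seen_ids;
	std::set<std::string> seen_tags;

	void OnWriteProperty(uint32_t field_id, const std::string &tag) {
		assert(seen_ids.insert(field_id).second && "duplicate field id within one object");
		assert(seen_tags.insert(tag).second && "duplicate tag within one object");
	}
};
```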