-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Fix extension size increase #14185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix extension size increase #14185
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a note on improving the test (that currently is 1-1, while in this PR should still be significant, see https://github.com/duckdb/duckdb/actions/runs/11131102326/job/30931920185?pr=14185), and another one on the fact that GetAPIV0 is basically constexpr, so I would assume basically has no cost (and we expect you to use it)
Thanks!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's a "request changes", due to the test that is not super informative otherwise
Thanks for the PR! Great find. If I understand correctly the problem is that Given that this is the problem - I wonder if we can simplify this solution. Do we need to hand out pointers to the struct that resides in the |
@Mytherin do you mean like this? (7edb53d) I think the convention should be that extensions copy the struct over into some global variable anyway, at least thats what the macro's in the duckdb_extensions.h do anyway. However I could see how leaving the pointers valid could results in people incorrectly assuming they are always valid and lifetime issues arising. To resolve that I have made an extra copy that ensures the pointer to the function pointer struct that the extension sees during initialization will only be valid during the initialization forcing the extension to copy it. |
@Mytherin: I think this is ready Results (as per the added check) are:
|
1 sec, I will try to refactor it so that we don't need to store the struct in the databaseinstance. I think either through using a virtual method or storing the function pointer to the |
@Mytherin Tried making a virtual Went with the function pointer approach, which seems clean enough to me |
That's cleaner - but I wonder if this does not also work? In that case we would not need to store anything in the database instance. const duckdb_ext_api_v0 DatabaseInstance::GetExtensionAPIV0() {
return CreateAPIv0();
}
|
No that will get the size right back up where it was |
Alright, let's get this in as-is then. Thanks! |
Let's do a final double check for the binary sizes before merge. I checked locally, but better safe than sorry |
sizes seem all good:
|
Fix extension size increase (duckdb/duckdb#14185)
Fix extension size increase (duckdb/duckdb#14185) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
In v1.1.0, the addition of the C Extension API has caused the extension binaries to increase quite significantly:
The reason is that due to the way we build extensions, we statically link duckdb in its entirety into the extension binary, then rely on the linker being able to prune the unused parts. What this means is that its quite easy unknowingly link in significant parts of DuckDB causing the extension binary to significantly grow.
This is precisely what happened when someone (possibly me) wrote some code that produces a struct containing all C API function pointers. Since the C API touches a lot of things throughout DuckDB this prevented the linker from being able to prune much.
The solution is to generate the struct of function pointers centrally in the DatabaseInstance, then hand out pointers to those.
@Mytherin I think we have 2 options: either put this behind a lock and lazily initialize, or initialize this in
DatabaseInstance::Initialize
then pass const pointers, I went with the latter.Also I added a regression test for this that tests the size of
[json,parquet,tpch,tpcds]
. (theshold for check failure is currently at 20%).