Adds Julia support for scalar UDFs #14024
Merged
+283
−1
Mytherin
reviewed
Sep 19, 2024
Thanks! LGTM - one comment
github-actions bot
pushed a commit
to duckdb/duckdb-r
that referenced
this pull request
Sep 27, 2024
Fix duckdb/duckdb#13993 - avoid disabling optimizers for SET VARIABLE (duckdb/duckdb#14028) Proper NULL handling in special json extraction functions (duckdb/duckdb#14032) Adds Julia support for scalar UDFs (duckdb/duckdb#14024)
github-actions bot
added a commit
to duckdb/duckdb-r
that referenced
this pull request
Sep 27, 2024
Fix duckdb/duckdb#13993 - avoid disabling optimizers for SET VARIABLE (duckdb/duckdb#14028) Proper NULL handling in special json extraction functions (duckdb/duckdb#14032) Adds Julia support for scalar UDFs (duckdb/duckdb#14024) Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Mytherin
added a commit
that referenced
this pull request
Jan 3, 2025
This PR improves the Julia support for user-defined functions (UDFs). It adds the `ScalarFunction` type and the `@create_scalar_function` macro, which automatically generates a DuckDB-compatible wrapper. See also discussion #13176 for details.

Numeric types, dates, strings, missing values and exceptions are supported. Composite types are not yet implemented.

## Example Usage

```julia
using DuckDB, DataFrames

f_add = (a, b) -> a + b

db = DuckDB.DB()
con = DuckDB.connect(db)

# Create the scalar function
# the second argument can be omitted if the function name is identical to a global symbol
fun = DuckDB.@create_scalar_function f_add(a::Int, b::Int)::Int f_add
DuckDB.register_scalar_function(con, fun)

df = DataFrame(a = [1, 2, 3], b = [1, 2, 3])
DuckDB.register_table(con, df, "test1")

result = DuckDB.execute(con, "SELECT f_add(a, b) as result FROM test1") |> DataFrame
```

## Performance

The performance of the auto-generated wrapper is comparable to pure DuckDB/Julia. I measured the elapsed time (in seconds) of adding 10 million numbers in a coarse benchmark:

|               | Int         | Float       |
| ------------- | ----------- | ----------- |
| DataFrames.jl | 0.092947083 | 0.090409625 |
| DuckDB        | 0.065306042 | 0.054156167 |
| UDF           | 0.078665125 | 0.080781    |

## Internals

The scalar functions are tracked in a dictionary in the `DuckDBHandle` struct. Currently, only registering scalar functions is supported.

The wrapper is generated by the macro, via the function `_udf_generate_wrapper()`, and should be fully type stable.

Because of limitations of the `@cfunction` macro in Julia, the wrapper needs to be globally accessible. I implemented this by introducing a global (constant) dictionary variable in DuckDB, `_UDF_WRAPPER_CACHE`, which is used to store the generated wrappers. This is defined in `_udf_register_wrapper()`. This is, in my opinion, not ideal, but it works. I asked in Julia-related forums for a better solution and will update the code if something better is possible.
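To illustrate the `@cfunction` constraint mentioned under Internals, here is a minimal, standalone sketch of the general pattern: a top-level function that C can call looks up the user's function in a global constant dictionary. It does not use DuckDB at all, and the names `WRAPPER_CACHE`, `register_wrapper!` and `call_add` are hypothetical stand-ins for illustration, not the actual `_UDF_WRAPPER_CACHE` / `_udf_register_wrapper()` code from this PR.

```julia
# Hypothetical sketch of the "global wrapper cache" pattern.
# These names are assumptions for illustration, NOT the DuckDB.jl internals.

# Global constant dictionary holding user functions by name.
const WRAPPER_CACHE = Dict{Symbol, Function}()

# Store a function in the cache so that top-level code can reach it.
function register_wrapper!(name::Symbol, f::Function)
    WRAPPER_CACHE[name] = f
    return name
end

# @cfunction needs a globally accessible function, so the C-callable entry
# point is defined at top level and fetches the user function from the cache.
function call_add(a::Cint, b::Cint)::Cint
    return WRAPPER_CACHE[:add](a, b)
end

# Register an anonymous function, similar in spirit to `f_add` in the example.
register_wrapper!(:add, (a, b) -> a + b)

# Obtain a C function pointer and call it through ccall, as a C library would.
ptr = @cfunction(call_add, Cint, (Cint, Cint))
result = ccall(ptr, Cint, (Cint, Cint), 2, 3)
```

Note that a lookup in a `Dict{Symbol, Function}` is dynamically dispatched, so this sketch is not type stable; it only shows why a global is a convenient place to keep the wrappers, not how the generated wrapper achieves type stability.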
This is an update to PR #14024
This PR adds support and tests for defining scalar UDFs in Julia and using them in DuckDB. Thank you for an incredible tool.
Below is an example.