Description
What happens?
When converting a DuckDB result set to a Julia DataFrame with the Julia library, there is a bounds error that appears to come from trying to read the validity vector beyond VECTOR_SIZE bits. It happens when the summed length of the lists in a list/vector column exceeds VECTOR_SIZE = 2048. Here's the exception you get:
ERROR: LoadError: BoundsError: attempt to access 32-element Vector{UInt64} at index [33]
Stacktrace:
[1] getindex
@ ./essentials.jl:13 [inlined]
[2] getindex
@ ./abstractarray.jl:1297 [inlined]
[3] isvalid
@ ~/sources/duckdb/tools/juliapkg/src/validity_mask.jl:33 [inlined]
[4] convert_vector(column_data::DuckDB.ColumnConversionData, vector::DuckDB.Vec, size::UInt64, convert_func::typeof(DuckDB.nop_convert), result::Vector{Union{Missing, Int32}}, position::Int64, all_valid::Bool, #unused#::Type{Int32}, #unused#::Type{Int32})
@ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:156
[5] convert_vector_list(column_data::DuckDB.ColumnConversionData, vector::DuckDB.Vec, size::UInt64, convert_func::Function, result::Vector{Union{Missing, Vector{Union{Missing, Int32}}}}, position::Int64, all_valid::Bool, #unused#::Type{DuckDB.duckdb_list_entry_t}, #unused#::Type{Vector{Union{Missing, Int32}}})
@ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:209
[6] convert_column_loop(column_data::DuckDB.ColumnConversionData, convert_func::Function, #unused#::Type{DuckDB.duckdb_list_entry_t}, #unused#::Type{Vector{Union{Missing, Int32}}}, convert_vector_func::typeof(DuckDB.convert_vector_list))
@ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:402
[7] convert_column(column_data::DuckDB.ColumnConversionData)
@ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:553
[8] toDataFrame(q::DuckDB.QueryResult)
@ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:579
[9] show
@ ~/sources/duckdb/tools/juliapkg/src/result.jl:845 [inlined]
[10] print(io::Base.TTY, x::DuckDB.QueryResult)
@ Base ./strings/io.jl:35
[11] print(::Base.TTY, ::DuckDB.QueryResult, ::String)
@ Base ./strings/io.jl:46
[12] println(io::Base.TTY, xs::DuckDB.QueryResult)
@ Base ./strings/io.jl:75
[13] println(xs::DuckDB.QueryResult)
@ Base ./coreio.jl:4
...
I took a quick look but couldn't really figure it out. I tried modifying get_validity
to read the validity vector beyond 2048 entries, but that caused a segmentation fault. Is the rest of the list in another chunk somewhere? I see one "list" and one "child list". Are there more children hiding somewhere with the rest of the entries?
I guess this might be a rather quick fix for someone else, but since I've dug into it a little, I would also be happy to receive some direction and put together a PR myself if that would help.
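For what it's worth, the numbers in the BoundsError line up with indexing a validity mask stored as 64-bit words. A minimal sketch of that arithmetic (in Python purely for illustration; `word_index` is a hypothetical helper, not DuckDB's actual code):

```python
# Sketch: how a 64-bit-word validity mask is indexed, and why a child
# list longer than VECTOR_SIZE overflows a mask sized for one vector.

VECTOR_SIZE = 2048
WORD_BITS = 64

# A mask sized for one standard vector holds 2048 / 64 = 32 words.
mask_words = VECTOR_SIZE // WORD_BITS  # 32

def word_index(row):
    """0-based index of the mask word holding `row`'s validity bit."""
    return row // WORD_BITS

# The last row of a standard vector stays in bounds:
assert word_index(VECTOR_SIZE - 1) == mask_words - 1  # word 31 of 0..31

# But row 2048 (the 2049th child entry) needs word 32 -- i.e. the 33rd
# word in Julia's 1-based indexing, matching the BoundsError
# "attempt to access 32-element Vector{UInt64} at index [33]".
assert word_index(VECTOR_SIZE) == mask_words
print("overflowing word (1-based):", word_index(VECTOR_SIZE) + 1)
```

So reading validity for child entry 2048 asks for one word past the end of a 32-word mask, which is consistent with the child list's entries spilling past a single vector's worth of validity bits.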
To Reproduce
Main reproducing script:
using DuckDB
using DBInterface: connect, execute
function main()
con = connect(DuckDB.DB, ":memory:")
execute(con, "CREATE TABLE list_table (int_list INT[]);")
execute(con, "INSERT INTO list_table VALUES (range(2049));")
df = execute(con, "SELECT * FROM list_table;")
println(df)
end
main()
Showing that it is the summed length of all lists that matters:
using DuckDB
using DBInterface: connect, execute
function main()
# This causes an error!
con = connect(DuckDB.DB, ":memory:")
df = execute(con, "SELECT * FROM range(2049)")
println(df)
# This is fine
execute(con, "CREATE TABLE list_table (int_list INT[]);")
execute(con, "INSERT INTO list_table VALUES (range(1024));")
execute(con, "INSERT INTO list_table VALUES (range(1025));")
df = execute(con, "SELECT * FROM list_table LIMIT 1;")
println(df)
df = execute(con, "SELECT * FROM list_table LIMIT 1 OFFSET 1;")
println(df)
# But this is not -- another error!
df = execute(con, "SELECT * FROM list_table;")
println(df)
end
main()
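The arithmetic behind the three cases above, as a quick check (Python just for illustration):

```python
VECTOR_SIZE = 2048
lists = [1024, 1025]  # lengths of the two inserted lists

# Fetched one at a time, each list's entries fit within one vector:
assert all(n <= VECTOR_SIZE for n in lists)

# Fetched together, the child entries of a single result chunk
# sum to 2049, one past VECTOR_SIZE -- and the error appears.
assert sum(lists) == 2049
assert sum(lists) > VECTOR_SIZE
```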
OS:
Linux x64
DuckDB Version:
v0.7.0 and master
DuckDB Client:
Julia
Full Name:
Frankie Robertson
Affiliation:
University of Jyväskylä
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree