Converting result set with column with cumulative array size of >2048 (= VECTOR_SIZE) to Julia DataFrame causes bounds error  #6924

@frankier

What happens?

When converting a DuckDB result set to a Julia DataFrame with the Julia client, a BoundsError is thrown, apparently because the conversion tries to read the validity vector beyond VECTOR_SIZE bits. This happens when the summed length of the lists in a list column is greater than VECTOR_SIZE = 2048. The exception looks like this:

ERROR: LoadError: BoundsError: attempt to access 32-element Vector{UInt64} at index [33]
Stacktrace:
  [1] getindex
    @ ./essentials.jl:13 [inlined]
  [2] getindex
    @ ./abstractarray.jl:1297 [inlined]
  [3] isvalid
    @ ~/sources/duckdb/tools/juliapkg/src/validity_mask.jl:33 [inlined]
  [4] convert_vector(column_data::DuckDB.ColumnConversionData, vector::DuckDB.Vec, size::UInt64, convert_func::typeof(DuckDB.nop_convert), result::Vector{Union{Missing, Int32}}, position::Int64, all_valid::Bool, #unused#::Type{Int32}, #unused#::Type{Int32})
    @ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:156
  [5] convert_vector_list(column_data::DuckDB.ColumnConversionData, vector::DuckDB.Vec, size::UInt64, convert_func::Function, result::Vector{Union{Missing, Vector{Union{Missing, Int32}}}}, position::Int64, all_valid::Bool, #unused#::Type{DuckDB.duckdb_list_entry_t}, #unused#::Type{Vector{Union{Missing, Int32}}})
    @ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:209
  [6] convert_column_loop(column_data::DuckDB.ColumnConversionData, convert_func::Function, #unused#::Type{DuckDB.duckdb_list_entry_t}, #unused#::Type{Vector{Union{Missing, Int32}}}, convert_vector_func::typeof(DuckDB.convert_vector_list))
    @ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:402
  [7] convert_column(column_data::DuckDB.ColumnConversionData)
    @ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:553
  [8] toDataFrame(q::DuckDB.QueryResult)
    @ DuckDB ~/sources/duckdb/tools/juliapkg/src/result.jl:579
  [9] show
    @ ~/sources/duckdb/tools/juliapkg/src/result.jl:845 [inlined]
 [10] print(io::Base.TTY, x::DuckDB.QueryResult)
    @ Base ./strings/io.jl:35
 [11] print(::Base.TTY, ::DuckDB.QueryResult, ::String)
    @ Base ./strings/io.jl:46
 [12] println(io::Base.TTY, xs::DuckDB.QueryResult)
    @ Base ./strings/io.jl:75
 [13] println(xs::DuckDB.QueryResult)
    @ Base ./coreio.jl:4
...

I took a quick look but couldn't really figure it out. I tried seeing if I could modify get_validity to read the validity vector beyond 2048, but this caused a segmentation fault. Is the rest of the list in another chunk somewhere? I see one "list" and one "child list". Are there more children hiding somewhere with the rest of the entries?
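
For what it's worth, the numbers in the exception are consistent with that reading. Below is a minimal sketch, assuming the validity mask packs one bit per row into UInt64 words. The names `entry_index` and `is_row_valid` are hypothetical and written only for this illustration; they are not the package's functions.

```julia
# Hypothetical illustration only; this is not the DuckDB.jl implementation.
const VECTOR_SIZE = 2048      # rows per vector, as stated in the issue
const BITS_PER_WORD = 64      # one UInt64 word holds 64 validity bits

# Map a 1-based row number to (1-based word index, 0-based bit offset).
function entry_index(row::Integer)
    word, bit = divrem(row - 1, BITS_PER_WORD)
    return (word + 1, bit)
end

# Look up the validity bit for a row; indexing past the mask throws BoundsError.
function is_row_valid(mask::Vector{UInt64}, row::Integer)
    word, bit = entry_index(row)
    return ((mask[word] >> bit) & UInt64(1)) == UInt64(1)
end

# A mask sized for exactly one vector: 2048 / 64 = 32 words, all bits set.
mask = fill(typemax(UInt64), VECTOR_SIZE ÷ BITS_PER_WORD)

println(length(mask))         # number of words in the mask
println(entry_index(2048))    # last row that fits in the mask
println(entry_index(2049))    # first row that does not fit
```

Under these assumptions, row 2049 maps to word 33 of a 32-word mask, which would match the "32-element Vector{UInt64} at index [33]" in the exception. That suggests the child vector of the list is being indexed with a mask sized for a single vector, but I have not confirmed this.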

I guess this might be a rather quick fix for someone else, but I've dug into it a little, so I would also be happy to receive some direction and put together a PR myself if that would help.

To Reproduce

Main reproducing script:

using DuckDB
using DBInterface: connect, execute


function main()
    con = connect(DuckDB.DB, ":memory:")
    execute(con, "CREATE TABLE list_table (int_list INT[]);")
    execute(con, "INSERT INTO list_table VALUES (range(2049));")
    df = execute(con, "SELECT * FROM list_table;")
    println(df)
end

main()

Showing that it is the summed length of all lists that matters:

using DuckDB
using DBInterface: connect, execute


function main()
    # This causes an error!
    con = connect(DuckDB.DB, ":memory:")
    df = execute(con, "SELECT * FROM range(2049)")
    println(df)

    # This is fine
    execute(con, "CREATE TABLE list_table (int_list INT[]);")
    execute(con, "INSERT INTO list_table VALUES (range(1024));")
    execute(con, "INSERT INTO list_table VALUES (range(1025));")
    df = execute(con, "SELECT * FROM list_table LIMIT 1;")
    println(df)
    df = execute(con, "SELECT * FROM list_table LIMIT 1 OFFSET 1;")
    println(df)

    # But this is not -- another error!
    df = execute(con, "SELECT * FROM list_table;")
    println(df)
end

main()

OS:

Linux x64

DuckDB Version:

v0.7.0 and master

DuckDB Client:

Julia

Full Name:

Frankie Robertson

Affiliation:

University of Jyväskylä

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
