Inserting many identical UUIDs triggers underflow in HUGEINT addition #9193

@smonkewitz

Description

What happens?

Inserting many identical UUIDs into an on-disk table triggers an underflow in HUGEINT addition.

Notes:

  • The issue reproduces on both main and v0.9.0.
  • It does not occur with in-memory databases.
  • Decreasing the number of inserted rows to 100,000 also avoids it.
  • I cannot reproduce it on v0.8.1.

Based on the above, and eyeballing the stack trace (see below), my guess is that the compression analysis on the UUID column, whose PhysicalType is INT128, is what triggers the underflow.
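To make the guess concrete, here is a minimal Python sketch of how such an underflow could arise. It assumes (a) DuckDB maps a UUID to a signed 128-bit hugeint by flipping the top bit, so the all-zero UUID becomes the most negative hugeint, and (b) the delta-stats code performs a checked subtraction that throws when the result leaves the int128 range. The helper names (`uuid_to_hugeint`, `checked_sub`) are illustrative, not DuckDB's actual API, and the exact subtraction site inside `CalculateDeltaStats` is a guess.

```python
INT128_MIN = -(2 ** 127)
INT128_MAX = 2 ** 127 - 1

def uuid_to_hugeint(uuid_str: str) -> int:
    """Assumed mapping: XOR the top bit of the 128-bit UUID value and
    reinterpret as signed, which is equivalent to subtracting 2**127."""
    raw = int(uuid_str.replace("-", ""), 16)
    return raw - 2 ** 127

def checked_sub(a: int, b: int) -> int:
    """Mimic a checked hugeint subtraction that throws on over/underflow,
    as duckdb::Hugeint::Subtract appears to in the backtrace."""
    result = a - b
    if not (INT128_MIN <= result <= INT128_MAX):
        raise OverflowError("Underflow in HUGEINT addition")
    return result

# The all-zero UUID lands exactly on the hugeint minimum:
zero_uuid = uuid_to_hugeint("00000000-0000-0000-0000-000000000000")
print(zero_uuid == INT128_MIN)

# A delta like (value - minimum) is fine here (all values are equal),
# but any subtraction that dips below INT128_MIN throws:
try:
    checked_sub(zero_uuid, 1)
except OverflowError as e:
    print(e)
```

If the mapping assumption holds, every inserted value sits on the int128 boundary, which would explain why this particular UUID is enough to push a delta computation out of range.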

To Reproduce

CLI (main):

➜  duckdb git:(main) build/release/duckdb /tmp/test.duckdb
v0.9.1-dev86 c0dfff9198
Enter ".help" for usage hints.
D create table test(x uuid);
D insert into test select '00000000-0000-0000-0000-000000000000'::uuid from generate_series(1, 1000000);
Error: Out of Range Error: Underflow in HUGEINT addition

Python client (v0.9.0, inside LLDB, with backtrace):

sudo lldb /opt/homebrew/bin/python3
(lldb) target create "/opt/homebrew/bin/python3"
Current executable set to '/opt/homebrew/bin/python3' (arm64).
(lldb) breakpoint set -E c++
Breakpoint 1: no locations (pending).
(lldb) run
Process 5125 launched: '/opt/homebrew/bin/python3' (arm64)
2 locations added to breakpoint 1
Process 5125 stopped
* thread #2, stop reason = exec
    frame #0: 0x0000000100014a40 dyld`_dyld_start
dyld`:
->  0x100014a40 <+0>:  mov    x0, sp
    0x100014a44 <+4>:  and    sp, x0, #0xfffffffffffffff0
    0x100014a48 <+8>:  mov    x29, #0x0
    0x100014a4c <+12>: mov    x30, #0x0
Target 0: (Python) stopped.
(lldb) c
Process 5125 resuming
Python 3.11.4 (main, Jun 20 2023, 17:23:00) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import duckdb
>>> conn = duckdb.connect("/tmp/test.duckdb")
>>> conn.sql("create table test(x uuid)")
>>> conn.sql("insert into test select '00000000-0000-0000-0000-000000000000'::uuid from generate_series(1,1000000)")
Process 5125 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000189a72c54 libc++abi.dylib`__cxa_throw
libc++abi.dylib`:
->  0x189a72c54 <+0>:  pacibsp 
    0x189a72c58 <+4>:  stp    x24, x23, [sp, #-0x40]!
    0x189a72c5c <+8>:  stp    x22, x21, [sp, #0x10]
    0x189a72c60 <+12>: stp    x20, x19, [sp, #0x20]
Target 0: (Python) stopped.
(lldb) bt
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000189a72c54 libc++abi.dylib`__cxa_throw
    frame #1: 0x0000000105d73808 duckdb.cpython-311-darwin.so`duckdb::Hugeint::Subtract(duckdb::hugeint_t, duckdb::hugeint_t) + 212
    frame #2: 0x000000010624a1b4 duckdb.cpython-311-darwin.so`duckdb::BitpackingState<duckdb::hugeint_t, duckdb::hugeint_t>::CalculateDeltaStats() + 204
    frame #3: 0x0000000106249e6c duckdb.cpython-311-darwin.so`bool duckdb::BitpackingState<duckdb::hugeint_t, duckdb::hugeint_t>::Flush<duckdb::EmptyBitpackingWriter>() + 128
    frame #4: 0x0000000106249d40 duckdb.cpython-311-darwin.so`bool duckdb::BitpackingState<duckdb::hugeint_t, duckdb::hugeint_t>::Update<duckdb::EmptyBitpackingWriter>(duckdb::hugeint_t, bool) + 276
    frame #5: 0x000000010624926c duckdb.cpython-311-darwin.so`bool duckdb::BitpackingAnalyze<duckdb::hugeint_t>(duckdb::AnalyzeState&, duckdb::Vector&, unsigned long long) + 152
    frame #6: 0x00000001062d80ac duckdb.cpython-311-darwin.so`std::__1::__function::__func<duckdb::ColumnDataCheckpointer::DetectBestCompressionMethod(unsigned long long&)::$_0, std::__1::allocator<duckdb::ColumnDataCheckpointer::DetectBestCompressionMethod(unsigned long long&)::$_0>, void (duckdb::Vector&, unsigned long long)>::operator()(duckdb::Vector&, unsigned long long&&) + 164
    frame #7: 0x00000001062a2934 duckdb.cpython-311-darwin.so`duckdb::ColumnDataCheckpointer::ScanSegments(std::__1::function<void (duckdb::Vector&, unsigned long long)> const&) + 416
    frame #8: 0x00000001062a3040 duckdb.cpython-311-darwin.so`duckdb::ColumnDataCheckpointer::DetectBestCompressionMethod(unsigned long long&) + 400
    frame #9: 0x00000001062a368c duckdb.cpython-311-darwin.so`duckdb::ColumnDataCheckpointer::WriteToDisk() + 184
    frame #10: 0x00000001062a89fc duckdb.cpython-311-darwin.so`duckdb::ColumnData::Checkpoint(duckdb::RowGroup&, duckdb::PartialBlockManager&, duckdb::ColumnCheckpointInfo&) + 500
    frame #11: 0x00000001062ca3b4 duckdb.cpython-311-darwin.so`duckdb::StandardColumnData::Checkpoint(duckdb::RowGroup&, duckdb::PartialBlockManager&, duckdb::ColumnCheckpointInfo&) + 80
    frame #12: 0x00000001062bd27c duckdb.cpython-311-darwin.so`duckdb::RowGroup::WriteToDisk(duckdb::PartialBlockManager&, duckdb::vector<duckdb::CompressionType, true> const&) + 208
    frame #13: 0x00000001062f8148 duckdb.cpython-311-darwin.so`duckdb::OptimisticDataWriter::FlushToDisk(duckdb::RowGroup*) + 580
    frame #14: 0x00000001062e6778 duckdb.cpython-311-darwin.so`duckdb::LocalStorage::Append(duckdb::LocalAppendState&, duckdb::DataChunk&) + 164
    frame #15: 0x00000001060e6aa0 duckdb.cpython-311-darwin.so`duckdb::PhysicalInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const + 472
    frame #16: 0x0000000106201300 duckdb.cpython-311-darwin.so`duckdb::PipelineExecutor::ExecutePushInternal(duckdb::DataChunk&, unsigned long long) + 268
    frame #17: 0x000000010620154c duckdb.cpython-311-darwin.so`duckdb::PipelineExecutor::Execute(unsigned long long) + 272
    frame #18: 0x000000010620800c duckdb.cpython-311-darwin.so`duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) + 224
    frame #19: 0x00000001061f8490 duckdb.cpython-311-darwin.so`duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) + 36
    frame #20: 0x00000001061fcad8 duckdb.cpython-311-darwin.so`duckdb::Executor::ExecuteTask() + 184
    frame #21: 0x0000000106194e04 duckdb.cpython-311-darwin.so`duckdb::ClientContext::ExecuteTaskInternal(duckdb::ClientContextLock&, duckdb::PendingQueryResult&) + 56
    frame #22: 0x00000001061a9884 duckdb.cpython-311-darwin.so`duckdb::PendingQueryResult::ExecuteTask() + 56
    frame #23: 0x00000001053ce398 duckdb.cpython-311-darwin.so`duckdb::DuckDBPyConnection::CompletePendingQuery(pending_query=0x0000600003b2b2c0) at pyconnection.cpp:383:36 [opt]
    frame #24: 0x00000001053cf030 duckdb.cpython-311-darwin.so`duckdb::DuckDBPyConnection::ExecuteInternal(this=<unavailable>, query=<unavailable>, params=<unavailable>, many=false) at pyconnection.cpp:500:10 [opt]
    frame #25: 0x00000001053d6ac8 duckdb.cpython-311-darwin.so`duckdb::DuckDBPyConnection::RunQuery(this=0x0000600003e08018, query="insert into test select '00000000-0000-0000-0000-000000000000'::uuid from generate_series(1,1000000)", alias="unnamed_relation_13b22b457aeb6909", params=0x000000016fdfe9f8) at pyconnection.cpp:923:13 [opt]

OS:

macOS 13.6

DuckDB Version:

0.9.0, main

DuckDB Client:

Python, CLI

Full Name:

Serge Monkewitz

Affiliation:

Arista Networks

Have you tried this on the latest main branch?

I have tested with a main build

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • Yes, I have
