Closed as not planned
Description
What happens?
Inserting many identical UUIDs into an on-disk table triggers an underflow in HUGEINT addition.
Notes:
- I can trigger this issue on main and v0.9.0
- the issue is not present with in-memory databases
- decreasing the number of inserted rows to 100,000 also avoids the issue
- I cannot reproduce on v0.8.1
Based on the above, and eyeballing the stack trace (see below), my guess is that the compression attempt on the UUID column (INT128
PhysicalType) is what triggers the underflow.
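To make the suspected failure mode concrete, here is a minimal Python sketch of a checked 128-bit subtraction like the `duckdb::Hugeint::Subtract` frame in the backtrace. This is illustrative only, not DuckDB's actual implementation; the names and error message are assumptions. The point is that if the bitpacking delta-stat calculation ever subtracts the most negative HUGEINT, the result exceeds the signed 128-bit range:

```python
# Hypothetical sketch (not DuckDB's actual code): checked signed 128-bit
# subtraction. All names here are illustrative assumptions.
INT128_MIN = -(1 << 127)
INT128_MAX = (1 << 127) - 1

def checked_sub(a: int, b: int) -> int:
    """Subtract two signed 128-bit values, raising on over/underflow."""
    result = a - b
    if not (INT128_MIN <= result <= INT128_MAX):
        raise OverflowError("Out of Range Error: Underflow in HUGEINT addition")
    return result

# Fine for ordinary values:
checked_sub(10, 3)  # 7
# But if the delta stats ever compute x - INT128_MIN (e.g. subtracting the
# column minimum from a value), the result can exceed INT128_MAX:
# checked_sub(0, INT128_MIN)  raises OverflowError
```

Note the thrown message says "addition" even though the failing frame is `Subtract`; DuckDB apparently reports both through the same out-of-range path.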
To Reproduce
CLI (main):
➜ duckdb git:(main) build/release/duckdb /tmp/test.duckdb
v0.9.1-dev86 c0dfff9198
Enter ".help" for usage hints.
D create table test(x uuid);
D insert into test select '00000000-0000-0000-0000-000000000000'::uuid from generate_series(1, 1000000);
Error: Out of Range Error: Underflow in HUGEINT addition
Python client (v0.9.0, inside LLDB, with backtrace):
sudo lldb /opt/homebrew/bin/python3
(lldb) target create "/opt/homebrew/bin/python3"
Current executable set to '/opt/homebrew/bin/python3' (arm64).
(lldb) breakpoint set -E c++
Breakpoint 1: no locations (pending).
(lldb) run
Process 5125 launched: '/opt/homebrew/bin/python3' (arm64)
2 locations added to breakpoint 1
Process 5125 stopped
* thread #2, stop reason = exec
frame #0: 0x0000000100014a40 dyld`_dyld_start
dyld`:
-> 0x100014a40 <+0>: mov x0, sp
0x100014a44 <+4>: and sp, x0, #0xfffffffffffffff0
0x100014a48 <+8>: mov x29, #0x0
0x100014a4c <+12>: mov x30, #0x0
Target 0: (Python) stopped.
(lldb) c
Process 5125 resuming
Python 3.11.4 (main, Jun 20 2023, 17:23:00) [Clang 14.0.3 (clang-1403.0.22.14.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import duckdb
>>> conn = duckdb.connect("/tmp/test.duckdb")
>>> conn.sql("create table test(x uuid)")
>>> conn.sql("insert into test select '00000000-0000-0000-0000-000000000000'::uuid from generate_series(1,1000000)")
Process 5125 stopped
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000189a72c54 libc++abi.dylib`__cxa_throw
libc++abi.dylib`:
-> 0x189a72c54 <+0>: pacibsp
0x189a72c58 <+4>: stp x24, x23, [sp, #-0x40]!
0x189a72c5c <+8>: stp x22, x21, [sp, #0x10]
0x189a72c60 <+12>: stp x20, x19, [sp, #0x20]
Target 0: (Python) stopped.
(lldb) bt
* thread #2, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
* frame #0: 0x0000000189a72c54 libc++abi.dylib`__cxa_throw
frame #1: 0x0000000105d73808 duckdb.cpython-311-darwin.so`duckdb::Hugeint::Subtract(duckdb::hugeint_t, duckdb::hugeint_t) + 212
frame #2: 0x000000010624a1b4 duckdb.cpython-311-darwin.so`duckdb::BitpackingState<duckdb::hugeint_t, duckdb::hugeint_t>::CalculateDeltaStats() + 204
frame #3: 0x0000000106249e6c duckdb.cpython-311-darwin.so`bool duckdb::BitpackingState<duckdb::hugeint_t, duckdb::hugeint_t>::Flush<duckdb::EmptyBitpackingWriter>() + 128
frame #4: 0x0000000106249d40 duckdb.cpython-311-darwin.so`bool duckdb::BitpackingState<duckdb::hugeint_t, duckdb::hugeint_t>::Update<duckdb::EmptyBitpackingWriter>(duckdb::hugeint_t, bool) + 276
frame #5: 0x000000010624926c duckdb.cpython-311-darwin.so`bool duckdb::BitpackingAnalyze<duckdb::hugeint_t>(duckdb::AnalyzeState&, duckdb::Vector&, unsigned long long) + 152
frame #6: 0x00000001062d80ac duckdb.cpython-311-darwin.so`std::__1::__function::__func<duckdb::ColumnDataCheckpointer::DetectBestCompressionMethod(unsigned long long&)::$_0, std::__1::allocator<duckdb::ColumnDataCheckpointer::DetectBestCompressionMethod(unsigned long long&)::$_0>, void (duckdb::Vector&, unsigned long long)>::operator()(duckdb::Vector&, unsigned long long&&) + 164
frame #7: 0x00000001062a2934 duckdb.cpython-311-darwin.so`duckdb::ColumnDataCheckpointer::ScanSegments(std::__1::function<void (duckdb::Vector&, unsigned long long)> const&) + 416
frame #8: 0x00000001062a3040 duckdb.cpython-311-darwin.so`duckdb::ColumnDataCheckpointer::DetectBestCompressionMethod(unsigned long long&) + 400
frame #9: 0x00000001062a368c duckdb.cpython-311-darwin.so`duckdb::ColumnDataCheckpointer::WriteToDisk() + 184
frame #10: 0x00000001062a89fc duckdb.cpython-311-darwin.so`duckdb::ColumnData::Checkpoint(duckdb::RowGroup&, duckdb::PartialBlockManager&, duckdb::ColumnCheckpointInfo&) + 500
frame #11: 0x00000001062ca3b4 duckdb.cpython-311-darwin.so`duckdb::StandardColumnData::Checkpoint(duckdb::RowGroup&, duckdb::PartialBlockManager&, duckdb::ColumnCheckpointInfo&) + 80
frame #12: 0x00000001062bd27c duckdb.cpython-311-darwin.so`duckdb::RowGroup::WriteToDisk(duckdb::PartialBlockManager&, duckdb::vector<duckdb::CompressionType, true> const&) + 208
frame #13: 0x00000001062f8148 duckdb.cpython-311-darwin.so`duckdb::OptimisticDataWriter::FlushToDisk(duckdb::RowGroup*) + 580
frame #14: 0x00000001062e6778 duckdb.cpython-311-darwin.so`duckdb::LocalStorage::Append(duckdb::LocalAppendState&, duckdb::DataChunk&) + 164
frame #15: 0x00000001060e6aa0 duckdb.cpython-311-darwin.so`duckdb::PhysicalInsert::Sink(duckdb::ExecutionContext&, duckdb::DataChunk&, duckdb::OperatorSinkInput&) const + 472
frame #16: 0x0000000106201300 duckdb.cpython-311-darwin.so`duckdb::PipelineExecutor::ExecutePushInternal(duckdb::DataChunk&, unsigned long long) + 268
frame #17: 0x000000010620154c duckdb.cpython-311-darwin.so`duckdb::PipelineExecutor::Execute(unsigned long long) + 272
frame #18: 0x000000010620800c duckdb.cpython-311-darwin.so`duckdb::PipelineTask::ExecuteTask(duckdb::TaskExecutionMode) + 224
frame #19: 0x00000001061f8490 duckdb.cpython-311-darwin.so`duckdb::ExecutorTask::Execute(duckdb::TaskExecutionMode) + 36
frame #20: 0x00000001061fcad8 duckdb.cpython-311-darwin.so`duckdb::Executor::ExecuteTask() + 184
frame #21: 0x0000000106194e04 duckdb.cpython-311-darwin.so`duckdb::ClientContext::ExecuteTaskInternal(duckdb::ClientContextLock&, duckdb::PendingQueryResult&) + 56
frame #22: 0x00000001061a9884 duckdb.cpython-311-darwin.so`duckdb::PendingQueryResult::ExecuteTask() + 56
frame #23: 0x00000001053ce398 duckdb.cpython-311-darwin.so`duckdb::DuckDBPyConnection::CompletePendingQuery(pending_query=0x0000600003b2b2c0) at pyconnection.cpp:383:36 [opt]
frame #24: 0x00000001053cf030 duckdb.cpython-311-darwin.so`duckdb::DuckDBPyConnection::ExecuteInternal(this=<unavailable>, query=<unavailable>, params=<unavailable>, many=false) at pyconnection.cpp:500:10 [opt]
frame #25: 0x00000001053d6ac8 duckdb.cpython-311-darwin.so`duckdb::DuckDBPyConnection::RunQuery(this=0x0000600003e08018, query="insert into test select '00000000-0000-0000-0000-000000000000'::uuid from generate_series(1,1000000)", alias="unnamed_relation_13b22b457aeb6909", params=0x000000016fdfe9f8) at pyconnection.cpp:923:13 [opt]
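For context on why an all-zero UUID could sit at the very edge of the HUGEINT range: my understanding (hedged, from reading the internals) is that DuckDB stores UUIDs as HUGEINT with the most significant bit flipped so that unsigned UUID order matches signed integer order. The helper below is mine, not DuckDB's API. Under that encoding, the all-zero UUID maps to the most negative 128-bit value, exactly where the subtraction above is most prone to underflow:

```python
def uuid_to_hugeint(u: str) -> int:
    """Illustrative encoding (assumed, not DuckDB's API): interpret the
    UUID as a 128-bit integer, flip the MSB, reinterpret as signed."""
    n = int(u.replace("-", ""), 16)
    n ^= 1 << 127  # flip the sign bit so unsigned UUID order matches signed order
    return n - (1 << 128) if n >= (1 << 127) else n

# The all-zero UUID lands on the most negative HUGEINT:
uuid_to_hugeint("00000000-0000-0000-0000-000000000000")  # -(2**127)
```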
OS:
macOS 13.6
DuckDB Version:
0.9.0, main
DuckDB Client:
Python, CLI
Full Name:
Serge Monkewitz
Affiliation:
Arista Networks
Have you tried this on the latest main branch?
I have tested with a main build
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- Yes, I have