Possibly related to #3789
I'm getting
InternalException: INTERNAL Error: Could not find node in column segment tree!
Attempting to find row number "36028797019024731" in 2 nodes
Node 0: Start 36028797018960000, Count 0
Node 1: Start 36028797019082880, Count 0
Repro, using the attached addresses.parquet.zip:
import duckdb
sql = """
CREATE TABLE addresses(
address__id UUID PRIMARY KEY NOT NULL,
person__id UUID NOT NULL,
street1 VARCHAR,
street2 VARCHAR,
city VARCHAR,
state VARCHAR,
zipcode VARCHAR,
country VARCHAR,
mailing_status VARCHAR,
is_mailing BOOLEAN,
is_voting BOOLEAN,
latitude DOUBLE,
longitude DOUBLE,
last_updated DATE,
source VARCHAR,
);
INSERT OR REPLACE INTO addresses
FROM
read_parquet('addresses.parquet')
-- WHERE address__id <> '8753b9f7-46fc-4fd5-b318-de20305d5462'
;
"""
ddb = duckdb.connect(":memory:")
ddb.sql(sql)
In the attached parquet file there is one duplicated UUID, "8753b9f7-46fc-4fd5-b318-de20305d5462". If you uncomment the WHERE line in the SQL, the error does not occur.
Things I've experimented with
- reduce the number of rows in the parquet file: I get a normal ConstraintException
- reorder the rows so the dupe rows are at the beginning of the file: I get a normal ConstraintException
- drop some of the other columns from the DDL and the parquet: I get a normal ConstraintException
I am on a nightly build, duckdb-0.10.3.dev601, running on an M1 Mac.