error when inserting duped values into primary column with many values #11924

@NickCrews

Possibly related to #3789

I'm getting the following internal error:

InternalException: INTERNAL Error: Could not find node in column segment tree!
Attempting to find row number "36028797019024731" in 2 nodes
Node 0: Start 36028797018960000, Count 0Node 1: Start 36028797019082880, Count 0

Repro, using the attached addresses.parquet.zip:

import duckdb

sql = """
CREATE TABLE addresses(
    address__id UUID PRIMARY KEY NOT NULL,
    person__id UUID NOT NULL,
    street1 VARCHAR,
    street2 VARCHAR,
    city VARCHAR,
    state VARCHAR,
    zipcode VARCHAR,
    country VARCHAR,
    mailing_status VARCHAR,
    is_mailing BOOLEAN,
    is_voting BOOLEAN,
    latitude DOUBLE,
    longitude DOUBLE,
    last_updated DATE,
    source VARCHAR,
);
INSERT OR REPLACE INTO addresses
FROM
    read_parquet('addresses.parquet')
-- WHERE address__id <> '8753b9f7-46fc-4fd5-b318-de20305d5462'
;
"""
ddb = duckdb.connect(":memory:")
ddb.sql(sql)

The attached parquet file contains exactly one duplicated UUID, "8753b9f7-46fc-4fd5-b318-de20305d5462". If you uncomment the WHERE line in the SQL so that this UUID is filtered out, the error goes away.
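For anyone who wants to confirm the duplicate independently, here is a minimal sketch of one way to list repeated keys in the file. The helper names `duplicate_key_query` and `find_duplicate_keys` are made up for illustration, and it assumes the unzipped file sits next to the script as addresses.parquet.

```python
def duplicate_key_query(parquet_path: str, key: str = "address__id") -> str:
    """Build a query listing every key value that occurs more than once."""
    # Escape single quotes in the path and double quotes in the identifier.
    path = parquet_path.replace("'", "''")
    ident = key.replace('"', '""')
    return (
        f'SELECT "{ident}", count(*) AS n '
        f"FROM read_parquet('{path}') "
        f'GROUP BY "{ident}" HAVING count(*) > 1 '
        f"ORDER BY n DESC"
    )


def find_duplicate_keys(parquet_path: str, key: str = "address__id"):
    """Run the query with DuckDB and return a list of (key, count) rows."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect(":memory:")
    return con.sql(duplicate_key_query(parquet_path, key)).fetchall()
```

Calling `find_duplicate_keys("addresses.parquet")` should return a single row for the UUID above if the file is as described.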

Things I've experimented with

  • reduce the number of rows in the parquet file: I get a normal ConstraintException
  • reorder the rows so the dupe rows are at the beginning of the file: I get a normal ConstraintException
  • drop some of the other columns from the DDL and the parquet: I get a normal ConstraintException
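The experiments above could be scripted along these lines. This is only a rough sketch with synthetic data: `make_ids` and `try_insert` are hypothetical helpers, the table has a single column, and it has not been verified to trigger the InternalException (dropping columns already made the error disappear with the real file, so it may well not). It is meant as a starting point for varying the row count and the position of the duplicate.

```python
import uuid


def make_ids(n_rows: int, dupe_at: int) -> list[str]:
    """Return n_rows UUID strings where row `dupe_at` repeats row 0."""
    if not 0 < dupe_at < n_rows:
        raise ValueError("dupe_at must satisfy 0 < dupe_at < n_rows")
    ids = [str(uuid.uuid4()) for _ in range(n_rows)]
    ids[dupe_at] = ids[0]
    return ids


def try_insert(ids: list[str]) -> str:
    """Insert ids into a UUID PRIMARY KEY table and report the outcome.

    Returns "ok" or the name of the DuckDB exception class that was raised.
    """
    import duckdb  # imported lazily so make_ids works without it

    con = duckdb.connect(":memory:")
    con.execute("CREATE TABLE t(id UUID PRIMARY KEY NOT NULL)")
    con.execute("CREATE TABLE staging(id UUID)")
    # executemany is slow for large inputs; fine for a small experiment.
    con.executemany("INSERT INTO staging VALUES (?)", [(i,) for i in ids])
    try:
        con.execute("INSERT OR REPLACE INTO t FROM staging")
    except duckdb.Error as exc:
        return type(exc).__name__
    return "ok"
```

For example, `try_insert(make_ids(5000, 4999))` puts the duplicate at the end of the input and `try_insert(make_ids(5000, 1))` puts it at the start.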

I am on the nightly build duckdb-0.10.3.dev601, running on an M1 Mac.
