Skip to content

Conversation

taniabogatsch
Copy link
Contributor

@taniabogatsch taniabogatsch commented Dec 2, 2024

This PR introduces a new component to duckdb's transaction-local storage: delete indexes.

Let's look at the following snippet: INSERT OR REPLACE fails with a constraint exception. We do not support out-of-place table updates with UNIQUE constraints, such as a primary key. An out-of-place update turns an UPDATE into DELETE + UPDATE.

CREATE TABLE tbl (i INT PRIMARY KEY, payload INT[]);
INSERT INTO tbl VALUES(1, [1, 2, 3]);
-- the LIST turns this UPDATE into a DELETE + INSERT
INSERT OR REPLACE INTO tbl VALUES(1, [4, 5, 6]);
-- throws with a constraint exception

The longstanding limitation of duckdb's over-eager constraint-checking causing this behavior is documented here: https://duckdb.org/docs/sql/indexes.html#over-eager-unique-constraint-checking.

Due to the presence of transactions, data can only be removed from the index after (1) the transaction that performed the delete is committed, and (2) no further transactions exist that refer to the old entry still present in the index. As a result of this – transactions that perform deletions followed by insertions may trigger unexpected unique constraint violations, as the deleted tuple has not actually been removed from the index yet.

To solve this issue, this PR introduces transaction-local delete indexes. When attempting to remove any data from the global index, we now store this change in the local index. We can not remove the data yet, as other transactions may still depend on it. Subsequent operations within the same local context are then aware of the deletion of that entry. Thus, before throwing a constraint violation, they double-check if the value is still there by performing a lookup against the local delete index.

There are still some limitations to this approach. I.e., concurrent changes of the same key will likely cause a write-write conflict but will no longer cause a constraint violation.

@taniabogatsch taniabogatsch changed the title Delete indexes to solve over-eager constraint checking limitations Addressing over-eager constraint checking limitations with delete indexes Dec 2, 2024
@taniabogatsch taniabogatsch changed the title Addressing over-eager constraint checking limitations with delete indexes Addressing over-eager constraint checking with delete indexes Dec 2, 2024
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks great - this will resolve a lot of problems. Some comments from my side:

@taniabogatsch taniabogatsch marked this pull request as ready for review December 12, 2024 16:03
@duckdb-draftbot duckdb-draftbot marked this pull request as draft December 12, 2024 21:25
@taniabogatsch taniabogatsch marked this pull request as ready for review December 12, 2024 21:25
@Maxxen
Copy link
Member

Maxxen commented Dec 12, 2024

Extension CI passing 🙌

@Mytherin Mytherin merged commit 80b1684 into duckdb:main Dec 13, 2024
47 checks passed
@Mytherin
Copy link
Collaborator

Thanks!

@junovell
Copy link

when will the next release be?

@carlopi
Copy link
Contributor

carlopi commented Dec 26, 2024

Release is penciled in for mid-january: https://duckdb.org/docs/dev/release_calendar#upcoming-releases

krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 27, 2024
Addressing over-eager constraint checking with delete indexes (duckdb/duckdb#15092)
Fix update_extensions_ci test (duckdb/duckdb#15310)
Move away from upload-artifacts@v3 / download-artifacts@v3 (duckdb/duckdb#15309)
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 29, 2024
Addressing over-eager constraint checking with delete indexes (duckdb/duckdb#15092)
Fix update_extensions_ci test (duckdb/duckdb#15310)
Move away from upload-artifacts@v3 / download-artifacts@v3 (duckdb/duckdb#15309)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 29, 2024
Addressing over-eager constraint checking with delete indexes (duckdb/duckdb#15092)
Fix update_extensions_ci test (duckdb/duckdb#15310)
Move away from upload-artifacts@v3 / download-artifacts@v3 (duckdb/duckdb#15309)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>
Mytherin added a commit that referenced this pull request Jul 10, 2025
#18194)

### Overview

#15092 made it possible for ART
`PRIMARY/UNIQUE` indexes to have at most two row IDs per (unique) leaf.

Currently, the conflict manager (and `ART::VerifyLeaf`) still work with
the assumption that there can never be more than one row ID per conflict
hit. Since this assumption no longer holds, in some cases, we now have
to register conflict hits for up to two row IDs. Related issue:
duckdblabs/duckdb-internal#4924. Later in the
execution, we need to understand which of the two possible row IDs is
visible to the transaction (and the conflict manager).

As far as I can tell, this mostly kept working because we use a `vector`
for row ID scanning, which is order-preserving, and newer row IDs (more
likely the ones visible to the current transaction) overwrote older row
IDs.

### Changes in this PR

- Exposed `CanFetch` to `DataTable`, `LocalStorage`,
`RowGroupCollection` to determine whether a row ID is visible to a
transaction/in the local storage, or not.
- The conflict manager now has the capacity to register a secondary hit
for a conflict.

Also, I've refactored basically the entire conflict manager. That
refactoring includes removing the `ManagedSelection`, and a lot of code
paths. I think the number of lines only went up because I've added the
`CanFetch` stuff and because I've also added a lot of comments and some
formatting). 😅

This PR is a minor follow-up to
#18015 and next up is turning the
row ids into an unordered set instead of a vector.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Documentation Use for issues or PRs that require changes in the documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants