Deletion of CatalogEntry dependencies not protected against race condition #5553

Tishj · 2022-11-30T13:24:45Z

This PR fixes #5520

I took a stab at fixing this, but I'm not sure if I'm just applying a bandaid for a bigger issue that happened upstream?

Maybe for every CatalogEntry we can assume there is an entry in the dependents_map and my conditional check for this should always be true, but isn't because of a totally different issue?

…ling dependency_manager->DropObject

Tishj · 2022-11-30T14:43:06Z

Yea.. I don't think this is the fix, essentially the provided example is:

# Target table
statement ok
CREATE TABLE foo(
	x FLOAT,
	y FLOAT,
	z FLOAT,
);

# Source table
statement ok
create table bar as select 0,0,0 from range(10000);

concurrentloop threadid 0 4000 # They have more threads, but I eventually hit a limit in the sql logic tester

# Create the view
statement ok
CREATE TEMPORARY VIEW df AS SELECT * from bar;

# Use the view
statement ok
INSERT INTO foo select * from df;

# Drop the view
statement ok
DROP VIEW df;

endloop

And this completes successfully
So it's either local to the PyConnection::RegisterPythonObject method, or it's related to creating a view from a table function instead of just a regular table

Or it could be due to a bug in the Relation API

EDIT:
It's not a bug related to a view over a table function, this also works just fine

CREATE TEMPORARY VIEW df AS select 0,0,0 from range(10000);

EDIT:
I think the issue is with Cursor()

a Cursor is a DuckDBPyConnection with a shared DatabaseInstance, but separate Connection
since Temp views are stored relative to the Connection, they can all have the same name df
But the issue occurs in CatalogSet::DropEntry, so some part of this View does live in the catalog, and likely retrieving the ViewCatalogEntry is causing problems (retrieving the View of another cursor) - which causes the drop entry to be invoked twice

Tishj · 2022-12-01T10:16:51Z

I first suspected that the object got deleted twice from the dependencies_map, but after debugging this, it seems the object is deleted before even being added to the dependencies_map

ADDED: a65445 | (0libc++abi: terminating with uncaught exception of type duckdb::InternalException: INTERNAL Error: Entry of type 'View' already deleted
x600056bae4c0)
ADDED: a65446 | (0x600056b46940)
ADDED: a65449 | (0x600056b70cc0)
ADDED: a65448 | (0x600056b59800)
DELETED: a65437 | (0x600056b74240)
DELETED: a65435 | (0x600056b74480)
DELETED: a65438 | (0x600056b43480)
ADDED: a65450 | (0x600056b47b40)
ADDED: a65451 | (0x600056b58a80)
DELETED: a65439 | (0x600056b70900)
DELETED: a65441 | (0x600056b43b40)
DELETED: a65440 | (0x600056b43d80)
DELETED: a65442 | (0x600056bae940)
DELETED: a65447 | (0x600056b703c0)
DELETED: a65443 | (0x600056bae400)
ADDED: a65452 | (0x600056b53540)
DELETED: a65449 | (0x600056b70cc0)
ADDED: a65455 | (0x600056bae940)
ADDED: a65454 | (0x600056b749c0)
ADDED: a65456 | (0x600056b43780)
ADDED: a65453 | (0x600056b430c0)
ADDED: a65458 | (0x600056b47540)
ADDED: a65457 | (0x600056b4e1c0)
ADDED: a65459 | (0x600056b4e040)
DELETED: a65445 | (0x600056bae4c0)
DELETED: a65444 | (0x600056b473c0)
DELETED: a65446 | (0x600056b46940)
ADDED: a65460 | (0x600056b46940)
DELETED: a65448 | (0x600056b59800)
ADDED: a65462 | (0x600056b7c000)
ADDED: a65464 | (0x600056b71380)
ADDED: a65463 | (0x600056b5f840)
ADDED: a65461 | (0x600056b59bc0)
DELETED: a65455 | (0x600056bae940)
DELETED: a65450 | (0x600056b47b40)
DELETED: a65454 | (0x600056b749c0)
COULD NOT FIND: a65451 | (0x600056b58a80)

(the aXXXX is the alias of the view, used to identify from which thread it came)
It could not find a65451, even though it was added (and the pointer is the same as well)
Even though there is no print of a65451 getting deleted

The 0libc++abi: terminating with uncaught exception of type duckdb::InternalException: INTERNAL Error: Entry of type 'View' already deleted portion logically comes after the "COULD NOT FIND .." print, so I'm guessing it actually did get deleted before it was added

I can also reproduce this issue without using TEMP views

Changing these to non-temporary views, and adding a couple assertions, I observe the following problem

* thread #28, stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x0000000124a5c04c duckdb.cpython-310-darwin.so`duckdb::CatalogSet::UpdateTimestamp(this=0x000000010221b460, entry=0x000060000f84f600, timestamp=140053) at catalog_set.cpp:494:34
   491          entry->timestamp = timestamp;
   492          D_ASSERT(mapping.find(entry->name) != mapping.end());
   493          D_ASSERT(mapping[entry->name] != nullptr);
-> 494          mapping[entry->name]->timestamp = timestamp;
   495  }

(lldb) p mapping[entry->name]
(std::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<duckdb::MappingValue, std::default_delete<duckdb::MappingValue> >, duckdb::CaseInsensitiveStringHashFunction, duckdb::CaseInsensitiveStringEquality, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<duckdb::MappingValue, std::default_delete<duckdb::MappingValue> > > > >::mapped_type) $0 = nullptr {
__value_ = nullptr
}

So there's a data race on mapping, it passed the assertion, it wasn't nullptr
but then on the very next line we access it and try to change the timestamp of it, but at that point it is nullptr

fixed that by locking the catalog_lock before changing the mapping in UpdateTimestamp
it still produced a segfault in later runs, just a different one now 🙃

Tishj · 2022-12-02T10:28:56Z

I've been experimenting with different options to narrow down the cause, but I think it's just a race condition in the catalog
Here is a test that also triggers this issue, different in a couple ways from the initial example

from tracemalloc import start
import duckdb
import pandas as pd
import numpy as np
import sys
import time
from threading import Thread

error = None

DATABASE = ":memory:db"

conn = duckdb.connect(DATABASE)
conn.execute("""CREATE OR REPLACE TABLE foo
             (
                x FLOAT,
                y FLOAT,
                z FLOAT,
                );""")

def work(i):
    global error
    try:
        db = duckdb.connect(DATABASE)
        db.execute(f"create or replace table df{i} as select i,i,i from range({100}) tbl(i)")
        db.execute(f"drop table df{i}")
    except duckdb.Error as e:
        if not error:
            error = e
        else:
	        exit(1)

print("Start")
threads = []

concurrent_threads = 1000

for i in range(100_000):
    threads.append(Thread(target=work, args=(i,), name=f"thread_{i}"))

started_threads = []

for t in threads:
    if error:
        break;
    try:
        t.start()
    except RuntimeError:
        continue
    started_threads.append(t)


for t in started_threads:
    t.join()

if error:
    print(error, file=sys.stderr)
    exit(1)

print("done!")

set object to deleted while holding lock + check condition before cal…

94237bb

…ling dependency_manager->DropObject

Tishj mentioned this pull request Dec 5, 2022

[Catalog] Grab missing write_locks in a couple places #5601

Merged

Tishj closed this Dec 5, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Deletion of CatalogEntry dependencies not protected against race condition #5553

Deletion of CatalogEntry dependencies not protected against race condition #5553

Tishj commented Nov 30, 2022

Uh oh!

Tishj commented Nov 30, 2022 •

edited

Loading

Uh oh!

Tishj commented Dec 1, 2022 •

edited

Loading

Uh oh!

Tishj commented Dec 2, 2022

Uh oh!

Uh oh!

Deletion of CatalogEntry dependencies not protected against race condition #5553

Deletion of CatalogEntry dependencies not protected against race condition #5553

Conversation

Tishj commented Nov 30, 2022

Uh oh!

Tishj commented Nov 30, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tishj commented Dec 1, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Tishj commented Dec 2, 2022

Uh oh!

Uh oh!

Tishj commented Nov 30, 2022 •

edited

Loading

Tishj commented Dec 1, 2022 •

edited

Loading