Deletion of CatalogEntry dependencies not protected against race condition #5553
…ling dependency_manager->DropObject
Yea.. I don't think this is the fix; essentially the provided example is:

```
# Target table
statement ok
CREATE TABLE foo(
    x FLOAT,
    y FLOAT,
    z FLOAT,
);

# Source table
statement ok
create table bar as select 0,0,0 from range(10000);

# They have more threads, but I eventually hit a limit in the sqllogictest runner
concurrentloop threadid 0 4000

# Create the view
statement ok
CREATE TEMPORARY VIEW df AS SELECT * from bar;

# Use the view
statement ok
INSERT INTO foo select * from df;

# Drop the view
statement ok
DROP VIEW df;

endloop
```

And this completes successfully. Or it could be due to a bug in the Relation API.

EDIT: `CREATE TEMPORARY VIEW df AS select 0,0,0 from range(10000);`

EDIT: a Cursor is a DuckDBPyConnection with a shared DatabaseInstance, but a separate Connection.
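As a side note on that last EDIT, here is a minimal Python sketch (table names are made up for illustration) of what `cursor()` gives you: each cursor is its own DuckDBPyConnection with a separate Connection, but all of them share the one DatabaseInstance, so their catalog operations can interleave.

```python
import threading
import duckdb

# A single in-memory DatabaseInstance; every cursor below shares it.
conn = duckdb.connect(":memory:")
conn.execute("CREATE TABLE foo(x FLOAT, y FLOAT, z FLOAT)")

def worker(i):
    # cursor() hands back a new DuckDBPyConnection with its own Connection,
    # still attached to the same underlying DatabaseInstance as `conn`.
    cur = conn.cursor()
    cur.execute(f"CREATE TABLE t{i} AS SELECT 0, 0, 0 FROM range(100)")
    cur.execute(f"INSERT INTO foo SELECT * FROM t{i}")
    cur.execute(f"DROP TABLE t{i}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(conn.execute("SELECT count(*) FROM foo").fetchone())
```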
I first suspected that the object got deleted twice from the dependency map. I can also reproduce this issue without using TEMP views: changing these to non-temporary views, and adding a couple of assertions, I can observe the problem directly. So there's a data race on the catalog's dependency tracking.
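A hedged Python sketch of that non-temporary-view variant (view names are suffixed per thread here to avoid name clashes, a small deviation from the sqllogictest above, and it reuses the `":memory:db"` trick from the repro script below so separate `duckdb.connect()` calls share one database instance): every connection works against the same shared catalog, so the CREATE VIEW / DROP VIEW pairs race with each other.

```python
import sys
import threading
import duckdb

DATABASE = ":memory:db"

setup = duckdb.connect(DATABASE)
setup.execute("CREATE TABLE foo(x FLOAT, y FLOAT, z FLOAT)")
setup.execute("CREATE TABLE bar AS SELECT 0, 0, 0 FROM range(10000)")

error = None

def worker(i):
    global error
    try:
        con = duckdb.connect(DATABASE)
        # Non-temporary view: created and dropped in the shared catalog.
        con.execute(f"CREATE VIEW df{i} AS SELECT * FROM bar")
        con.execute(f"INSERT INTO foo SELECT * FROM df{i}")
        con.execute(f"DROP VIEW df{i}")
    except duckdb.Error as e:
        error = error or e

threads = [threading.Thread(target=worker, args=(i,)) for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

if error:
    print(error, file=sys.stderr)
```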
I've been experimenting with different options to narrow down the cause, but I think it's just a race condition in the catalog:

```python
import sys
from threading import Thread

import duckdb

error = None
DATABASE = ":memory:db"

conn = duckdb.connect(DATABASE)
conn.execute("""CREATE OR REPLACE TABLE foo
(
    x FLOAT,
    y FLOAT,
    z FLOAT,
);""")

def work(i):
    # Each worker opens its own connection to the shared database,
    # creates a table, and immediately drops it again.
    global error
    try:
        db = duckdb.connect(DATABASE)
        db.execute(f"create or replace table df{i} as select i,i,i from range({100}) tbl(i)")
        db.execute(f"drop table df{i}")
    except duckdb.Error as e:
        if not error:
            error = e
        else:
            exit(1)

print("Start")
threads = []
concurrent_threads = 1000
for i in range(100_000):
    threads.append(Thread(target=work, args=(i,), name=f"thread_{i}"))

started_threads = []
for t in threads:
    if error:
        # Stop launching new workers as soon as one of them has failed.
        break
    try:
        t.start()
    except RuntimeError:
        # Too many live threads; skip this one and keep going.
        continue
    started_threads.append(t)

for t in started_threads:
    t.join()

if error:
    print(error, file=sys.stderr)
    exit(1)
print("done!")
```
This PR fixes #5520
I took a stab at fixing this, but I'm not sure if I'm just applying a band-aid for a bigger issue that happened upstream. Maybe for every CatalogEntry we can assume there is an entry in the `dependents_map`, and my conditional check for this should always be true, but isn't because of a totally different issue?
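For what it's worth, here is a purely conceptual Python sketch (not DuckDB's C++ catalog code; `dependents_map` here is just a plain dict standing in for the real structure) of why a conditional check on its own may not be enough: unless the membership check and the erase happen under the same lock, two concurrent drops can both pass the check and then race on the erase.

```python
import threading

# Hypothetical stand-in for the catalog's dependency bookkeeping.
dependents_map = {"df": {"foo"}}
catalog_lock = threading.Lock()

def drop_object(name):
    # Holding the lock makes the check-then-erase atomic; without it,
    # two threads could both see `name` present and then both try to erase it.
    with catalog_lock:
        if name in dependents_map:
            del dependents_map[name]

threads = [threading.Thread(target=drop_object, args=("df",)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert "df" not in dependents_map
```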