Skip to content

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Nov 30, 2022

This PR fixes #5520

I took a stab at fixing this, but I'm not sure if I'm just applying a bandaid for a bigger issue that happened upstream?

Maybe for every CatalogEntry we can assume there is an entry in the dependents_map and my conditional check for this should always be true, but isn't because of a totally different issue?

@Tishj
Copy link
Contributor Author

Tishj commented Nov 30, 2022

Yea.. I don't think this is the fix, essentially the provided example is:

# Target table
statement ok
CREATE TABLE foo(
	x FLOAT,
	y FLOAT,
	z FLOAT,
);

# Source table
statement ok
create table bar as select 0,0,0 from range(10000);

concurrentloop threadid 0 4000 # They have more threads, but I eventually hit a limit in the sql logic tester

# Create the view
statement ok
CREATE TEMPORARY VIEW df AS SELECT * from bar;

# Use the view
statement ok
INSERT INTO foo select * from df;

# Drop the view
statement ok
DROP VIEW df;

endloop

And this completes successfully
So it's either local to the PyConnection::RegisterPythonObject method, or it's related to creating a view from a table function instead of just a regular table

Or it could be due to a bug in the Relation API

EDIT:
It's not a bug related to a view over a table function, this also works just fine

CREATE TEMPORARY VIEW df AS select 0,0,0 from range(10000);

EDIT:
I think the issue is with Cursor()

a Cursor is a DuckDBPyConnection with a shared DatabaseInstance, but separate Connection
since Temp views are stored relative to the Connection, they can all have the same name df
But the issue occurs in CatalogSet::DropEntry, so some part of this View does live in the catalog, and likely retrieving the ViewCatalogEntry is causing problems (retrieving the View of another cursor) - which causes the drop entry to be invoked twice

@Tishj
Copy link
Contributor Author

Tishj commented Dec 1, 2022

I first suspected that the object got deleted twice from the dependencies_map, but after debugging this, it seems the object is deleted before even being added to the dependencies_map

ADDED: a65445 | (0libc++abi: terminating with uncaught exception of type duckdb::InternalException: INTERNAL Error: Entry of type 'View' already deleted
x600056bae4c0)
ADDED: a65446 | (0x600056b46940)
ADDED: a65449 | (0x600056b70cc0)
ADDED: a65448 | (0x600056b59800)
DELETED: a65437 | (0x600056b74240)
DELETED: a65435 | (0x600056b74480)
DELETED: a65438 | (0x600056b43480)
ADDED: a65450 | (0x600056b47b40)
ADDED: a65451 | (0x600056b58a80)
DELETED: a65439 | (0x600056b70900)
DELETED: a65441 | (0x600056b43b40)
DELETED: a65440 | (0x600056b43d80)
DELETED: a65442 | (0x600056bae940)
DELETED: a65447 | (0x600056b703c0)
DELETED: a65443 | (0x600056bae400)
ADDED: a65452 | (0x600056b53540)
DELETED: a65449 | (0x600056b70cc0)
ADDED: a65455 | (0x600056bae940)
ADDED: a65454 | (0x600056b749c0)
ADDED: a65456 | (0x600056b43780)
ADDED: a65453 | (0x600056b430c0)
ADDED: a65458 | (0x600056b47540)
ADDED: a65457 | (0x600056b4e1c0)
ADDED: a65459 | (0x600056b4e040)
DELETED: a65445 | (0x600056bae4c0)
DELETED: a65444 | (0x600056b473c0)
DELETED: a65446 | (0x600056b46940)
ADDED: a65460 | (0x600056b46940)
DELETED: a65448 | (0x600056b59800)
ADDED: a65462 | (0x600056b7c000)
ADDED: a65464 | (0x600056b71380)
ADDED: a65463 | (0x600056b5f840)
ADDED: a65461 | (0x600056b59bc0)
DELETED: a65455 | (0x600056bae940)
DELETED: a65450 | (0x600056b47b40)
DELETED: a65454 | (0x600056b749c0)
COULD NOT FIND: a65451 | (0x600056b58a80)

(the aXXXX is the alias of the view, used to identify from which thread it came)
It could not find a65451, even though it was added (and the pointer is the same as well)
Even though there is no print of a65451 getting deleted

The 0libc++abi: terminating with uncaught exception of type duckdb::InternalException: INTERNAL Error: Entry of type 'View' already deleted portion logically comes after the "COULD NOT FIND .." print, so I'm guessing it actually did get deleted before it was added

I can also reproduce this issue without using TEMP views

Changing these to non-temporary views, and adding a couple assertions, I observe the following problem

* thread #28, stop reason = EXC_BAD_ACCESS (code=1, address=0x10)
    frame #0: 0x0000000124a5c04c duckdb.cpython-310-darwin.so`duckdb::CatalogSet::UpdateTimestamp(this=0x000000010221b460, entry=0x000060000f84f600, timestamp=140053) at catalog_set.cpp:494:34
   491          entry->timestamp = timestamp;
   492          D_ASSERT(mapping.find(entry->name) != mapping.end());
   493          D_ASSERT(mapping[entry->name] != nullptr);
-> 494          mapping[entry->name]->timestamp = timestamp;
   495  }
(lldb) p mapping[entry->name]
(std::unordered_map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<duckdb::MappingValue, std::default_delete<duckdb::MappingValue> >, duckdb::CaseInsensitiveStringHashFunction, duckdb::CaseInsensitiveStringEquality, std::allocator<std::pair<const std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::unique_ptr<duckdb::MappingValue, std::default_delete<duckdb::MappingValue> > > > >::mapped_type) $0 = nullptr {
__value_ = nullptr
}

So there's a data race on mapping, it passed the assertion, it wasn't nullptr
but then on the very next line we access it and try to change the timestamp of it, but at that point it is nullptr

  • fixed that by locking the catalog_lock before changing the mapping in UpdateTimestamp
    it still produced a segfault in later runs, just a different one now 🙃

@Tishj
Copy link
Contributor Author

Tishj commented Dec 2, 2022

I've been experimenting with different options to narrow down the cause, but I think it's just a race condition in the catalog
Here is a test that also triggers this issue, different in a couple ways from the initial example

from tracemalloc import start
import duckdb
import pandas as pd
import numpy as np
import sys
import time
from threading import Thread

error = None

DATABASE = ":memory:db"

conn = duckdb.connect(DATABASE)
conn.execute("""CREATE OR REPLACE TABLE foo
             (
                x FLOAT,
                y FLOAT,
                z FLOAT,
                );""")

def work(i):
    global error
    try:
        db = duckdb.connect(DATABASE)
        db.execute(f"create or replace table df{i} as select i,i,i from range({100}) tbl(i)")
        db.execute(f"drop table df{i}")
    except duckdb.Error as e:
        if not error:
            error = e
        else:
	        exit(1)

print("Start")
threads = []

concurrent_threads = 1000

for i in range(100_000):
    threads.append(Thread(target=work, args=(i,), name=f"thread_{i}"))

started_threads = []

for t in threads:
    if error:
        break;
    try:
        t.start()
    except RuntimeError:
        continue
    started_threads.append(t)


for t in started_threads:
    t.join()

if error:
    print(error, file=sys.stderr)
    exit(1)

print("done!")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python] Segfault in DuckDBPyConnection.unregister() in multi-threaded program
1 participant