Skip to content

Conversation

taniabogatsch
Copy link
Contributor

This PR addresses two issues concerning ATTACH.

  • We need to check against all previously attached database paths for each new database we attach. We only proceed if we haven't attached a database with the same path. Before this PR, the check happened in linear time, so with an increasing amount of attached databases, any new attachment took longer. Now, we use a map<path, name> to look up the path, and we can also use the name to retrieve the matching catalog entry in constant time.
  • We use locking on the map<path, name> to ensure that we never have more than a single file handle to a file.

I also noticed that the ":memory:" string crept into our code at many places, so I added a constant DConstants::IN_MEMORY_PATH for it and replaced all occurrences in the src subdirectories.

Here are some timings on my machine for five runs of attaching 30_000 files with 32 threads attaching simultaneously. To run this, the test test_attach.cpp included in this PR can be modified to run with higher file and thread counts.

current implementation [ms]
16539
15565
15786
16078
15727

this PR [ms]
1788
1417
1373
1468
1385

@Tishj
Copy link
Contributor

Tishj commented Nov 14, 2023

Just one small note, maybe the ":memory:" should live in this list instead:

// NOTE: there is a copy of this in the Postgres' parser grammar (gram.y)
#define DEFAULT_SCHEMA  "main"
#define INVALID_SCHEMA  ""
#define INVALID_CATALOG ""
#define SYSTEM_CATALOG  "system"
#define TEMP_CATALOG    "temp"

(constants.hpp)

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks great - good results. Some comments:

@taniabogatsch
Copy link
Contributor Author

@Mytherin, what about this?

#9671 (comment)

@Mytherin
Copy link
Collaborator

Maybe move it to the define for consistency and we can figure out a cleaner way of doing it later

@github-actions github-actions bot marked this pull request as draft November 14, 2023 15:49
@taniabogatsch taniabogatsch marked this pull request as ready for review November 14, 2023 18:53
@github-actions github-actions bot marked this pull request as draft November 15, 2023 13:03
@taniabogatsch taniabogatsch marked this pull request as ready for review November 15, 2023 13:39
@github-actions github-actions bot marked this pull request as draft November 15, 2023 15:12
@taniabogatsch taniabogatsch marked this pull request as ready for review November 15, 2023 15:13
@github-actions github-actions bot marked this pull request as draft November 15, 2023 17:00
@taniabogatsch taniabogatsch marked this pull request as ready for review November 15, 2023 17:00
@github-actions github-actions bot marked this pull request as draft November 16, 2023 16:24
@taniabogatsch taniabogatsch marked this pull request as ready for review November 17, 2023 16:00
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fixes! Looks great, some more minor comments:

@Mytherin Mytherin changed the base branch from feature to main November 20, 2023 12:41
@github-actions github-actions bot marked this pull request as draft November 20, 2023 13:19
@taniabogatsch taniabogatsch marked this pull request as ready for review November 20, 2023 13:20
@Mytherin Mytherin merged commit b28765a into duckdb:main Nov 21, 2023
@Mytherin
Copy link
Collaborator

Thanks!

@taniabogatsch taniabogatsch deleted the profile-attach branch November 21, 2023 11:25
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request Dec 11, 2023
Merge pull request duckdb/duckdb#9671 from taniabogatsch/profile-attach
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants