
Tishj
Contributor

@Tishj Tishj commented Oct 10, 2024

ArrowQueryResult can't be fetched incrementally, so this relies on the fact that the query is executed and the result is then fetched in its entirety.

This path is used when sql creates a DuckDBPyRelation and polars or arrow then fully consumes the result into a polars DataFrame or a pyarrow Table, respectively.
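To illustrate why full consumption matters here, a toy sketch in plain Python follows. It is not DuckDB's implementation: `stream_chunks` and `materialize` are hypothetical names, and plain lists stand in for result chunks. It only shows the difference between a result that can be fetched chunk by chunk and one that is only available as a whole.

```python
from typing import Iterator, List


def stream_chunks(rows: List[int], chunk_size: int) -> Iterator[List[int]]:
    """Incremental fetch (toy model): the consumer may stop after any chunk."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def materialize(rows: List[int], chunk_size: int) -> List[List[int]]:
    """All-at-once fetch (toy model): every chunk is built before returning."""
    return list(stream_chunks(rows, chunk_size))


rows = list(range(10))

# A streaming consumer can take one chunk and leave the rest unproduced.
first_chunk = next(stream_chunks(rows, chunk_size=4))

# A consumer that converts the whole result needs every chunk anyway,
# so a result that only exists in its entirety is no restriction for it.
all_chunks = materialize(rows, chunk_size=4)
```

In this toy model only the second consumer is a safe fit for an all-at-once result, which mirrors the condition described above.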

@Mytherin
Collaborator

Thanks for the PR! LGTM - perhaps we can add some benchmarks as well to verify this is being used correctly and has the desired result?

@Mytherin Mytherin changed the base branch from main to feature October 14, 2024 11:01
@Tishj
Contributor Author

Tishj commented Oct 16, 2024

```python
import duckdb
import time

# Build a 10M-row table to query.
duckdb.execute("""
	create table tbl as select * from range(10_000_000)
""")

# Time the full fetch into a pyarrow Table.
start = time.time()
res = duckdb.sql("select * from tbl").arrow()
stop = time.time()
print(stop - start)
```

  • sql creates a Relation, which is not executed yet, so we end up using the ArrowQueryResult
  • execute spins off a StreamQueryResult, so we use the old path

The difference (wall-clock seconds, as printed by the script):

  • sql: ~0.17
  • execute: ~0.40
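For a steadier comparison than a single run, a small timing helper along these lines could be used. This is only a sketch: `median_seconds` and `compare` are hypothetical helper names, not part of DuckDB, and the two workloads below are stdlib stand-ins so the snippet runs without duckdb installed.

```python
import statistics
import time
from typing import Callable, Dict


def median_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn `repeats` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(cases: Dict[str, Callable[[], object]], repeats: int = 5) -> Dict[str, float]:
    """Time each named callable and return {name: median seconds}."""
    return {name: median_seconds(fn, repeats) for name, fn in cases.items()}


# Stand-in workloads; swap in the real queries when duckdb is available.
timings = compare(
    {
        "small": lambda: sum(range(1_000)),
        "large": lambda: sum(range(200_000)),
    },
    repeats=3,
)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
```

With the table from the script above, the cases would presumably be `lambda: duckdb.sql("select * from tbl").arrow()` and `lambda: duckdb.execute("select * from tbl").arrow()`. The exact form of the execute variant is an assumption, since it is not shown in the script.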

@Tishj
Contributor Author

Tishj commented Oct 16, 2024

Also tested with strings, booleans, and nested structures ({'a': [1,2,3]}, [[1,2,3], NULL, [], [6,5,7,3,234,1]]); the timings are consistently 2x or more faster with .sql

Surprisingly, for the nested lists .sql is roughly on par with the .execute version:
.execute is steady at ~1.00, whereas .sql ranges from 0.80 to 1.1

@Mytherin
Collaborator

Very nice results, thanks!

@Mytherin Mytherin merged commit 785372f into duckdb:feature Oct 16, 2024
19 checks passed
Mytherin added a commit that referenced this pull request Oct 17, 2024
I am not sure how it could compile on some platforms (basic Python CI)
but not on others (Pyodide, which is clang-based emscripten)

Connected to #14319

For example:
https://github.com/duckdb/duckdb/actions/runs/11372106923/job/31635971994?pr=14402#step:8:13491
```
        src/pyrelation.cpp:938:56: error: ‘class duckdb::shared_ptr<duckdb::ClientContextWrapper>’ has no member named ‘GetContext’
          938 |   auto &config = ClientConfig::GetConfig(*rel->context.GetContext());
              |                                                        ^~~~~~~~~~
```