Description
What happens?
Based on https://duckdb.org/docs/api/python/reference/ the fetch_df_chunk() docs read:
Fetch a chunk of the result as Data.Frame following execute()
The way I understand it, if I call db_conn.execute('....').fetch_df_chunk(10) I should get the first 10 rows back. I may very well be wrong, but if I am not, something may not be working: when I use fetch_df_chunk(), all the rows from the source are returned regardless of the value I pass as an argument (apart from 0, which returns nothing). I have only tried this against parquet files, so I cannot say whether the potential issue is more general.
To Reproduce
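import pandas as pd
import numpy as np
import duckdb
# Build a 100-row frame of random integers and write it to a parquet file
df = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df.to_parquet('test.parquet', engine='pyarrow', compression='snappy')
# Query the file through an in-memory DuckDB connection
db_conn = duckdb.connect()
# Expected 10 rows back; all 100 rows are returned instead
db_conn.execute("SELECT A FROM parquet_scan('test.parquet')").fetch_df_chunk(10)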
Tested with Python 3.10.5, pandas==1.4.3, numpy==1.23.1 and duckdb==0.5.1
OS:
Windows 10
DuckDB Version:
0.5.1
DuckDB Client:
Python
Full Name:
Thomas Nielsen
Affiliation:
Coop
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree