Skip to content

ASOF JOIN memory usage #8265

@derekmahar

Description

@derekmahar

What happens?

DuckDB runs out of memory while executing ASOF JOIN query in select_binance_transaction_times_and_prices.sql.

(Please extract source files from binance_duckdb_asof_join.zip.)

$ unzip -l binance_duckdb_asof_join.zip
Archive:  binance_duckdb_asof_join.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
     1043  2023-07-15 13:02   get_binance_trades_to_parquet
      766  2023-07-15 13:10   import_trades_from_parquet.py
      245  2023-07-15 13:21   select_binance_transaction_times_and_prices.sql
      537  2023-07-15 13:19   transaction_times.csv
---------                     -------
     2591                     4 files

To Reproduce

  1. Download price history data files for coin pair BTCUSDT:
    for month in $(seq 1 12)
    do
      # Note: requires bash, duckdb, wget, and GNU zcat.
      ./get_binance_trades_to_parquet BTCUSDT 2022 $month
    done
    
  2. Import price history for coin pair BTCUSDT into DuckDB database binance.duckdb:
    for month in $(seq 1 12)
    do
      python import_trades_from_parquet.py BTCUSDT 2022 $month
    done
    
  3. Execute ASOF JOIN query:
    cat select_binance_transaction_times_and_prices.sql | duckdb binance.duckdb
    

OS:

Ubuntu Server 22.04

DuckDB Version:

v0.8.2-dev1764 07b0b0a

DuckDB Client:

CLI

Full Name:

Derek Mahar

Affiliation:

Self-employed

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions