Description
What happens?
read_json_auto() seems to have a scaling issue when handling large JSON documents. Are there any workarounds or fixes available that I am not aware of?
To Reproduce
Issue:
If I use read_json_auto() on a large file (size: 9736193; 12402 records), it fails and I have to restart DuckDB.
Steps to reproduce:
- Run duckdb in Terminal.
- Run the following query on a large JSON file. (Please DM me if you need a sample JSON input file.)
Query:
select * from read_json_auto('input.json', maximum_object_size=1048576000);
Response:
Error: INTERNAL Error: Unexpected yyjson tag in ValTypeToString -
I had to terminate the DuckDB shell and restart it after seeing this error. Until the restart, every subsequent query fails:
Query:
select 1;
Response:
Error: FATAL Error: Failed: database has been invalidated because of a previous fatal error. The database must be restarted prior to being used again.
Original error: "INTERNAL Error: Unexpected yyjson tag in ValTypeToString"
What works?
The following query works if I limit the rows to 2500.
select * from read_json_auto('input.json', maximum_object_size=1048576000) limit 2500;
My use case requires querying all the records, which is not possible because of the error above. If there are any interim workarounds or suggestions, please share them.
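Since the query succeeds with a small LIMIT and fails without one, bisecting on the LIMIT value might help narrow down roughly which record triggers the error. Below is a minimal sketch of that idea, not a verified diagnosis. The helper names `first_failing_limit` and `probe_with_duckdb` are hypothetical, and the search assumes failures are monotonic (once a given limit fails, every larger limit fails too), which may not hold if schema detection or parallel reading affects the outcome.

```python
from typing import Callable


def first_failing_limit(runs_ok: Callable[[int], bool],
                        known_good: int, known_bad: int) -> int:
    """Binary-search the smallest limit in (known_good, known_bad] that fails.

    Assumes runs_ok(known_good) is True, runs_ok(known_bad) is False,
    and that failures are monotonic in the limit (an assumption).
    """
    lo, hi = known_good, known_bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_ok(mid):
            lo = mid   # mid still works, so the first failure is above it
        else:
            hi = mid   # mid fails, so the first failure is at or below it
    return hi


def probe_with_duckdb(limit: int, path: str = "input.json") -> bool:
    """Hypothetical probe: run the query from this report with a LIMIT."""
    import duckdb  # third-party package; imported lazily so the search helper runs without it

    # Open a fresh in-memory database per probe, because the fatal error
    # invalidates the database it occurred in.
    con = duckdb.connect()
    try:
        con.execute(
            f"select * from read_json_auto('{path}', "
            f"maximum_object_size=1048576000) limit {limit}"
        ).fetchall()
        return True
    except Exception:
        # Any failure counts as "bad" for the purposes of the search.
        return False
    finally:
        con.close()
```

With the numbers from this report, a call such as `first_failing_limit(probe_with_duckdb, 2500, 12402)` would need about 14 probes. The returned limit is only a starting point for inspecting the records near that position in the file.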
OS:
Mac OS 12.6.5
DuckDB Version:
v0.7.1 b00b93f
DuckDB Client:
Python
Full Name:
Sethu Srinivasan
Affiliation:
Amazon
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree