
read_json_auto() throws INTERNAL Error: Unexpected yyjson tag in ValTypeToString #7307

@sethusrinivasan

Description

What happens?

read_json_auto() appears to have a scaling issue when handling large JSON documents. Are there any workarounds or fixes available that I am not aware of?

To Reproduce

Issue:
When I use read_json_auto() on a large file (size: 9736193; 12402 records), it fails and I have to restart DuckDB.

Steps to reproduce:

  1. Run duckdb in a terminal.

  2. Run the following query on the large JSON file (please DM me if you need a sample JSON input file):
    Query:
    select * from read_json_auto('input.json', maximum_object_size=1048576000);
    Response:
    Error: INTERNAL Error: Unexpected yyjson tag in ValTypeToString

  3. After this error, any further query fails, and I have to terminate and restart the DuckDB shell:
    Query:
    select 1;
    Response:
    Error: FATAL Error: Failed: database has been invalidated because of a previous fatal error. The database must be restarted prior to being used again.
    Original error: "INTERNAL Error: Unexpected yyjson tag in ValTypeToString"

What works?
The following query works if I limit the number of rows (for example, to 2500):
select * from read_json_auto('input.json', maximum_object_size=1048576000) limit 2500;

My use case is to query all of the records, which is not possible because of the error above. Any interim workarounds or suggestions would be appreciated.
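One possible interim workaround, not confirmed to avoid this bug: since the query succeeds with a LIMIT, splitting the file into smaller pieces and querying each piece separately may let most records be read and can help isolate the record(s) that trigger the error. The sketch below assumes input.json is a single top-level JSON array of objects that fits in memory; split_json_array and the chunks/ directory are hypothetical names for illustration, not part of DuckDB.

```python
import json
import os


def split_json_array(src_path, out_dir, chunk_size=2000):
    """Split a top-level JSON array into newline-delimited JSON chunk files.

    Assumption (hypothetical helper): src_path holds one JSON array of
    objects that can be loaded with json.load. Returns the chunk paths.
    """
    with open(src_path, "r", encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")

    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for start in range(0, len(records), chunk_size):
        # Zero-padded names keep the chunks in their original order.
        path = os.path.join(out_dir, f"chunk_{start // chunk_size:04d}.json")
        with open(path, "w", encoding="utf-8") as out:
            for record in records[start:start + chunk_size]:
                # Write one JSON value per line (newline-delimited JSON).
                out.write(json.dumps(record) + "\n")
        paths.append(path)
    return paths
```

With the 12402 records in this report and chunk_size=2000, this writes 7 files. Each can then be queried on its own, e.g. select * from read_json_auto('chunks/chunk_0000.json');, and any chunk that still reproduces the INTERNAL Error can be split further to narrow down the offending record. Because the error invalidates the database, running each chunk in a fresh DuckDB process is the safer choice.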

OS:

Mac OS 12.6.5

DuckDB Version:

v0.7.1 b00b93f

DuckDB Client:

Python

Full Name:

Sethu Srinivasan

Affiliation:

Amazon

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
