-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Closed
Description
What happens?
I have a large, nested JSON where one of the fields is mostly null, except for a handful of lines.
cat metadata.json_part2 | grep ResultFlags | sort | uniq -c
1 "ResultFlags": "MicrofluidicsRegulatedVacuumError",
7 "ResultFlags": "PressureError",
18 "ResultFlags": "TemperatureError",
47973 "ResultFlags": null,
An error is thrown when the duckdb parser encounters this field. I can bypass it using ignore_errors=1 but then I don't know what else could be wrong.
Would you consider this appropriate behavior or a bug?
Why does it say in the error message that it's trying to cast to numerical?
SELECT
...
FROM read_json_auto('metadata.json_part2', ignore_errors=0)
InvalidInputException Traceback (most recent call last)
File <timed exec>:7
InvalidInputException: Invalid Input Error: JSON transform error in file "E:/output\metadata.json_part2", in object 2286: Failed to cast value to numerical: "MicrofluidicsRegulatedVacuumError"
Try increasing 'sample_size', reducing 'maximum_depth', specifying 'columns' manually, specifying 'lines' or 'json_format' manually, or setting 'ignore_errors' to true.
To Reproduce
It doesn't reproduce with a small file:
{"Results": {"ResultFlags": null}}
{"Results": {"ResultFlags": "notnull"}}
{"Results": {"ResultFlags": null}}
SELECT Results.ResultFlags
FROM read_json_auto('test_null_then_filled.json', ignore_errors=0)
returns expected result.
So I'm not sure how to proceed here.
OS:
Windows (pyenv-win) and Ubuntu (WSL, pyenv)
DuckDB Version:
0.7.1
DuckDB Client:
python
Full Name:
Alex Kislukhin
Affiliation:
self
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree
Metadata
Metadata
Assignees
Labels
No labels