Closed
Description
What happens?
When reading a large file with the array_of_values flag and a large maximum_object_size, an address sanitizer error is triggered. Interestingly, this does not seem to happen when using array_of_records.
To Reproduce
Python script to generate the file:
text = '''{
"event_id": "VARCHAR",
"user_id": "VARCHAR",
"action": "VARCHAR",
"client_time": "VARCHAR",
"metadata": {
"argv": [
"VARCHAR"
],
"dag": {
"dag_size": "VARCHAR",
"tasks": {
"load_oscar": {
"status": "VARCHAR",
"type": "VARCHAR",
"upstream": "JSON",
"products": {
"nb": "VARCHAR"
}
},
"load_weather": {
"status": "VARCHAR",
"type": "VARCHAR",
"upstream": "JSON",
"products": {
"nb": "VARCHAR"
}
},
"compress": {
"status": "VARCHAR",
"type": "VARCHAR",
"upstream": {
"load_oscar": "VARCHAR"
},
"products": {
"nb": "VARCHAR"
}
}
}
}
},
"total_runtime": "VARCHAR",
"python_version": "VARCHAR",
"version": "VARCHAR",
"package_name": "VARCHAR",
"docker_container": "BOOLEAN",
"cloud": "NULL",
"email": "NULL",
"os": "VARCHAR",
"environment": "VARCHAR",
"telemetry_version": "VARCHAR",
"$lib": "VARCHAR",
"$lib_version": "VARCHAR",
"$geoip_city_name": "VARCHAR",
"$geoip_country_name": "VARCHAR",
"$geoip_country_code": "VARCHAR",
"$geoip_continent_name": "VARCHAR",
"$geoip_continent_code": "VARCHAR",
"$geoip_postal_code": "VARCHAR",
"$geoip_latitude": "DOUBLE",
"$geoip_longitude": "DOUBLE",
"$geoip_time_zone": "VARCHAR",
"$geoip_subdivision_1_code": "VARCHAR",
"$geoip_subdivision_1_name": "VARCHAR",
"$plugins_succeeded": [
"VARCHAR"
],
"$plugins_failed": [
"NULL"
],
"$plugins_deferred": [
"NULL"
],
"$ip": "VARCHAR"
}
'''
# Write a single JSON array containing 10,000 copies of the object above.
with open('issue.json', 'w+') as f:
    f.write('[' + ','.join([text for x in range(10000)]) + ']')
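Before running the query, it can help to confirm that the generated file is what the report intends: one top-level JSON array with 10,000 elements, and a total size that can be compared against the maximum_object_size value used in the query. The following is a minimal sketch using only the Python standard library. The helper name describe_json_array is hypothetical and is not part of DuckDB; the constant simply repeats the value from the SQL query in this report.

```python
import json
import os

# Value passed as maximum_object_size in the SQL query of this report.
MAX_OBJECT_SIZE = 104857600


def describe_json_array(path):
    """Return (element_count, file_size_in_bytes) for a file holding one JSON array.

    Hypothetical helper for sanity-checking the reproduction file.
    """
    with open(path) as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError(f"{path}: top-level JSON value is not an array")
    return len(data), os.path.getsize(path)


if __name__ == "__main__":
    # Assumes issue.json was produced by the generator script above.
    count, size = describe_json_array("issue.json")
    print(f"{count} elements, {size} bytes")
    print("file size within maximum_object_size:", size <= MAX_OBJECT_SIZE)
```

If the element count is not 10000, or the file is not valid JSON, the reproduction is probably not exercising the same code path as described here.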
SQL query:
select json_structure(json ->> '$.properties') as structure,
from read_json('issue.json', json_format='array_of_values', columns={'json': 'JSON'}, maximum_object_size=104857600)
limit 1;
OS:
macOS
DuckDB Version:
Dev
DuckDB Client:
CLI
Full Name:
Mark
Affiliation:
DuckDB Labs
Have you tried this on the latest master branch?
- I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- I agree