Skip to content

Conversation

lnkuiper
Copy link
Contributor

@lnkuiper lnkuiper commented May 4, 2023

This PR implements casting from JSON to our nested types (LIST, STRUCT, MAP) and vice-versa. This mostly uses the existing to_json and from_json functionality.

For example, inserting JSON directly into tables that contain our nested types now works, using our implicit cast rules:

CREATE TABLE structs (v STRUCT(duck INTEGER));
CREATE TABLE lists (v VARCHAR[]);
CREATE table maps (v MAP(VARCHAR, INTEGER));
INSERT INTO structs VALUES ('{"duck":42}'::JSON);
INSERT INTO lists VALUES ('["duck"]'::JSON);
INSERT INTO maps VALUES ('{"duck":42}'::JSON);
SELECT * FROM structs;
-- {'duck': 42}
SELECT * FROM lists;
-- [duck]
SELECT * FROM maps;
-- {duck=42}

And of course, the other way around also works (also using our implicit cast rules!).

CREATE TABLE jsons (j JSON);
INSERT INTO jsons SELECT * FROM structs;
INSERT INTO jsons SELECT * FROM lists;
INSERT INTO jsons SELECT * FROM maps;
SELECT * FROM jsons;
-- {"duck":42}
-- ["duck"]
-- {"duck":42}

I've taken a not-so-strict approach with TRY_CAST, i.e.:

SELECT TRY_CAST('{"duck":42,"goose":43}'::JSON AS STRUCT(goose INTEGER)) s;
-- {'goose': 43}
SELECT TRY_CAST('{"duck":42}'::JSON AS STRUCT(duck INTEGER, goose INTEGER)) s;
-- {'duck': 42, 'goose': NULL}

Usually, the entire result becomes NULL if a cast fails somewhere. Instead, for these kinds of casts, only the specific nested value that fails becomes NULL. If a JSON almost matches the schema, but not completely, the matching values will still be converted, instead of it being entirely NULL.

@lnkuiper
Copy link
Contributor Author

lnkuiper commented May 4, 2023

This fixes #4303, #6687, and #7123

@Alex-Monahan
Copy link
Contributor

This is fantastic! Implicit casts will make this so much simpler for our users. Thank you!

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looks good to me. One comment -

INSERT INTO jsons SELECT * FROM maps

query I
SELECT * FROM jsons
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add more tests with the structs/vectors in test_all_types and test_vector_types (e.g. select * from test_vector_types(null::int[], false);)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to add more tests, but I'm not sure I understand what you mean.

test_all_types is in DuckDB core, while the JSON type is only in the extension. How would this work? Happy to discuss in the office today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I understand. I'll write some more tests and then you can check if it's OK

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just meant including the casts to and from the vectors, e.g.:

select test_vector::JSON from test_vector_types(null::int[], false);
select test_vector::JSON::int[]::json from test_vector_types(null::int[], false);
select test_vector::JSON from test_vector_types(null::int[][], false);
select test_vector::JSON::int[][]::json from test_vector_types(null::int[][], false);
select test_vector::JSON from test_vector_types(null::row(i varchar, j int), false);
...

The false indicates whether or not to flatten the vectors, this can be used to find bugs in handling of non-flat vectors by checking if the result is the same for both settings (true and false), e.g.:

foreach flatten true false

query I nosort expected_result_row
select test_vector::JSON from test_vector_types(null::row(i varchar, j int), ${flatten});

endloop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an excellent suggestion, thanks! I was able to quickly squash a bunch of bugs. I've also added more functionality in the process. We can now cast all of our types to JSON and back, e.g.:

D select '1996-03-27'::DATE::JSON d;
┌──────────────┐
│      d       │
│     json     │
├──────────────┤
│ "1996-03-27" │
└──────────────┘
D select '"1996-03-27"'::JSON::DATE d;
┌────────────┐
│     d      │
│    date    │
├────────────┤
│ 1996-03-27 │
└────────────┘

VARCHAR is a special case of this, and still behaves like before. It is parsed and validated instead of being interpreted as a string that has to be quoted.

@Mytherin Mytherin merged commit 64aaaa2 into duckdb:master May 10, 2023
@Mytherin
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants