Skip to content

Conversation

lnkuiper
Copy link
Contributor

Fixes #14167 - not a bug, but behavior is now different.

We used to auto-detect STRUCT, but we now auto-detect MAP, and dot-notation for extracting fields no longer works for certain inputs.

However, if we implement map_extract_first and rewrite dot-notation on MAPs to this function we can still do the previous behavior.

Since this is not a bugfix but somewhat of a feature we could consider pointing this to the feature branch instead.

@Tishj
Copy link
Contributor

Tishj commented Sep 30, 2024

map_extract should actually not return a LIST, the returned list is either empty or of size 1, which is kind of nonsense.
I'm not sure if we want to make that change though (for backwards compatibility reasons)

Creating map_extract_first would kind of cement this into place, making it even harder to change the behavior of map_extract in the future imo

@Maxxen
Copy link
Member

Maxxen commented Sep 30, 2024

@Tishj I thought the point of map_extract returning a list was to differentiate between a value being NULL and not present at all. Although I agree that it does add a lot of friction, now that we have map_contains and IN support you can check for missing values separately.

@Tishj
Copy link
Contributor

Tishj commented Sep 30, 2024

@Tishj I thought the point of map_extract returning a list was to differentiate between a value being NULL and not present at all. Although I agree that it does add a lot of friction, now that we have map_contains and IN support you can check for missing values separately.

Hmm that makes some sense, I hadn't thought of that

@Tishj
Copy link
Contributor

Tishj commented Sep 30, 2024

Also, I'm not sure if it has an impact here, but there is a difference between struct extract and map extract in case-insensitivity

struct keys are case-insensitive, map keys aren't, since they're regular varchars, not identifiers, they are case-sensitive

@lnkuiper
Copy link
Contributor Author

Json extract keys are also case sensitive, but we allow dot notation for that too, so this is consistent with that, at least.

@lnkuiper
Copy link
Contributor Author

Also, we also cannot differentiate between list_extract returning NULL due to the index being out-of-range, or a NULL being returned from the specified position in the list. We could extend this logic to map_extract, and conclude that map_extract shouldn't return a list for the sole reason to be able to differentiate. I'm pretty sure that, e.g., Presto also doesn't return a list, but just a scalar value. Maybe we should just change map_extract then instead of adding this function?

@Tishj
Copy link
Contributor

Tishj commented Sep 30, 2024

Yea I agree, but like I said that would break backwards compatibility, so it's a question if we want to do that.
Right now we're in between, adding map_extract_first would move us much further in the direction of having this map_extract behavior be permanent I feel.

I think either Hannes or Mark should make this call, I'd be in favor of having map_extract return a scalar

@hannes hannes requested a review from Mytherin October 2, 2024 09:26
@Mytherin Mytherin changed the base branch from main to feature October 3, 2024 18:07
@Mytherin
Copy link
Collaborator

Mytherin commented Oct 3, 2024

Thanks for the PR!

I think the reason map_extract returns a list is because maps in their original implementation were actually multimaps that could hold duplicate keys. That is no longer the case since #3760 - as such I would argue map_extract returning a list is no longer required/a good idea either.

I would be in favor of changing map_extract - but we need to be a bit careful w.r.t. backwards compatibility here.

In any case we should not be modifying the signature of map_extract in a bug-fix release.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 7, 2024 08:27
@lnkuiper lnkuiper marked this pull request as ready for review October 7, 2024 08:27
@lnkuiper
Copy link
Contributor Author

lnkuiper commented Oct 7, 2024

Thanks for the discussion and feedback. I've removed map_extract_first, made map_extract now return a scalar value, and fixed related tests.

@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 8, 2024 11:19
@lnkuiper lnkuiper marked this pull request as ready for review October 8, 2024 11:20
@duckdb-draftbot duckdb-draftbot marked this pull request as draft October 8, 2024 14:19
@lnkuiper lnkuiper added the Needs Documentation Use for issues or PRs that require changes in the documentation label Oct 9, 2024
@lnkuiper
Copy link
Contributor Author

lnkuiper commented Oct 9, 2024

I've added the "Needs documentation" label as this PR now changes the behavior of map_extract.

@lnkuiper lnkuiper marked this pull request as ready for review October 9, 2024 08:03
@Mytherin Mytherin merged commit 5c0d8ec into duckdb:feature Oct 21, 2024
43 of 44 checks passed
@Mytherin
Copy link
Collaborator

Thanks! LGTM - we just need to make sure to mention this change in behavior in the release notes

Mytherin added a commit that referenced this pull request Jan 20, 2025
This PR is a follow-up from #14175 that changed the return value of
`map_extract` and `element_at` to return a single value from maps by key
instead of a list with the value.

This is a good change, but unfortunately It breaks backwards/forwards
compatibility for serialized query plans. This PR fixes it by reverting
the change to `map_extract` but instead add a new `map_extract_value`
with the new behavior. It also changes the syntax sugar (e.g. `[]`) to
use this new function instead, so the end user still "mostly" sees the
new behavior, while the system internally still keeps the same behavior
for the actual function.
@lnkuiper lnkuiper deleted the map_extract_first branch April 14, 2025 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Documentation Use for issues or PRs that require changes in the documentation Ready For Review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dot notation for json field extraction is no longer working in v1.1.*
4 participants