-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Collation support: LIKE operator and Scalar String functions #12359
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This pull request is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
I think this PR is really trying to do two distinct things, that should ideally be split up into two separate PRs:
- Add collation support to
LIKE
- Add generic collation support to string functions
I think it would be better to handle these separately. As you can see in the LIKE
code - not all string functions can handle collations by simply calling ExpressionBinder::PushCollation
on the children. As such, I think this needs more work than what is done in this PR. For example, we could extend the function itself so that it signals how collations should be handled.
As for like
itself - it seems that collations are only handled if the pattern and escape are constants. Constant patterns are only an optimization - this shifts it towards being requirement for correctness. For example, this does not work correctly for non-constant patterns in the current implementation:
SELECT 'HELLO' COLLATE NOCASE LIKE 'hel_%' AS result;
┌─────────┐
│ result │
│ boolean │
├─────────┤
│ true │
└─────────┘
SELECT 'HELLO' COLLATE NOCASE LIKE pattern AS result FROM (SELECT 'hel_%' pattern);
┌─────────┐
│ result │
│ boolean │
├─────────┤
│ false │
└─────────┘
I'm also unsure if the current approach works for collations in general. While it works for NOCASE
(which only requires case-insensitivity), it does not seem to work for ICU collations:
SELECT 'HELLO' COLLATE de LIKE 'HEL%' AS result;
┌─────────┐
│ result │
│ boolean │
├─────────┤
│ false │
└─────────┘
I think much more extensive testing is required here. Making LIKE
work with arbitrary (ICU) collations seems challenging and I'm not sure if pushing collations to each of the segments is sufficient here.
If the goal is only to support NOACCENT
- we could also detect that specific collation, and just push an ILIKE
operator which already exists and works in all scenarios:
D SELECT 'HELLO' ILIKE 'hel_%' AS result;
┌─────────┐
│ result │
│ boolean │
├─────────┤
│ true │
└─────────┘
D SELECT 'HELLO' ILIKE pattern AS result FROM (SELECT 'hel_%' pattern);
┌─────────┐
│ result │
│ boolean │
├─────────┤
│ true │
└─────────┘
Supersedes #12359 This PR adds a new property to functions - `FunctionCollationHandling`. This can be used to handle collations across different functions in an easy way. There are three settings: * **PROPAGATE_COLLATIONS (default)**: Propagate collations, but do not apply them. This means that if the function returns `VARCHAR`, and any of the input arguments are `VARCHAR`, the return type will inherit the collations of the input arguments. * **PUSH_COMBINABLE_COLLATIONS**: Propagates collations and applies **combinable** collations prior to calling the function. Combinable collations are "simple" collations like `nocase` and `noaccent`, but not `icu` collations that are more complex. * **IGNORE_COLLATIONS**: Ignore collations entirely. `PUSH_COMBINABLE_COLLATIONS` is used for various functions, e.g. `contains`, `starts_with`, `glob`, `like`, etc: ```sql D SELECT 'hello' COLLATE NOCASE LIKE '%HEL%' AS result; ┌─────────┐ │ result │ │ boolean │ ├─────────┤ │ true │ └─────────┘ ```
This has been superseded by #14717 |
Hi all,
This PR provides support for collations in the LIKE operator and scalar string functions.
For the LIKE operator, we decompose the like pattern into segments using the
CreateLikeMatcher
, then we apply the collation on each individualliteral
, a brief discussion can be found hereFor some cases, I had to change the
bind_function_expression.cpp
and push down the collations on each function child, as a good consequence the scalar string functions support collations.I also adapted the like optimization rules to support collations while replacing the LIKE function for optimized functions such as suffix, prefix, and contains.
Examples are available on
test/sql/collate/collate_like.test
andtest/sql/collate/collate_scalar_string_functions.test