Skip to content

Conversation

mfussenegger
Copy link
Member

@mfussenegger mfussenegger commented May 13, 2025

To conform with Postgres behaviour return the function name, without
including the complete function call with the args, as output column
name. Also, previously, for scalars that can be collapsed to a literal
return value, we used that value for the output column name instead of
the function name.

For more details on the behavior change see the breaking change entry
in release notes of 6.0.0.

Closes: #17393

@mfussenegger mfussenegger force-pushed the j/column-name-for-scalars branch from d52afc0 to a8eee1b Compare May 13, 2025 12:12
@mfussenegger
Copy link
Member Author

@matriv @kneth rebased and pushed a fixup to the state that was on kneth/column_name_for_scalar_functions. I think this approach is viable. A few tests are failing (expectedly) that still need fixing.

Unclear to me is if we should also change the HTTP output (otherwise we have to deal with jdbc 0/1 diffs in the tests that assert column names and use functions)

Probably also needs a changes entry.

Happy to hand this back to you for finalization.

@matriv matriv self-assigned this May 15, 2025
@matriv matriv force-pushed the j/column-name-for-scalars branch from a8eee1b to dd66b7b Compare May 15, 2025 10:27
@matriv
Copy link
Contributor

matriv commented May 15, 2025

I've pushed a wrapup version, but now we have the problem that functions (e.g. aggregates), which don't collapse to a literal but need to be calculated based on the query and cols, are also returned with only the function name (e.g.: max instead of max(age)).

This conforms with postgres:

postgres=# select min(a), min(a+1) from t;
 min | min 
-----+-----
   1 |   2
(1 row)

But I'm thinking of 2 variations:

  1. return the whole function call if there is at least one arg:
@Override
protected Void visitSingleColumn(SingleColumn node, SelectAnalysis context) {
    Expression expression = node.getExpression();
    Symbol symbol = context.toSymbol(expression);
    String alias = node.getAlias();
    if (alias == null) {
        String colName = OutputNameFormatter.format(expression);
        String name =  null;
        if (expression instanceof FunctionCall fn) {
            if (fn.getArguments().isEmpty()) {
                name = fn.getName().getSuffix();
            } else {
                name = colName;
            }
        }
        context.add(ColumnIdent.of(colName), symbol, name);
    } else {
        context.add(ColumnIdent.of(alias), new AliasSymbol(alias, symbol), alias);
    }
    return null;
}
  1. Inspect the Symbol of the relevant output and return the outputName only if the output is a Literal?

What do you think?

@matriv matriv force-pushed the j/column-name-for-scalars branch from dd66b7b to 594cd71 Compare May 15, 2025 14:49
@matriv
Copy link
Contributor

matriv commented May 15, 2025

  1. Changes entry needs to be adjusted still.
  2. Regarding arithmetic operators and comparison operators we return the whole expression:
https://github.com/crate/crate/pull/17858/files#diff-fbb4d9d2f81fbfe7deb26e9f27e3f100c3580b4d76cdd98108ecd8bd7da88230R181

where as postgres returns:

postgres=# select (1+2) * 3;
 ?column? 
----------
        9
(1 row)


postgres=# select (1 >23);
 ?column? 
----------
 f
(1 row)

The same situation holds for other expressions like IN/ANY/CASE/IS NULL, etc.

  1. For casts we return:
https://github.com/crate/crate/pull/17858/files#diff-ed2a6369cdf047f097bfca45fa6b4367d8a969567bdbfdfc3fd744d55af1a516R782

and postgres returns the target data type:

postgres=# select cast(1 as varchar(10));
 varchar 
---------
 1
(1 row)

What should we do for 2 and 3?
For 2. I guess it's fine to leave as is (return the whole expression) and explain in the changes entry.

For 3? I'm not sure, maybe do what postgres does?

@matriv
Copy link
Contributor

matriv commented May 15, 2025

Another issue with the current approach is that if a FunctionCall is nested within let's say an arithmetic operator, then we still print out the whole function call. e.g.:

cr> select (1 + abs(a)) from tbl;
+--------------+
| (1 + abs(a)) |
+--------------+
|            6 |
|            2 |
|            3 |
|            4 |
|            5 |
+--------------+
SELECT 5 rows in set (1.017 sec)

Another option (regarding point 2.) would be to return ?column? as postgres does for anything else than a FunctionCall so the above issue is resolved.

@mfussenegger
Copy link
Member Author

Regarding arithmetic operators and comparison operators we return the whole expression:

That's okay for now.

Another issue with the current approach is that if a FunctionCall is nested within let's say an arithmetic operator, then we still print out the whole function call. e.g.:

Should also be fine.

Another option (regarding point 2.) would be to return ?column? as postgres

We can do that if we encounter a concrete compatibility issue. Until then I'd keep this limited to handling the FunctionCall in a special way.

@matriv matriv marked this pull request as ready for review May 19, 2025 15:09
@matriv matriv self-requested a review May 19, 2025 15:09
@matriv matriv removed their request for review May 19, 2025 15:24
@matriv matriv force-pushed the j/column-name-for-scalars branch from 594cd71 to 4f3b01b Compare May 19, 2025 15:26
@matriv matriv requested a review from seut May 19, 2025 15:27
Copy link
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good in general, thx!

assertThat(response.readShort()).isEqualTo((short) 1);
assertThat(PostgresWireProtocol.readCString(response)).isEqualTo("($1 = ANY([1, 2, 3]))");
assertThat(PostgresWireProtocol.readCString(response)).isEqualTo("($1 IN (1, 2, 3))");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these changes related?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your link really pointing to the right code?
Anyway, thanks for the explanation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matriv matriv requested a review from seut May 19, 2025 16:08
Copy link
Member

@seut seut left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 But lets drop the last commit, my fault, sorry about this!

Copy link
Contributor

mergify bot commented May 19, 2025

This pull request has been removed from the queue for the following reason: pull request dequeued.

Pull request #17858 has been dequeued, merge conditions unmatch:

  • label=ready-to-merge
  • any of [🛡 GitHub branch protection]:
    • check-neutral = ci/jenkins/pr_tests
    • check-skipped = ci/jenkins/pr_tests
    • check-success = ci/jenkins/pr_tests
  • #approved-reviews-by >= 1 [🛡 GitHub branch protection]
  • #approved-reviews-by>=1
  • #changes-requested-reviews-by = 0 [🛡 GitHub branch protection]

You should look at the reason for the failure and decide if the pull request needs to be fixed or if you want to requeue it.
If you do update this pull request, it will automatically be requeued once the queue conditions match again.
If you think this was a flaky issue instead, you can requeue the pull request, without updating it, by posting a @mergifyio requeue comment.

matriv and others added 2 commits May 19, 2025 22:03
To conform with Postgres behaviour return the function name, without
including the complete function call with the args, as output column
name. Also, previously, for scalars that can be collapsed to a literal
return value, we used that value for the output column name instead of
the function name.

For more details on the behavior change see the breaking change entry
in release notes of 6.0.0.

Closes: #17393

Co-authored-by: Mathias Fussenegger <f.mathias@zignar.net>
@matriv matriv force-pushed the j/column-name-for-scalars branch from df78bee to ea167cf Compare May 19, 2025 19:03
@matriv
Copy link
Contributor

matriv commented May 19, 2025

retest this please

@matriv matriv added the ready-to-merge Let Mergify merge the PR once approved and checks pass label May 19, 2025
@mergify mergify bot merged commit 585adcd into master May 19, 2025
23 checks passed
@mergify mergify bot deleted the j/column-name-for-scalars branch May 19, 2025 19:24
@matriv matriv changed the title Return name of function as output name via PG Return name of function as output name May 19, 2025
@amotl
Copy link
Member

amotl commented May 19, 2025

That is an excellent improvement that will boost ecosystem compatibility on a very important detail. Thank you so much! 💯

matriv added a commit that referenced this pull request May 28, 2025
Change `AliasedAnalyzedRelation` to return the outputNames() from its
underlining `relation` to keep any aliases set there. On top,
when handling such `AlisedAnalyzedRelation`s use the `outputNames()` to
retrieve a name if the list is not null and add special handling to
`TableFunctionRelation` to return `"unnest"` name if there is no alias
defined.

Follows: #17858
matriv added a commit that referenced this pull request Jun 2, 2025
Change `AliasedAnalyzedRelation` to return the outputNames() from its
underlining `relation` to keep any aliases set there. On top,
when handling such `AlisedAnalyzedRelation`s use the `outputNames()` to
retrieve a name if the list is not null and add special handling to
`TableFunctionRelation` to return `"unnest"` name if there is no alias
defined.

Follows: #17858
matriv added a commit that referenced this pull request Jun 2, 2025
Change `AliasedAnalyzedRelation` to use `outputNames()
underlining `relation` to keep any aliases set there.

Follows: #17858
matriv added a commit that referenced this pull request Jun 3, 2025
Change `AliasedAnalyzedRelation` to use `outputNames()
underlining `relation` to keep any aliases set there.

Follows: #17858
mergify bot pushed a commit that referenced this pull request Jun 3, 2025
Change `AliasedAnalyzedRelation` to use `outputNames()
underlining `relation` to keep any aliases set there.

Follows: #17858
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed ready-to-merge Let Mergify merge the PR once approved and checks pass
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Column name for scalar function call should be the function name, not the evaluated literal (JDBC compat issue)
5 participants