Skip to content

Conversation

Maxxen
Copy link
Member

@Maxxen Maxxen commented Nov 14, 2022

Closes #5112

This adds underscores as optional digit separators in numeric literals, both for integers and reals. They can be used in both the integer and decimal part, however, they are not valid in exponents.

A literal component can not start or end with an underscore either.

Examples, see test/sql/parser/digit_separators.test

I had to modify the fast_float.h library to also skip over underscores. I don't know if this is too invasive of a change but the only other option is to do another allocaiton/filter step in the cast function before invoking the parsing, which seems wasteful.

@Maxxen
Copy link
Member Author

Maxxen commented Nov 14, 2022

Ok, @Tishj pointed out that underscores probably will work for exponents in integers, but it won't work for floats... I'll add tests for it later and maybe update the float case as well to be consistent.

@Mytherin Mytherin changed the base branch from master to feature November 14, 2022 18:59
Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I have been playing around with this for a bit and found the following issues:

  • Double-casting does not seem to work correctly in all cases:
D select 0____0_____0_____________1_____0__________0000_00_0__0.0 as a;
┌────────┐
│   a    │
│ double │
├────────┤
│    0.0 │
└────────┘
D select '1_______________________________________0.0'::double;
┌───────────────────────────────────────────────────────────────┐
│ CAST('1_______________________________________0.0' AS DOUBLE) │
│                            double                             │
├───────────────────────────────────────────────────────────────┤
│                                                           1.0 │
└───────────────────────────────────────────────────────────────┘
  • Selected decimal width appears to be affected by underscores:
D select 1_0.0;
┌──────────────┐
│     10.0     │
│ decimal(4,1) │
├──────────────┤
│         10.0 │
└──────────────┘
D select 1_______0.0;
┌───────────────┐
│     10.0      │
│ decimal(10,1) │
├───────────────┤
│          10.0 │
└───────────────┘

@@ -0,0 +1,175 @@
# name: test/sql/parser/digit_separators.test
# group: [parser]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we enable verification for this test?

Copy link
Collaborator

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Nice work hacking around all of the numeric casting code. Some comments below:

# group: [parser]

query I
SELECT 1__2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Python only allows individual underscores, not multiple underscores following each other, e.g. this would throw an error in Python:

>>> 1__2
  File "<stdin>", line 1
    1__2
     ^
SyntaxError: invalid decimal literal

I'm not too opposed to keeping support for multiple subsequent underscores but it does allow for constructing very unreadable numbers.

if (pos == start_digit) {
return false;
}
while (buf[pos] == '_') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not check the length as well to ensure we don't read out-of-bounds?

if (pos == start_pos) {
return false;
}
while (buf[pos] == '_') {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here - should we not check the length here?

@Mytherin Mytherin changed the base branch from feature to master December 10, 2022 12:17
@Mytherin
Copy link
Collaborator

Closing this for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for digit separators
2 participants