-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Feature: Allow underscores as digit separators in numeric literals #5338
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Ok, @Tishj pointed out that underscores probably will work for exponents in integers, but it won't work for floats... I'll add tests for it later and maybe update the float case as well to be consistent. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I have been playing around with this for a bit and found the following issues:
- Double-casting does not seem to work correctly in all cases:
D select 0____0_____0_____________1_____0__________0000_00_0__0.0 as a;
┌────────┐
│ a │
│ double │
├────────┤
│ 0.0 │
└────────┘
D select '1_______________________________________0.0'::double;
┌───────────────────────────────────────────────────────────────┐
│ CAST('1_______________________________________0.0' AS DOUBLE) │
│ double │
├───────────────────────────────────────────────────────────────┤
│ 1.0 │
└───────────────────────────────────────────────────────────────┘
- Selected decimal width appears to be affected by underscores:
D select 1_0.0;
┌──────────────┐
│ 10.0 │
│ decimal(4,1) │
├──────────────┤
│ 10.0 │
└──────────────┘
D select 1_______0.0;
┌───────────────┐
│ 10.0 │
│ decimal(10,1) │
├───────────────┤
│ 10.0 │
└───────────────┘
@@ -0,0 +1,175 @@ | |||
# name: test/sql/parser/digit_separators.test | |||
# group: [parser] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we enable verification for this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Nice work hacking around all of the numeric casting code. Some comments below:
# group: [parser] | ||
|
||
query I | ||
SELECT 1__2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python only allows individual underscores, not multiple underscores following each other, e.g. this would throw an error in Python:
>>> 1__2
File "<stdin>", line 1
1__2
^
SyntaxError: invalid decimal literal
I'm not too opposed to keeping support for multiple subsequent underscores but it does allow for constructing very unreadable numbers.
if (pos == start_digit) { | ||
return false; | ||
} | ||
while (buf[pos] == '_') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not check the length as well to ensure we don't read out-of-bounds?
if (pos == start_pos) { | ||
return false; | ||
} | ||
while (buf[pos] == '_') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - should we not check the length here?
Closing this for now |
Closes #5112
This adds underscores as optional digit separators in numeric literals, both for integers and reals. They can be used in both the integer and decimal part, however, they are not valid in exponents.
A literal component can not start or end with an underscore either.
Examples, see test/sql/parser/digit_separators.test
I had to modify the
fast_float.h
library to also skip over underscores. I don't know if this is too invasive of a change but the only other option is to do another allocaiton/filter step in the cast function before invoking the parsing, which seems wasteful.