Skip to content

Conversation

practicalswift
Copy link
Contributor

Replace use of locale dependent atoi(…) with locale-independent std::from_chars(…) (C++17).

@laanwj
Copy link
Member

laanwj commented Nov 22, 2020

Note that we have a function ParseInt32 (as well as 64 and UInt variants) for this and I once tried to use it in more places, see #17385.

However these didn't turn out to be actually locale-independent. Maybe this can replace them eventually.

@laanwj
Copy link
Member

laanwj commented Nov 22, 2020

locale-independent std::from_chars(…) (C++17).

This is good to hear at least. From https://en.cppreference.com/w/cpp/utility/from_chars#Notes :

Unlike other parsing functions in C++ and C libraries, std::from_chars is locale-independent, non-allocating, and non-throwing.

Concept ACK.

@DrahtBot
Copy link
Contributor

DrahtBot commented Nov 22, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #15936 (Unify bitcoin-qt and bitcoind persistent settings by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@laanwj
Copy link
Member

laanwj commented Nov 22, 2020

I think we need to make a choice here (either one or both):

  • Make an emulation of all broken and surprising atoi behavior, but call it something else than ToIntegral, something like LocaleIndependentAtoi, make it clear it should not be used in new code, it's only meant for preserving atoi behavior for backward compatibility reasons.
  • Make a sane integer parsing function that can eventually replace Parse[U]IntXX or their contents in a locale-independent way. I'm fine with the name then.

@practicalswift
Copy link
Contributor Author

practicalswift commented Nov 22, 2020

@laanwj

I think we need to make a choice here (either one or both):

  • Make an emulation of all broken and surprising atoi behavior, but call it something else than ToIntegral, something like LocaleIndependentAtoi, make it clear it should not be used in new code, it's only meant for preserving atoi behavior for backward compatibility reasons.
  • Make a sane integer parsing function that can eventually replace Parse[U]IntXX or their contents in a locale-independent way. I'm fine with the name then.

Good point. I thought about those options as well. I think we should do both. This PR is meant as a pure refactoring: it is not meant to change any behaviour that is defined by atoi.

I've now made it more clear that this PR is meant as a quirk-by-quirk compatible replacement for atoi by calling the function LocaleIndependentAtoi :)

Good point about naming: we should keep the name ToIntegral for a future sane version that doesn't emulate atoi or any other legacy function.

@laanwj
Copy link
Member

laanwj commented Nov 22, 2020

Okay, agree. I guess it could replace atoi as well as atoi64 in that case (as it's parametrized on type)?

@practicalswift
Copy link
Contributor Author

@laanwj Done! I've now replaced all instances of atoi64(s) with LocaleIndependentAtoi<int64_t>(s).

@laanwj
Copy link
Member

laanwj commented Nov 23, 2020

@laanwj Done! I've now replaced all instances of atoi64(s) with LocaleIndependentAtoi<int64_t>(s).

Thanks, looks good to me now except the documentation nit.

@practicalswift practicalswift force-pushed the remove-atoi branch 2 times, most recently from f14668e to 04271a9 Compare November 23, 2020 15:50
@practicalswift practicalswift changed the title Replace use of locale dependent atoi(…) with locale-independent std::from_chars(…) (C++17) util: Replace use of locale dependent atoi(…) with locale-independent std::from_chars(…) (C++17) Nov 24, 2020
maflcko pushed a commit that referenced this pull request Nov 24, 2020
… leading +/-/0:s

05c1095 test: Add testing of ParseInt/ParseUInt edge cases with leading +/-/0:s (practicalswift)

Pull request description:

  Add testing of `ParseInt`/`ParseUInt` edge cases with leading `+`/`-`/`0`:s.

  Context: While working on #20457 and #20452 I noticed some edge cases which our unit tests are currently not covering.

ACKs for top commit:
  MarcoFalke:
    review ACK 05c1095
  laanwj:
    Code review ACK 05c1095
  jonatack:
    ACK 05c1095
  promag:
    Code review ACK 05c1095.

Tree-SHA512: bdfb94d8fa0293512dbba89907cb6dd0f8b1418d878267dd6d49c8c397a0e5b9714441345565d41a6a909a1cda052ef7cccece822f355ff604fcf85f2dc8136f
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Nov 24, 2020
…es with leading +/-/0:s

05c1095 test: Add testing of ParseInt/ParseUInt edge cases with leading +/-/0:s (practicalswift)

Pull request description:

  Add testing of `ParseInt`/`ParseUInt` edge cases with leading `+`/`-`/`0`:s.

  Context: While working on bitcoin#20457 and bitcoin#20452 I noticed some edge cases which our unit tests are currently not covering.

ACKs for top commit:
  MarcoFalke:
    review ACK 05c1095
  laanwj:
    Code review ACK 05c1095
  jonatack:
    ACK 05c1095
  promag:
    Code review ACK 05c1095.

Tree-SHA512: bdfb94d8fa0293512dbba89907cb6dd0f8b1418d878267dd6d49c8c397a0e5b9714441345565d41a6a909a1cda052ef7cccece822f355ff604fcf85f2dc8136f
@laanwj
Copy link
Member

laanwj commented Nov 25, 2020

I don't get the CI error. Is charconv non-standard in some way?

In file included from primitives/transaction.cpp:10:0:
./util/strencodings.h:16:10: fatal error: charconv: No such file or directory
 #include <charconv>
          ^~~~~~~~~~
compilation terminated.

…from_chars(…) (C++17)

test: Add test cases for LocaleIndependentAtoi

fuzz: Assert legacy atoi(s) == LocaleIndependentAtoi<int>(s)

fuzz: Assert legacy atoi64(s) == LocaleIndependentAtoi<int64_t>(s)
@jonatack
Copy link
Member

Concept ACK

@laanwj
Copy link
Member

laanwj commented Sep 30, 2021

Code review ACK 4343f11

Copy link
Member

@jonatack jonatack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review ACK 4343f11

// LocaleIndependentAtoi is provided for backwards compatibility reasons.
//
// New code should use the ParseInt64/ParseUInt64/ParseInt32/ParseUInt32 functions
// which provide parse error feedback.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really important comment. I hope we can get rid of these functions entirely at some point. The implicit error handling behavior of atoi and atoi64 is pretty much always undesirable.
(so I would normally comment "these function names are long and clunky, please shorten them" but it's good in this case, we want using them to be ugly 😄 )

@laanwj laanwj merged commit cdb4dfc into bitcoin:master Oct 4, 2021
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Oct 4, 2021
Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

concept ack

@maflcko
Copy link
Member

maflcko commented Oct 5, 2021

Follow up in #23184

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

withdrawn my comments

kwvg pushed a commit to kwvg/dash that referenced this pull request Oct 12, 2021
…es with leading +/-/0:s

05c1095 test: Add testing of ParseInt/ParseUInt edge cases with leading +/-/0:s (practicalswift)

Pull request description:

  Add testing of `ParseInt`/`ParseUInt` edge cases with leading `+`/`-`/`0`:s.

  Context: While working on bitcoin#20457 and bitcoin#20452 I noticed some edge cases which our unit tests are currently not covering.

ACKs for top commit:
  MarcoFalke:
    review ACK 05c1095
  laanwj:
    Code review ACK 05c1095
  jonatack:
    ACK 05c1095
  promag:
    Code review ACK 05c1095.

Tree-SHA512: bdfb94d8fa0293512dbba89907cb6dd0f8b1418d878267dd6d49c8c397a0e5b9714441345565d41a6a909a1cda052ef7cccece822f355ff604fcf85f2dc8136f
jamesob added a commit to jamesob/bitcoin that referenced this pull request Dec 22, 2021
The new locale-independent atoi64 method introduced in bitcoin#20452 behaves
differently for values passed which are greater than the uint64_t max.
This commit is proof of that, meant to spur discussion on how to handle
such an incompatibility.

The change as committed in bitcoin#20542 may break some scripts which invoke
bitcoind, but more than that may have consensus implications as this
deserialization mechanism is used in CScript::ParseScript. I think this
means it's possible that bitcoin#20542 could have been an accidental soft fork?
jamesob added a commit to jamesob/bitcoin that referenced this pull request Dec 22, 2021
Introduce LocaleIndependentAtoi64 which behaves the same way that
previous versions of Bitcoin Core has when faced with under- and
overflow.

This behavior was implicitly changed in bitcoin#20452, but has not yet
been included in a release.

Attempts to use LocaleIndependentAtoi for int64_t return values
will result in a compilation error.
jamesob added a commit to jamesob/bitcoin that referenced this pull request Dec 22, 2021
Introduce LocaleIndependentAtoi64 which behaves the same way that
previous versions of Bitcoin Core has when faced with under- and
overflow.

This behavior was implicitly changed in bitcoin#20452, but has not yet
been included in a release.

Attempts to use LocaleIndependentAtoi for int64_t return values
will result in a compilation error.
jamesob added a commit to jamesob/bitcoin that referenced this pull request Jan 3, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 behaves
differently for values passed which are greater than the uint64_t max.
This commit is proof of that, meant to spur discussion on how to handle
such an incompatibility.

Introduce LocaleIndependentAtoi64 which behaves the same way that
previous versions of Bitcoin Core has when faced with under- and
overflow.

This behavior was implicitly changed in bitcoin#20452, but has not yet
been included in a release.

Attempts to use LocaleIndependentAtoi for int64_t return values
will result in a compilation error.
jamesob added a commit to jamesob/bitcoin that referenced this pull request Jan 4, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 behaves
differently for values passed which are greater than the uint64_t max.
This commit is proof of that, meant to spur discussion on how to handle
such an incompatibility.

Introduce LocaleIndependentAtoi64 which behaves the same way that
previous versions of Bitcoin Core has when faced with under- and
overflow.

This behavior was implicitly changed in bitcoin#20452, but has not yet
been included in a release.

Attempts to use LocaleIndependentAtoi for int64_t return values
will result in a compilation error.
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 11, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case.

Co-authored-by: James O'Beirne <james.obeirne@pm.me>
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 11, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case
in terms of the related GetIntArg and strtoll functions.

Co-authored-by: James O'Beirne <james.obeirne@pm.me>
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 11, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case
in terms of the related GetIntArg and strtoll functions.

Co-authored-by: James O'Beirne <james.obeirne@pm.me>
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 11, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case
in terms of the related GetIntArg and strtoll functions.

Co-authored-by: James O'Beirne <james.obeirne@pm.me>
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 11, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case
in terms of the related GetIntArg and strtoll functions.

Co-authored-by: James O'Beirne <james.obeirne@pm.me>
ryanofsky added a commit to ryanofsky/bitcoin that referenced this pull request Jan 12, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case
in terms of the related GetIntArg and strtoll functions.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
maflcko pushed a commit that referenced this pull request Jan 12, 2022
b5c9bb5 util: Restore GetIntArg saturating behavior (James O'Beirne)

Pull request description:

  The new locale-independent atoi64 method introduced in #20452 parses large integer values higher than maximum representable value as 0 instead of the maximum value, which breaks backwards compatibility. This commit restores compatibility and adds test coverage for this case in terms of the related GetIntArg and strtoll functions.

  Specifically, command line or bitcoin.conf integer values greater than `9223372036854775807` (`2**63-1`) used to be parsed as `9223372036854775807` before #20452. Then #20452 caused them to be parsed as `0`. And after this PR they will be parsed as `9223372036854775807` again.

  This change is a stripped-down alternative version of #23841 by jamesob

ACKs for top commit:
  jamesob:
    Github ACK b5c9bb5
  vincenzopalazzo:
    ACK b5c9bb5
  MarcoFalke:
    review ACK b5c9bb5 🌘

Tree-SHA512: 4e8abdbabf3cf4713cf5a7c5169539159f359ab4109a4e7e644cc2e9b2b0c3c532fad9f6b772daf015e1c5340ce59280cd9a41f2730afda6099cbf636b7d23ae
rebroad pushed a commit to rebroad/bitcoin that referenced this pull request Feb 3, 2022
The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case
in terms of the related GetIntArg and strtoll functions.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
@bitcoin bitcoin locked and limited conversation to collaborators Aug 15, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants