Skip to content

Conversation

ryanofsky
Copy link
Contributor

@ryanofsky ryanofsky commented Jan 11, 2022

The new locale-independent atoi64 method introduced in #20452 parses large integer values higher than maximum representable value as 0 instead of the maximum value, which breaks backwards compatibility. This commit restores compatibility and adds test coverage for this case in terms of the related GetIntArg and strtoll functions.

Specifically, command line or bitcoin.conf integer values greater than 9223372036854775807 (2**63-1) used to be parsed as 9223372036854775807 before #20452. Then #20452 caused them to be parsed as 0. And after this PR they will be parsed as 9223372036854775807 again.

This change is a stripped-down alternative version of #23841 by jamesob

@ryanofsky
Copy link
Contributor Author

Updated 4a15ddc -> 6184764 (pr/green.1 -> pr/green.2, compare) fixing a comment typo.

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK, if jamesob is ok that you are taking this over.

Nit: Could make sense to adjust the title to clarify:

  • This only affects GetIntArg, as the other integers don't overflow during normal use
  • The new (and old) behavior is saturation

Suggested title: "Restore GetIntArg saturating behavior"

@DrahtBot
Copy link
Contributor

DrahtBot commented Jan 11, 2022

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

No conflicts as of last run.

@jb55
Copy link
Contributor

jb55 commented Jan 11, 2022

Concept ACK

@ryanofsky ryanofsky changed the title Restore atoi compatibility with old versions of Bitcoin Core Restore GetIntArg saturating behavior Jan 11, 2022
@jamesob
Copy link
Contributor

jamesob commented Jan 11, 2022

Concept ACK, if jamesob is ok that you are taking this over.

I will never not be honored by Ryan Ofsky taking over one of my changes.

But here's the thing - the reason I didn't go this route in the original PR is because it's doing what I was trying to avoid in the first place, which is changing behavior. This is evidenced by the changes to existing tests that accompany this PR.

IMO we shouldn't change number parsing without some kind of deprecation warning. But I guess it's okay if we can make some good arguments that we're reasonably sure there are no existing uses of non-int64 atoi parsing that will fail if saturation behavior changes?

@ryanofsky
Copy link
Contributor Author

Updated 6184764 -> d5c0e2d (pr/green.2 -> pr/green.3, compare) updating fuzz test, and fixing include error https://cirrus-ci.com/task/4726914576285696 and MSVC error https://cirrus-ci.com/task/5712076994772992

@jamesob
Copy link
Contributor

jamesob commented Jan 11, 2022

But here's the thing - the reason I didn't go this route in the original PR is because it's doing what I was trying to avoid in the first place, which is changing behavior.

Oh, I guess atoi out of range behavior is UB (unlike stoll) so maybe this isn't actually changing any defined behavior, in which case I'm ACK.

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Jan 11, 2022

But here's the thing - the reason I didn't go this route in the original PR is because it's doing what I was trying to avoid in the first place, which is changing behavior. This is evidenced by the changes to existing tests that accompany this PR.

It's definitely changing behavior relative to #20452 and the new tests added in #20452. I'm actually not sure if this is changing behavior from before #20452. If it is changing behavior from before #20452, I think it's just platform specific behavior.

@ryanofsky
Copy link
Contributor Author

Updated d5c0e2d -> 389c47a (pr/green.3 -> pr/green.4, compare) to fix CI errors about unused variable in fuzz test https://cirrus-ci.com/task/6214569847685120, https://cirrus-ci.com/task/6724743242973184, https://cirrus-ci.com/task/5598843336130560. This is an expedient fix since I'm not totally sure what kind of fuzz coverage is useful. Maybe it is good to keep compare three integer parsing functions against each other instead of only comparing two.

@ryanofsky ryanofsky changed the title Restore GetIntArg saturating behavior util: Restore GetIntArg saturating behavior Jan 11, 2022
@jamesob
Copy link
Contributor

jamesob commented Jan 11, 2022

ACK pending CI pass

@ryanofsky
Copy link
Contributor Author

@JeremyRubin
Copy link
Contributor

i dont know enough about string parsing to be super useful, but based on the breaking changes analysis & trusting the implementations are OK this approach has my concept ACK.

The new locale-independent atoi64 method introduced in bitcoin#20452 parses
large integer values higher than maximum representable value as 0
instead of the maximum value, which breaks backwards compatibility.
This commit restores compatibility and adds test coverage for this case
in terms of the related GetIntArg and strtoll functions.

Co-authored-by: Ryan Ofsky <ryan@ofsky.org>
@ryanofsky
Copy link
Contributor Author

Updated 1b0b1dd -> b5c9bb5 (pr/green.5 -> pr/green.6, compare) dropping fuzz check to avoid depending on undefined atoi behavior https://cirrus-ci.com/task/5686396143796224

@maflcko
Copy link
Member

maflcko commented Jan 12, 2022

Could update OP to clarify how the intermediate behavior changes cancel each other out?

(initial state) ----------------> (midstate) --------------------> (end result)
atoi UB         ----- 20452 ----> atoi 0     ----- this pull ----> atoi saturate
atoi64 saturate ----- 20452 ----> atoi64 0   ----- this pull ----> atoi64 saturate

Copy link

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK b5c9bb5

@ryanofsky
Copy link
Contributor Author

Could update OP to clarify how the intermediate behavior changes cancel each other out?

I'm not sure I followed the diagram, but I added a sentence to the OP about what previous behavior was and how this restores it. Feel free to edit OP directly if you were asking for something different or want to change anything.

@jamesob
Copy link
Contributor

jamesob commented Jan 12, 2022

Github ACK b5c9bb5

Copy link
Member

@maflcko maflcko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review ACK b5c9bb5 🌘

Show signature

Signature:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

review ACK b5c9bb5cb9f4a8db57b33ef7399310c7d6de5822 🌘
-----BEGIN PGP SIGNATURE-----

iQGzBAEBCgAdFiEE+rVPoUahrI9sLGYTzit1aX5ppUgFAlwqrYAACgkQzit1aX5p
pUiLpQv8CcSnEdE4GY3Ka1zv9fy6FGmfW/RdO8CCT2Kazcf3FeGQ35f5YObu2wQF
dTgojk3/rzZwvwXEmpLx1zxKbG19wKDXz7J+TJGMM9QOu+enxHADKLGJt3ejGUQ8
IDco8ien8jsK0/qBDyVReR2mRuC1eSXaxTjiNhJAJ5j47WzTjmhUJZHeNYznxtIT
ATYNOU4jOZLpTdVLm/WS/Vg543PMU+tzFpElbSORgbO3M4h7stpopMt5liYwSsj+
YoEAcWQ7qqVFFwmtymR0AdzmwxvfQwMVtocCEZNt4/tyzGSYk0gZrx7yqm3hK9li
YuWQ29EY64S3OdthWRE/Y1WY8b5fkU9oAJ5SNXGwGuD08TFiYrOWKtPKfneW5sAE
3pE4zrBb/pmt+chsw9bBqBLs8NhPv0Z8yh+6ykChns7wR2QPis409mtaJD5tfZB/
TP26N4JawuHUTWdtaG3f4yY0SsdkgWQGmj3fAKd8d9AkMxEbFSKhRM9JByYIePbn
mJ9EiloO
=hA0/
-----END PGP SIGNATURE-----

@maflcko maflcko added this to the 23.0 milestone Jan 12, 2022
@maflcko maflcko merged commit 290ff5e into bitcoin:master Jan 12, 2022
@maflcko
Copy link
Member

maflcko commented Jan 12, 2022

There is still another behaviour change:

123a would be parsed as 123.

Now it is mapped to 0.

Edit: Wrong. See discussion below.

@jamesob
Copy link
Contributor

jamesob commented Jan 12, 2022

@MarcoFalke that is due to #20452, correct?

@maflcko
Copy link
Member

maflcko commented Jan 12, 2022

Yes, everything is due to 20452

@ryanofsky
Copy link
Contributor Author

There is still another behaviour change:

123a would be parsed as 123.

Now it is mapped to 0.

This is surprising. std::from_chars "expects the pattern identical to the one used by std::strtol" and returns the "first character not matching the pattern." And std::strtol accepts a pattern that "takes as many characters as possible to form a valid base-n (where n=base) integer number representation". So I'd expect 123a to be parsed as 123. But we should have test coverage for this, whether or not there is a bug.

@maflcko
Copy link
Member

maflcko commented Jan 12, 2022

strtol has error reporting for this case, but atoi64 disables that by setting nullptr. I expect you'll have to call from_chars twice on the string (the second time passing in the returned string) to emulate that.

@ryanofsky
Copy link
Contributor Author

ryanofsky commented Jan 12, 2022

strtol has error reporting for this case

Where do you see that strtol would return an error for this case? That seems counter to the documentation https://en.cppreference.com/w/cpp/string/byte/strtol#Return_value

I'm also not able to reproduce a bug. For example the following test passes on master (290ff5e)

diff --git a/src/test/getarg_tests.cpp b/src/test/getarg_tests.cpp
index d5142c8d748..5286aa6d812 100644
--- a/src/test/getarg_tests.cpp
+++ b/src/test/getarg_tests.cpp
@@ -154,6 +154,9 @@ BOOST_AUTO_TEST_CASE(intarg)
     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 0), 11);
     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 12);
 
+    ResetArgs("-foo=123a");
+    BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 456), 123);
+
     ResetArgs("-foo=NaN -bar=NotANumber");
     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-foo", 1), 0);
     BOOST_CHECK_EQUAL(m_local_args.GetIntArg("-bar", 11), 0);
make -C src test/test_bitcoin && src/test/test_bitcoin -l test_suite -t getarg_tests

@maflcko
Copy link
Member

maflcko commented Jan 12, 2022

Oh, right. I had a typo in my own test and it didn't show the wrong line due to the std::map indirection, but now it passes:

diff --git a/src/test/util_tests.cpp b/src/test/util_tests.cpp
index 20d27a237d..280ea7cbd3 100644
--- a/src/test/util_tests.cpp
+++ b/src/test/util_tests.cpp
@@ -1668,6 +1668,9 @@ BOOST_AUTO_TEST_CASE(test_LocaleIndependentAtoi)
         {"-9223372036854775808", -9'223'372'036'854'775'807LL - 1LL},
         {"9223372036854775807", 9'223'372'036'854'775'807},
         {"9223372036854775808", std::numeric_limits<int64_t>::max()},
+        {"+123a", 123},
+        {"123a", 123},
+        {"-123a", -123},
         {"+-", 0},
         {"0x1", 0},
         {"ox1", 0},

sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jan 12, 2022
@ryanofsky
Copy link
Contributor Author

<luke-jr> ryanofsky: what is the meaning of "green" in #24041 ?

@luke-jr, it's just a nonsense branch name from frogs are green tangent in previous PR

@bitcoin bitcoin locked and limited conversation to collaborators Jan 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants