-
Notifications
You must be signed in to change notification settings - Fork 37.7k
WIP: refactor: Use our own integer parsing/formatting everywhere #17385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Also removes a use of a locale-dependent function.
Replace the direct use of `stoul` in dbwrapper.cpp. This removes a locale dependent exception.
Replace the use of atoi in CleanupBlockRevFiles with ParseUInt32 to handle parse errors and remove direct use of a locale-dependent function.
Replace use of function `snprintf` with our own, locale-independent function `strprintf`.
Replace the use of locale-dependent function strtol with our own error-checking integer parsing function `ParseUInt32`.
Replace use of atoi with `ParseInt64`, whose output is at least well-defined. Concretely, the behaviour should not change.
Replace use of `atoi` in torcontrol for parsing reply code with `ParseUInt32`, adding error checking. Also, change type of `code` to `uint32_t`, it is never negative does not need to be a signed integer.
Replace use of atoi64 and circuitois way to test if something is a valid 64-bit integer, with the use of our own function `ParseInt64`.
Replace the use of atoi64 with error and bound checking `ParseUInt32` - after all, the result is being assigned to an unsigned int.
Replace the direct use of atoi64 with error-checking ParseInt64.
Replace the use of atoi64 with error-checking ParseInt64. Unfortunately, there is currently no way to return a parse error, so return the default in that case.
Replace use of `atoi64` with error-checking `ParseInt64`.
Replace direct use of atoi64 with ParseUInt64 (as the result is an unsigned timestamp).
2b5ed2f
to
88e8307
Compare
Wow! That was quick! :) Strong Concept ACK Thanks for taking the time to get rid of those once and for all :) |
Test failure in
Do we really support |
I'll take a look at this. We could make Any case, will first debug and see what causes reported |
We ideally need a way to signal parse errors from
In that case couldn't you simply do
I agree, I'm tempted to just drop that commit, though it means |
This was causing a lot of test cases not not be very meaningful because multiple configuration options were combined into one line. The changes in test output with this fix make sense and look like: ```diff - testnet=1 regtest=1 || test + testnet=1 regtest=1 || error: Invalid combination of -regtest, -testnet and -chain. Can use at most one. ``` Issue was reported and debugged by Wladimir J. van der Laan <laanwj@protonmail.com> in bitcoin#17385 (comment)
Concept ACK 88e8307 |
This is way too lenient.
Discussion seems somewhat offtopic here, maybe discuss argument parsing behavior more in #16545.
I could be wrong but I think it would actually be less work, and friendlier to both users and developers to validate settings on initialization rather than use. #16545 is intended to move in this direction.
For int and bool arguments, I think it's generally reasonable to treat
I think it would be best to save both d78e7be and afb5809 for another PR and not include them here. These are user facing changes which could use release notes and better test coverage. |
I'm fine with adding a release not if you think this is a significant user-facing change, and add a test for these two little functions, but I'd really prefer not to keep either the current behavior or code. |
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
My thinking is that there probably shouldn't be user-facing changes in a PR that's supposed to be a refactor, and that this PR is substantial enough with 17 commits, that pulling out 2 of them to a dedicated PR wouldn't make it worse off. But the user-facing changes are maybe more minor than I would have thought. With your two commits arguments such as |
Many of the changes in this PR make a subtle difference in the case of invalid input, due to adding strict parse error checking (I mention this in the OP). Maybe I shouldn't call it a refactor. But it's not really a bugfix or feature either.
Sure. But remove any and the cleanup in |
I'll review this tomorrow. I think all the commits here that remove uses of C functions to parse strings which are supposed to have a rigid format (numbers in filenames, test strings, descriptors, database properties) are great, and it's great to clean up this code and avoid misusing these C functions. But while I'll need to review the PR more carefully to be sure, I think the few changes here that can actually affect users should probably be done more carefully. For example, instead of taking a strangely formatted boolean value that was previously interpreted as Alternately, it it's possible to drop a few commits in this PR to make it a boring plain refactoring, that would be great, because we'd still be eliminating the majority of misused C function calls, and just adding one or two items to your "known violations" list that could be addressed in followups. |
3645e4c Add missing newline in util_ChainMerge test (Russell Yanofsky) Pull request description: This was causing a lot of test cases not not be very meaningful because multiple configuration options were combined into one line. The changes in test output with this fix make sense and look like: ```diff - testnet=1 regtest=1 || test + testnet=1 regtest=1 || error: Invalid combination of -regtest, -testnet and -chain. Can use at most one. ``` Issue was reported and debugged by Wladimir J. van der Laan <laanwj@protonmail.com> in #17385 (comment) <!-- *** Please remove the following help text before submitting: *** Pull requests without a rationale and clear improvement may be closed immediately. --> <!-- Please provide clear motivation for your patch and explain how it improves Bitcoin Core user experience or Bitcoin Core developer experience significantly: * Any test improvements or new tests that improve coverage are always welcome. * All other changes should have accompanying unit tests (see `src/test/`) or functional tests (see `test/`). Contributors should note which tests cover modified code. If no tests exist for a region of modified code, new tests should accompany the change. * Bug fixes are most welcome when they come with steps to reproduce or an explanation of the potential issue as well as reasoning for the way the bug was fixed. * Features are welcome, but might be rejected due to design or scope issues. If a feature is based on a lot of dependencies, contributors should first consider building the system outside of Bitcoin Core, if possible. * Refactoring changes are only accepted if they are required for a feature or bug fix or otherwise improve developer experience significantly. For example, most "code style" refactoring changes require a thorough explanation why they are useful, what downsides they have and why they *significantly* improve developer experience or avoid serious programming bugs. Note that code style is often a subjective matter. Unless they are explicitly mentioned to be preferred in the [developer notes](/doc/developer-notes.md), stylistic code changes are usually rejected. --> <!-- Bitcoin Core has a thorough review process and even the most trivial change needs to pass a lot of eyes and requires non-zero or even substantial time effort to review. There is a huge lack of active reviewers on the project, so patches often sit for a long time. --> ACKs for top commit: laanwj: ACK 3645e4c practicalswift: ACK 3645e4c -- diff looks correct Tree-SHA512: ca5bde9b9f553811d4827113f4880d15d7b8f4f1455b95bbf34c9a1512fdd53062f1a2133c50d9b54f94160a1ee77a54bc82681a5f3bf25d2b0d01f8a8e95165
Closing this until the arguments manager has a way to report errors to the user. I agree that needs to be done. It'd be fairly easy to add error return value to the ArgParse functions for bool and int, however there are so many call sites… and they all assume fast and loose error handling (ie none).
This would definitely be better. If you do this, please change the parsing function as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial ACK for checked commits below: all except 3. There seems to be a possible bug in the ParseMoney commit and I think two ArgsManager commits should be dropped and followed up in a separate PR (they could use improved error handling and probably release notes).
Could reopen this PR, or open a new one with the same changes. I can follow up later with another PR later if this stays dormant.
- f7f2472 util: Add error handling to sequence id in bitcoin-tx (1/17)
- df211cb refactor: Replace use of stoul in dbwrapper.cpp (2/17)
- f47450e refactor: Replace use of atoi in CleanupBlockRevFiles (3/17)
- 5cdaf4a refactor: Replace ad-hoc validity check and atoi in rpcconsole (4/17)
- 812c732 refactor: Replace use of snprintf with strprintf in dbwrapper_tests (5/17)
- a8f7502 refactor: Replace use of strtol in rest.cpp (6/17)
- d78e7be refactor: Replace use of atoi for bool parsing in system.cpp (7/17)
- a8ded10 refactor: Replace use of atoi in torcontrol (8/17)
- 6992e0d refactor: Remove now-unused atoi from util (9/17)
- e4aa1f7 refactor: Replace use of atoi64 in core_read (10/17)
- b7315cb refactor: Replace use of atoi64 in rpc/mining.cpp (11/17)
- 2fcdd16 refactor: Replace use of atoi64 in moneystr.cpp (12/17)
- afb5809 refactor: Replace use of atoi64 in ArgsManager::GetArg (13/17)
- 87a3f13 refactor: Replace use of atoi64 in wallet ReadOrderPos (14/17)
- f4218c2 refactor: Replace use of atoi64 in timesmart unserialize (15/17)
- 88e8307 refactor: Remove no-longer-used atoi64 from util (16/17)
- 55ba388 f: add TODO about returning parse error (17/17)
if (ParseUInt64(memory, &ret)) { | ||
return ret; | ||
} else { | ||
return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of stoul in dbwrapper.cpp" (df211cb)
Should probably add log print here similar to above:
LogPrint(BCLog::LEVELDB, "Failed to parse approximate-memory-usage property\n");
There is also some risk here that this change could make DynamicMemoryUsage
now return 0 in cases where it returned nonzero before (if property string has leading or trailing whitespace or other trailing characters), but logging should make it obvious if this is ever the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about assert(0)
here, but then reconsidered, because it's not important enough to crash the program. Will add a one-time log message.
for (const std::pair<const std::string, fs::path>& item : mapBlockFiles) { | ||
if (atoi(item.first) == nContigCounter) { | ||
uint32_t value; | ||
if (ParseUInt32(item.first, &value) && value == nContigCounter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of atoi in CleanupBlockRevFiles" (f47450e)
Note: The behavior change here makes this code slightly more robust than before. Now if there are extra files like blkABC
they will just be deleted and not throw off the count of expected block files that are supposed to be kept. Throwing off the count would cause files we actually want to keep to be deleted.
@@ -222,10 +222,12 @@ bool RPCConsole::RPCParseCommandLine(interfaces::Node* node, std::string &strRes | |||
UniValue subelement; | |||
if (lastResult.isArray()) | |||
{ | |||
for(char argch: curarg) | |||
if (!IsDigit(argch)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace ad-hoc validity check and atoi in rpcconsole" (5cdaf4a)
Slight change of behavior here with looser parsing since this now accepts +
or -
signed values. But this seems fine and doesn't need additional documentation (though it could be mentioned in the commit message) since there is immediate feedback in case of an error.
if (vStrInputParts.size() > 2) | ||
nSequenceIn = std::stoul(vStrInputParts[2]); | ||
if (vStrInputParts.size() > 2) { | ||
if (!ParseUInt32(vStrInputParts[2], &nSequenceIn)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "util: Add error handling to sequence id in bitcoin-tx" (f7f2472)
Slight change of behavior here with stricter parsing. But this seems fine and doesn't need additional documentation (though it could be mentioned in the commit message) since there is immediate feedback in case of an error.
long count = strtol(path[0].c_str(), nullptr, 10); | ||
if (count < 1 || count > 2000) | ||
uint32_t count; | ||
if (!ParseUInt32(path[0], &count) || count < 1 || count > 2000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of strtol in rest.cpp" (a8f7502)
Slight change of behavior here with stricter parsing: trailing characters no longer allowed for example. But this seems fine and doesn't need additional documentation (though it could be mentioned in the commit message) since there is immediate feedback in case of an error.
if (w->empty()) | ||
{ | ||
// Empty string, ignore. (boost::split given '' will return one word) | ||
} | ||
else if (std::all_of(w->begin(), w->end(), ::IsDigit) || | ||
(w->front() == '-' && w->size() > 1 && std::all_of(w->begin()+1, w->end(), ::IsDigit))) | ||
else if (ParseInt64(*w, &n)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of atoi64 in core_read" (e4aa1f7)
Parsing here seems a little looser than before, because ParseScript
will now accept signed numbers beginning with +
, while it previously only accepted numbers beginning with -
. I'm assuming this is reasonable behavior. ParseScript
appears to only actually be called in tests and by bitcoin-tx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ParseIntXX
accepts signed numbers beginning with +
? whoops, I don't think this was intentional. Thanks for catching this, I think that needs to be documented.
Agree it's not really problematic in this specific case, but will re-add the "starts with digit or -" check just in case.
nTransactionsUpdatedLastLP = atoi64(lpstr.substr(64)); | ||
uint32_t temp; | ||
if (!ParseUInt32(lpstr.substr(64), &temp)) { | ||
throw JSONRPCError(RPC_INVALID_PARAMETER, "could not parse lpval"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of atoi64 in rpc/mining.cpp" (b7315cb)
Slight change of behavior here with stricter parsing. For example no whitespace or trailing characters allowed anymore. But this seems fine and doesn't need additional documentation (though it could be mentioned in the commit message) since there is immediate feedback in case of an error.
@@ -68,7 +68,10 @@ bool ParseMoney(const char* pszIn, CAmount& nRet) | |||
return false; | |||
if (nUnits < 0 || nUnits > COIN) | |||
return false; | |||
int64_t nWhole = atoi64(strWhole); | |||
int64_t nWhole; | |||
if (!ParseInt64(strWhole, &nWhole)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of atoi64 in moneystr.cpp" (2fcdd16)
I might be missing something looking at this code, but it seems like a possible bug if empty string would have been parse successfully as 0
before but now it will fail to parse. Otherwise behavior seems unchanged since strWhole
can only contain digit characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we need a test for ParseMoney that exercises that!
@@ -200,12 +200,10 @@ typedef std::map<std::string, std::string> mapValue_t; | |||
|
|||
static inline void ReadOrderPos(int64_t& nOrderPos, mapValue_t& mapValue) | |||
{ | |||
if (!mapValue.count("n")) | |||
if (!mapValue.count("n") || !ParseInt64(mapValue["n"], &nOrderPos)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of atoi64 in wallet ReadOrderPos" (87a3f13)
Note: Slightly stricter parsing is done here (no whitespace or trailing characters allowed) but this should be fine since string just comes from calling i64tostr
. Could potentially log or return an error though if parsing fails, since it would be an indication of corruption.
@@ -414,7 +414,12 @@ class CWalletTx | |||
} | |||
|
|||
ReadOrderPos(nOrderPos, mapValue); | |||
nTimeSmart = mapValue.count("timesmart") ? (unsigned int)atoi64(mapValue["timesmart"]) : 0; | |||
uint64_t temp; | |||
if (mapValue.count("timesmart") && ParseUInt64(mapValue["timesmart"], &temp)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "refactor: Replace use of atoi64 in timesmart unserialize" (f4218c2)
Note: Slightly stricter parsing is done here (no whitespace or trailing characters allowed) but this should be fine since string just comes from calling strprintf("%u")
. Could potentially log or return an error though if parsing fails, since it would be an indication of corruption.
Thanks for the review, have reopened and will address your comments. |
This was causing a lot of test cases not not be very meaningful because multiple configuration options were combined into one line. The changes in test output with this fix make sense and look like: ```diff - testnet=1 regtest=1 || test + testnet=1 regtest=1 || error: Invalid combination of -regtest, -testnet and -chain. Can use at most one. ``` Issue was reported and debugged by Wladimir J. van der Laan <laanwj@protonmail.com> in bitcoin#17385 (comment)
Sorry for messing around with this again, but I realized something. Before doing this it is important to have our own, controlled, implementation of Adding WIP tag (it's not possible to change a PR to draft, unfortunately). |
Agree this would make the change safer and easier to reason about. I don't think strictly speaking most of the commits actually need this to be correct, either because looser parsings won't cause harm or because current code is already using the locale anyway. But if we can remove locale from these places, it would be a good thing. |
Needs rebase |
This was causing a lot of test cases not not be very meaningful because multiple configuration options were combined into one line. The changes in test output with this fix make sense and look like: ```diff - testnet=1 regtest=1 || test + testnet=1 regtest=1 || error: Invalid combination of -regtest, -testnet and -chain. Can use at most one. ``` Issue was reported and debugged by Wladimir J. van der Laan <laanwj@protonmail.com> in bitcoin#17385 (comment)
Summary: > This was causing a lot of test cases not not be very meaningful because > multiple configuration options were combined into one line. > > The changes in test output with this fix make sense and look like: > > ```diff > - testnet=1 regtest=1 || test > + testnet=1 regtest=1 || error: Invalid combination of -regtest, -testnet and -chain. Can use at > > most one. > ``` > Issue was reported and debugged by > Wladimir J. van der Laan <laanwj@protonmail.com> in > bitcoin/bitcoin#17385 (comment) This is a backport of Core [[bitcoin/bitcoin#17388 | PR17388]] Test Plan: `ninja && ninja check` Reviewers: O1 Bitcoin ABC, #bitcoin_abc, majcosta Reviewed By: O1 Bitcoin ABC, #bitcoin_abc, majcosta Differential Revision: https://reviews.bitcoinabc.org/D8257
This was causing a lot of test cases not not be very meaningful because multiple configuration options were combined into one line. The changes in test output with this fix make sense and look like: ```diff - testnet=1 regtest=1 || test + testnet=1 regtest=1 || error: Invalid combination of -regtest, -testnet and -chain. Can use at most one. ``` Issue was reported and debugged by Wladimir J. van der Laan <laanwj@protonmail.com> in bitcoin#17385 (comment)
Addresses comment #17379 (comment)
This replaces direct use of (often locale dependent) C library functions with our own integer parsing and formatting functions., as the developer notes recommend.
For parsing this introduces parse error checking that was not there before. In many cases it's clear what to do on failure, but sometimes it's not. It also adds explicitly sized types everywhere. Please pay attention to this while reviewing.
See individual commits for details.
This leaves the following "known violations":
"src/bitcoin-tx.cpp.*trim_right"
boost function, out of scope"src/dbwrapper.cpp:.*vsnprintf"
copied function, the comment addresses this: https://github.com/bitcoin/bitcoin/blob/master/src/dbwrapper.cpp#L19"src/httprpc.cpp.*trim"
another boost function, out of scope"src/torcontrol.cpp:.*strtol"
octal parsing, out of scope"src/util/strencodings.cpp:.*strtol"
part ofParseInt32
andParseUInt32
, gated by our own checks"src/util/strencodings.cpp:.*strtoul"
part ofParseInt64
andParseUInt64