Skip to content

Conversation

practicalswift
Copy link
Contributor

@practicalswift practicalswift commented Apr 20, 2018

This linter will check for code accidentally introducing locale dependencies.

Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

Context: #12881 (comment)

Example output:

$ contrib/devtools/lint-locale-dependence.sh
The locale dependent function tolower(...) appears to be used:
src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

Unnecessary locale dependence can cause bugs that are very
tricky to isolate and fix. Please avoid using locale dependent
functions if possible.

Advice not applicable in this specific case? Add an exception
by updating the ignore list in contrib/devtools/lint-locale-dependence.sh

Note to reviewers: What is the most appropriate LOCALE_DEPENDENT_FUNCTIONS function list? What should be added or removed?

@practicalswift
Copy link
Contributor Author

@laanwj Do you think the LOCALE_DEPENDENT_FUNCTIONS list is correct? :-)

@laanwj
Copy link
Member

laanwj commented Apr 23, 2018

Concept ACK. I think this is a good idea.

@laanwj Do you think the LOCALE_DEPENDENT_FUNCTIONS list is correct? :-)

It's a good start.
Please add strftime too, that's the one that caused recent discussion in #12973.

@laanwj laanwj self-assigned this Apr 23, 2018
@practicalswift practicalswift force-pushed the lint-locale-dependence branch from 8a19194 to 4ba8cc1 Compare April 23, 2018 14:02
@practicalswift
Copy link
Contributor Author

@laanwj strftime and strptime added. Please re-review :-)

toupper towlower towupper
)
REGEXP_IGNORE_EXTERNAL="src/(crypto/ctaes/|leveldb/|secp256k1/|tinyformat.h|univalue/)"
REGEXP_IGNORE_KNOWN_VIOLATIONS="^(src/base58.cpp:.*isspace|src/qt/rpcconsole.cpp:.*isdigit|src/rest.cpp:.*strtol|src/torcontrol.cpp:.*strtol|src/uint256.cpp:.*isspace|src/uint256.cpp:.*tolower|src/util.cpp:.*tolower|src/utilmoneystr.cpp:.*isdigit|src/utilmoneystr.cpp:.*isspace|src/utilstrencodings.cpp:.*isspace|src/utilstrencodings.cpp:.*strtoll|src/utilstrencodings.cpp:.*strtol)"
Copy link
Member

@laanwj laanwj Apr 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a shocking number of known violations.
I didn't know e.g. strtol is also locale dependent, otherwise I wouldn't have used it in torcontrol for example.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to split this list over multiple lines, so that it's possible to remove an exception by removing a line?

@practicalswift practicalswift force-pushed the lint-locale-dependence branch from 4ba8cc1 to d7c3267 Compare April 23, 2018 15:03
@practicalswift
Copy link
Contributor Author

@laanwj Updated version with the known violations list split over multiple lines. Please re-review :-)

@practicalswift practicalswift force-pushed the lint-locale-dependence branch from d7c3267 to 0b4965b Compare April 23, 2018 15:13
@practicalswift
Copy link
Contributor Author

Candidates for inclusion in the list of locale dependent functions:

$ nm /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so | grep -E '_l$' | \
      cut -f3 -d' ' | grep -E '^[a-z]' | sort | sed 's/_l$//g'
isalnum
isalpha
isblank
iscntrl
isdigit
isgraph
islower
isprint
ispunct
isspace
isupper
iswalnum
iswalpha
iswblank
iswcntrl
iswctype
iswdigit
iswgraph
iswlower
iswprint
iswpunct
iswspace
iswupper
iswxdigit
isxdigit
nl_langinfo
strcasecmp
strcoll
strerror
strfmon
strftime
strncasecmp
strptime
strtod
strtof
strtold
strtol
strtoll
strtoul
strtoull
strxfrm
tolower
toupper
towctrans
towlower
towupper
wcscasecmp
wcscoll
wcsftime
wcsncasecmp
wcstod
wcstof
wcstold
wcstol
wcstoll
wcstoul
wcstoull
wcsxfrm
wctrans
wctype

@practicalswift
Copy link
Contributor Author

The use of strtoul(...) and strtoull(...) might be problematic too?

@practicalswift practicalswift force-pushed the lint-locale-dependence branch from 0b4965b to 29b73b5 Compare April 23, 2018 15:39
@practicalswift
Copy link
Contributor Author

Updated: Added more functions to LOCALE_DEPENDENT_FUNCTIONS. Added two known violations for strtoul(...) and strtoull(...) in utilstrencodings.cpp.

Please re-review :-)

@laanwj
Copy link
Member

laanwj commented Apr 24, 2018

utACK 29b73b509f1aa537a835f9e75878622c3a18335f

@practicalswift
Copy link
Contributor Author

practicalswift commented Apr 24, 2018

@theuni @sipa Your language expertise is needed :-)

If you have time to review this PR – does LOCALE_DEPENDENT_FUNCTIONS look reasonable? :-)

@practicalswift practicalswift force-pushed the lint-locale-dependence branch from 29b73b5 to b2af0ad Compare April 24, 2018 12:24
@practicalswift
Copy link
Contributor Author

practicalswift commented Apr 24, 2018

Updated:

  • Added more functions to LOCALE_DEPENDENT_FUNCTIONS.
  • Added known atoi(…) violations for bitcoin-tx.cpp, init.cpp, qt/rpcconsole.cpp, torcontrol.cpp, util.cpp, utilstrencodings.cpp and utilstrencodings.h.
  • Added known *printf(…) violations for dbwrapper.cpp, test/dbwrapper_tests.cpp and util.cpp.
  • Now invoking git grep only once in order to minimize run-time.

Please re-review :-)

@practicalswift practicalswift force-pushed the lint-locale-dependence branch 6 times, most recently from 1a99d9a to ddb1719 Compare April 24, 2018 16:22
@laanwj
Copy link
Member

laanwj commented Apr 25, 2018

I think printf and fprintf(stderr, ... are fine. They are only used to print error messages, it's not important if they are formatted slightly different according to the locale.
(not so much for sprintf vsnprintf etc which should be replaced with use of tinyformat, these end up in strings and might end up on the RPC interface or in files meant to be portable)

@practicalswift
Copy link
Contributor Author

@laanwj You saw that fprintf(stderr, … is explicitly filtered out via grep -vE 'fprintf\(.*(stdout|stderr)' in the script? :-)

Should I keep fprintf (with the exclusion above) and remove printf? Makes sense?

@laanwj
Copy link
Member

laanwj commented Apr 25, 2018

@laanwj You saw that fprintf(stderr, … is explicitly filtered out via grep -vE 'fprintf(.*(stdout|stderr)' in the script? :-)

Oh that's good! No, hadn't noticed that yet.

Should I keep fprintf (with the exclusion above) and remove printf? Makes sense?

Yes I think so!

@practicalswift practicalswift force-pushed the lint-locale-dependence branch from ddb1719 to 6271495 Compare April 25, 2018 13:01
@practicalswift
Copy link
Contributor Author

@laanwj Updated version with printf removed! Please re-review :-)

@laanwj
Copy link
Member

laanwj commented Apr 26, 2018

Thinking of it, it would make sense to add this list to the developer notes as well (next to the mention of the locale risks) , so it doesn't come out of the blue.

echo
EXIT_CODE=1
fi
done
Copy link
Contributor

@Empact Empact May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to simplify this by processing them at once? I'm skeptical grouping by function will be helpful.

Copy link
Contributor Author

@practicalswift practicalswift May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Empact For performance or usability reasons?

If you mean for performance reasons, then please note that the costly operation git grep is run only once.

From a usability perspective I think it makes sense to group the output per function – try running the script with an empty KNOWN_VIOLATIONS and you'll see what I mean :-) But I'm open to alternative outputs, so please provide a diff and I'll check out your suggestion :-)

Copy link
Contributor

@Empact Empact May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Speaking from a maintainability perspective - seems in the general case KNOWN_VIOLATIONS will be populated so the list will be relatively short and only introduced by they who open the PR, in which case the grouping isn't very beneficial IMO. If not much usability benefit is gained, then if the code can be simplified by removing it I think it's worthwhile to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Empact Do you have any specific output formatting in mind? Please give an example on how it would look :-)

@laanwj
Copy link
Member

laanwj commented Jun 5, 2018

utACK f947ef02f9c8eee5957bb02e12cee4edd636c9fa

@cnavigato
Copy link

cnavigato commented Jun 5, 2018 via email

@ken2812221
Copy link
Contributor

ken2812221 commented Jun 5, 2018

Need to move contrib/devtools/lint-locale-dependence.sh to test/lint/lint-locale-dependence.sh

@practicalswift practicalswift force-pushed the lint-locale-dependence branch from f947ef0 to 698cfd0 Compare June 6, 2018 06:08
@practicalswift
Copy link
Contributor Author

@ken2812221 Thanks! Moved!

@laanwj @cnavigato Please re-review :-)

Copy link
Contributor

@ken2812221 ken2812221 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK 698cfd0

@laanwj laanwj merged commit 698cfd0 into bitcoin:master Jun 7, 2018
laanwj added a commit that referenced this pull request Jun 7, 2018
…of locale dependence

698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift)
0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift)

Pull request description:

  This linter will check for code accidentally introducing locale dependencies.

  Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

  Context: #12881 (comment)

  Example output:

  ```
  $ contrib/devtools/lint-locale-dependence.sh
  The locale dependent function tolower(...) appears to be used:
  src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

  Unnecessary locale dependence can cause bugs that are very
  tricky to isolate and fix. Please avoid using locale dependent
  functions if possible.

  Advice not applicable in this specific case? Add an exception
  by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
  ```

  **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed?

Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
maflcko pushed a commit to maflcko/bitcoin-core that referenced this pull request Sep 13, 2018
14b29a7 Fix reference to lint-locale-dependence.sh (Hennadii Stepanov)

Pull request description:

  The wrong reference sneaked through bitcoin#13041 and bitcoin#13281.

Tree-SHA512: 39cc74297d00141206ce68b84575288a20e39e20ef717fa8b8c09076b5fb46d8281320b144a596094365f2a0704e5dd6bf960e35980ae4730546a72957403d69
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 17, 2020
…uction of locale dependence

698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift)
0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift)

Pull request description:

  This linter will check for code accidentally introducing locale dependencies.

  Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

  Context: bitcoin#12881 (comment)

  Example output:

  ```
  $ contrib/devtools/lint-locale-dependence.sh
  The locale dependent function tolower(...) appears to be used:
  src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

  Unnecessary locale dependence can cause bugs that are very
  tricky to isolate and fix. Please avoid using locale dependent
  functions if possible.

  Advice not applicable in this specific case? Add an exception
  by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
  ```

  **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed?

Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 2, 2020
…uction of locale dependence

698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift)
0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift)

Pull request description:

  This linter will check for code accidentally introducing locale dependencies.

  Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

  Context: bitcoin#12881 (comment)

  Example output:

  ```
  $ contrib/devtools/lint-locale-dependence.sh
  The locale dependent function tolower(...) appears to be used:
  src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

  Unnecessary locale dependence can cause bugs that are very
  tricky to isolate and fix. Please avoid using locale dependent
  functions if possible.

  Advice not applicable in this specific case? Add an exception
  by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
  ```

  **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed?

Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
maflcko pushed a commit that referenced this pull request Jul 2, 2020
…se of uninitialized memory

870f0cd build: Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory (practicalswift)

Pull request description:

  Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory.

  First UBSan, then ASan followed by TSan... and now: yes, the wait is over -- **MSan is finally here!** :)

  Some historical context:
  * 2017: Continuous compilation with Clang Thread Safety analysis enabled (#10866, #10923)
  * 2018: Continuous testing with trapping on signed integer overflows (`-ftrapv`) (#12686)
  * 2018: Continuous testing of use of locale dependent functions (#13041)
  * 2018: Continuous testing of format strings (#13705)
  * 2018: Continuous compilation with MSVC `TreatWarningAsError` (#14151)
  * 2018: Continuous testing under UndefinedBehaviorSanitizer – UBSan (#14252, #14673, #17006)
  * 2018: Continuous testing under AddressSanitizer – ASan (#14794, #17205, #17674)
  * 2018: Continuous testing under ThreadSanitizer – TSan (#14829)
  * 2019: Continuous testing in an unsigned char environment (`-funsigned-char`) (#15134)
  * 2019: Continuous compile-time testing of assumptions we're making (#15391)
  * 2019: Continuous testing of fuzz test cases under Valgrind (#17633, #18159, #18166)
  * 2020: Finally... MemorySanitizer – MSAN! :)

  What is the next step? What tools should we add to CI to keep bugs from entering `master`? :)

ACKs for top commit:
  MarcoFalke:
    ACK 870f0cd

Tree-SHA512: 38327c8b75679d97d469fe42e704cacd1217447a5a603701dd8a58ee50b3be2c10248f8d68a479ed081c0c4b254589d3081c9183f991640b06ef689061f75578
@practicalswift practicalswift deleted the lint-locale-dependence branch April 10, 2021 19:34
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 27, 2021
14b29a7 Fix reference to lint-locale-dependence.sh (Hennadii Stepanov)

Pull request description:

  The wrong reference sneaked through bitcoin#13041 and bitcoin#13281.

Tree-SHA512: 39cc74297d00141206ce68b84575288a20e39e20ef717fa8b8c09076b5fb46d8281320b144a596094365f2a0704e5dd6bf960e35980ae4730546a72957403d69
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 28, 2021
14b29a7 Fix reference to lint-locale-dependence.sh (Hennadii Stepanov)

Pull request description:

  The wrong reference sneaked through bitcoin#13041 and bitcoin#13281.

Tree-SHA512: 39cc74297d00141206ce68b84575288a20e39e20ef717fa8b8c09076b5fb46d8281320b144a596094365f2a0704e5dd6bf960e35980ae4730546a72957403d69
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jun 29, 2021
14b29a7 Fix reference to lint-locale-dependence.sh (Hennadii Stepanov)

Pull request description:

  The wrong reference sneaked through bitcoin#13041 and bitcoin#13281.

Tree-SHA512: 39cc74297d00141206ce68b84575288a20e39e20ef717fa8b8c09076b5fb46d8281320b144a596094365f2a0704e5dd6bf960e35980ae4730546a72957403d69
PastaPastaPasta pushed a commit to PastaPastaPasta/dash that referenced this pull request Jul 1, 2021
14b29a7 Fix reference to lint-locale-dependence.sh (Hennadii Stepanov)

Pull request description:

  The wrong reference sneaked through bitcoin#13041 and bitcoin#13281.

Tree-SHA512: 39cc74297d00141206ce68b84575288a20e39e20ef717fa8b8c09076b5fb46d8281320b144a596094365f2a0704e5dd6bf960e35980ae4730546a72957403d69
vijaydasmp pushed a commit to vijaydasmp/dash that referenced this pull request Oct 4, 2021
…etect use of uninitialized memory

870f0cd build: Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory (practicalswift)

Pull request description:

  Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory.

  First UBSan, then ASan followed by TSan... and now: yes, the wait is over -- **MSan is finally here!** :)

  Some historical context:
  * 2017: Continuous compilation with Clang Thread Safety analysis enabled (bitcoin#10866, bitcoin#10923)
  * 2018: Continuous testing with trapping on signed integer overflows (`-ftrapv`) (bitcoin#12686)
  * 2018: Continuous testing of use of locale dependent functions (bitcoin#13041)
  * 2018: Continuous testing of format strings (bitcoin#13705)
  * 2018: Continuous compilation with MSVC `TreatWarningAsError` (bitcoin#14151)
  * 2018: Continuous testing under UndefinedBehaviorSanitizer – UBSan (bitcoin#14252, bitcoin#14673, bitcoin#17006)
  * 2018: Continuous testing under AddressSanitizer – ASan (bitcoin#14794, bitcoin#17205, bitcoin#17674)
  * 2018: Continuous testing under ThreadSanitizer – TSan (bitcoin#14829)
  * 2019: Continuous testing in an unsigned char environment (`-funsigned-char`) (bitcoin#15134)
  * 2019: Continuous compile-time testing of assumptions we're making (bitcoin#15391)
  * 2019: Continuous testing of fuzz test cases under Valgrind (bitcoin#17633, bitcoin#18159, bitcoin#18166)
  * 2020: Finally... MemorySanitizer – MSAN! :)

  What is the next step? What tools should we add to CI to keep bugs from entering `master`? :)

ACKs for top commit:
  MarcoFalke:
    ACK 870f0cd

Tree-SHA512: 38327c8b75679d97d469fe42e704cacd1217447a5a603701dd8a58ee50b3be2c10248f8d68a479ed081c0c4b254589d3081c9183f991640b06ef689061f75578
gades pushed a commit to cosanta/cosanta-core that referenced this pull request Mar 8, 2022
…uction of locale dependence

698cfd0 docs: Mention lint-locale-dependence.sh in developer-notes.md (practicalswift)
0a4ea2f build: Add linter for checking accidental locale dependence (practicalswift)

Pull request description:

  This linter will check for code accidentally introducing locale dependencies.

  Unnecessary locale dependence can cause bugs that are very tricky to isolate and fix. We should avoid using locale dependent functions if possible.

  Context: bitcoin#12881 (comment)

  Example output:

  ```
  $ contrib/devtools/lint-locale-dependence.sh
  The locale dependent function tolower(...) appears to be used:
  src/init.cpp:    if (s[0] == '0' && std::tolower(s[1]) == 'x') {

  Unnecessary locale dependence can cause bugs that are very
  tricky to isolate and fix. Please avoid using locale dependent
  functions if possible.

  Advice not applicable in this specific case? Add an exception
  by updating the ignore list in contrib/devtools/lint-locale-dependence.sh
  ```

  **Note to reviewers:** What is the most appropriate `LOCALE_DEPENDENT_FUNCTIONS` function list? What should be added or removed?

Tree-SHA512: 14e448828804bb02bf59070647e38b52fce120c700c903a4a8472769a2cee5dd529bd3fc182386993cb8720482cf4250b63a0a477db61b941ae4babe5c65025f
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants