build: Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory #18288

practicalswift · 2020-03-07T12:42:17Z

Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory.

First UBSan, then ASan followed by TSan... and now: yes, the wait is over -- MSan is finally here! :)

Some historical context:

2017: Continuous compilation with Clang Thread Safety analysis enabled (Fix -Wthread-safety-analysis warnings. Compile with -Wthread-safety-analysis if available. #10866, travis: Build with --enable-werror under OS X #10923)
2018: Continuous testing with trapping on signed integer overflows (-ftrapv) (Add -ftrapv to CFLAGS and CXXFLAGS when --enable-debug is used. Enable -ftrapv in Travis. #12686)
2018: Continuous testing of use of locale dependent functions (build: Add linter checking for accidental introduction of locale dependence #13041)
2018: Continuous testing of format strings (build: Add format string linter #13705)
2018: Continuous compilation with MSVC TreatWarningAsError (windows: Fix remaining compiler warnings (MSVC) #14151)
2018: Continuous testing under UndefinedBehaviorSanitizer – UBSan (build: Run functional tests and benchmarks under the undefined behaviour sanitizer (UBSan) #14252, travis: Fail the UBSan Travis build in case of newly introduced UBSan errors #14673, tests: Enable UBSan for Travis fuzzing job #17006)
2018: Continuous testing under AddressSanitizer – ASan (tests: Add AddressSanitizer (ASan) Travis build #14794, ci: Enable address sanitizer (ASan) stack-use-after-return checking #17205, tests: Add initialization order fiasco detection in Travis #17674)
2018: Continuous testing under ThreadSanitizer – TSan (travis: Enable functional tests in the ThreadSanitizer (TSan) build job #14829)
2019: Continuous testing in an unsigned char environment (-funsigned-char) (tests: Switch one of the Travis jobs to an unsigned char environment (-funsigned-char) #15134)
2019: Continuous compile-time testing of assumptions we're making (Add compile time verification of assumptions we're currently making implicitly/tacitly #15391)
2019: Continuous testing of fuzz test cases under Valgrind (tests: Add option --valgrind to run the functional tests under Valgrind #17633, tests: Add --valgrind option to test/fuzz/test_runner.py for running fuzzing test cases under valgrind #18159, ci: Run fuzz testing test cases (bitcoin-core/qa-assets) under valgrind to catch memory errors #18166)
2020: Finally... MemorySanitizer – MSAN! :)

What is the next step? What tools should we add to CI to keep bugs from entering master? :)

maflcko · 2020-03-07T21:20:59Z

We already run against valgrind and msan is a subset of valgrind according to: "MSan implements a subset of functionality found in Valgrind (Memcheck tool). It is significantly faster than Memcheck (TODO:benchmark)." (https://github.com/google/sanitizers/wiki/MemorySanitizer)

practicalswift · 2020-03-07T23:14:33Z

@MarcoFalke

I think we should use both. While both tools try to find uses of uninitialized memory (UUM) they are implemented in totally different ways: Valgrind's memcheck is using binary translation while MSan is using compiler-based instrumentation. ~~From my experience it is not guaranteed that Valgrind finds all issues reported by MSan and vice versa.~~

Furthermore, having the combination libFuzzer+MSan readily available is really crucial when fuzzing. The combination libFuzzer+Valgrind is only usable for testing against an existing fuzzing corpus like we do in Travis -- not for doing any actual fuzzing. For the libFuzzer+MSan combination to be available it is important to have the cumbersome MSan setup process documented and also kept up-to-date and working over time. We get that for free by having an MSan build in Travis.

Also MSan is order of magnitudes faster than Valgrind which means that we can run also tests that are too slow for Valgrind in Travis.

And last but not least MSan can report the name of the variable that has not been initialized (thanks to MSan using compiler-based instrumentation). Valgrind only points to the function name.

In summary: I think having MSan in Travis is a very clear win. Let's use all available tools to try to kill the uninitialized memory (UUM) bug class :)

Let me know if you can think of any problems that could arise from testing also under MSan in Travis.

practicalswift · 2020-03-07T23:23:02Z

@MarcoFalke

We already run against valgrind […]

Correct if you mean the unit tests.

Incorrect if you mean the functional tests :)

Due to Valgrind's slowness we currently only run a single(!) functional test under Valgrind (TEST_RUNNER_EXTRA="p2p_segwit.py").

Note that this PR runs all available functional tests (and unit tests) under MSan.

gmaxwell · 2020-03-08T01:38:36Z

I'd love to see a bug that msan catches that valgrind doesn't.

practicalswift · 2020-03-08T08:24:41Z

From my experience it is not guaranteed that Valgrind finds all issues reported by MSan and vice versa.

I'd love to see a bug that msan catches that valgrind doesn't.

Sorry about making a vague claim that I cannot back up with an example. Until I can provide an example we can simplify the discussion by assuming that I was flat out wrong about the possibility of Valgrind not catching an issue found by MSan (given enough runtime to compensate for being orders of magnitude slower than MSan).

From a practical perspective the main reason to start using MSan is the speedup that will allow us to go from detecting use of uninitialized memory in code covered by a single(!) functional test (p2p_segwit.py) to detecting use of uninitialized memory in all code covered by our functional tests.

Please note that I'm not suggesting removing the Valgrind job: I think we should use both and ideally bump the timeouts so that Valgrind can test a couple of more functional tests at least :)

(Just to be clear: I love both Valgrind (proof of love: #18166, #18162, #18159, #17633, #17640, #17624, #17455, #15117, #11035, #11024, #10977) and MSan – I use them daily :))

Let me know if you can think of any problems that could arise from testing also under MSan in Travis.

gmaxwell · 2020-03-08T09:39:12Z

That's fine, if you ever find any-- I'd be interested in seeing them whenever that is. The question was primarily driven out of my own curiosity.

Faster can be pretty useful, though I've found msan in the past to be somewhat lacking compared to valgrind in terms of sensitivity -- though my experience is mostly from when msan was very new. I've found msan most useful in the context of fuzzing, where any speed difference makes for a meaningful throughput difference which ultimately results in a sensitivity difference.

The fact that even paid travis is so cpu starved that this is a big issue for just CI tests is something of is own problem-- labour is expensive, insight is rare. CPU time is the one resource needed by the project which should never be the limiting ingredient, this matters because a surplus of CPU time can be a force multiplier for human labour (though my idea of 'reasonable amounts of computing power' starts at about a thousand core ghz per person, and I've always found it really weird how software people who are at the top of their field often work from just a single underpowered laptop; so maybe I'm just a weirdo). FWIW, for a number of years I tested every release of bitcoin in valgrind, including doing IBDs (overnight).

I don't see any problem with making more use of msan, though it's not a complete replacement for valgrind. But more is more. :)

practicalswift · 2020-03-10T13:27:36Z

@MarcoFalke

Would you mind reviewing? :)

I believe that having this merged would have allowed us to automatically detect at least two remotely triggerable uses of uninitialized memory (UUM) in our P2P networking code:

2020: Use of uninitialized memory in Erlay networking code - found pre-merge
2020: Use of uninitialized memory in BIP324 encrypted p2p transport de-/serializer code (truth in advertising: I haven't verified this one by writing a PoC) - found pre-merge
2019: Use of uninitialized memory in networking code when receiving a transaction we already have - found post-merge

FWIW - a subset of non-networking uses of uninitialized memory:

maflcko

Concept ACK. I had issues getting valgrind run on non-x86 arches, so this might help in that regard. Not sure though if we should also run this on travis, given that travis hopefully will run valgrind on everything: #18304

ci/test/00_setup_env_native_nowallet_msan.sh

.travis.yml

depends/packages/boost.mk

practicalswift · 2020-03-10T15:38:12Z

Not sure though if we should also run this on travis, given that travis hopefully will run valgrind on everything: #18304

Even if we assume that we get Valgrind to run all the unit and functional tests within the Travis time limit (which I unfortunately doubt is technically possible) I still don't see why we would not like to use both Valgrind and MSan in Travis? :)

As @gmaxwell put it: "more is more" :)

MSan is literally orders of magnitude faster than Valgrind and also gives better debugging info in case of presence of UUM.

Let me know if you can think of any problems that could arise from testing also under MSan in Travis.

4444edc ci: Enable all functional tests in valgrind (MarcoFalke) Pull request description: The travis timeout for our repo has been bumped to 2h, so we can run all tests in valgrind now ACKs for top commit: practicalswift: ACK 4444edc -- regarding the three disabled cases (`feature_abortnode`, `feature_block` and `rpc_bind`): not a big deal since MSan will take care of those once #18288 is merged. More is more :) Tree-SHA512: ea2f798112911b6d1f3d88cfcdf0a7cdb698687248343703d6fe55da144542c961c15d888bffb41672c10aa76765615cb7c7ff93d468bfad3c51f962f24e7abb

Sjors · 2020-03-12T13:17:48Z

Concept ACK. cd120cf looks sane, doesn't hurt my macOS build, but otherwise a bit above my pay-grade. The changes are contained to their own machine, except:

boost depends tweak that's now moved to depends: Make it possible to build Boost dependency using a clang toolset (./b2 toolset=clang) #18308
MSAN handling in GetOSRand, which is a more surgical alternative to Suppress false positive warning about uninitialized entropy buffers #17627

For the libFuzzer+MSan combination to be available it is important to have the cumbersome MSan setup process documented and also kept up-to-date and working over time. We get that for free by having an MSan build in Travis.

Keeping this up to date, by having it run on Travis, seems useful. We can always triage Travis resources if funding runs dry.

The build took 1 hours and 15 minutes the first time. You could push a trivial commit to see much of that is shaved off by the cache.

Slow Travis builds can be a burden on review, because Github won't show a build as failed until all Travis machines of the current phase are done. I often find myself reviewing PRs in EU morning with trivial bugs not spotted by US devs before they went to be.

This can be mitigated if you add a new stage grind(?) and move the machine there. (I also agree with @gmaxwell about developers not using enough tooling, guilty as charged)

It could also run as part of a cron job on master ("$TRAVIS_EVENT_TYPE" = "cron"), but I'm not sure who gets notified when those fail. And it's better to catch stuff before merge, so let's not do that unless we run short on resources.

practicalswift · 2020-03-12T22:57:05Z

The build took 1 hours and 15 minutes the first time.

That seems to be roughly the time it takes also with caching. That might sound long but it is still 45 minutes quicker than the current valgrind job which only covers a subset of the functional tests whereas this one covers all of them :)

(To be clear: even with the two hour valgrind job I think it easily is worth taking that cost in terms of Travis build time to protect our users from at least a subset of UUM bugs. This kind of (partial) protection is long-overdue :))

jonatack · 2020-06-15T09:32:15Z

Concept ACK

fjahr

Concept ACK

Code looks good but I don't have extensive experience with the CI scripts so far.

contrib/install_db4.sh

practicalswift · 2020-06-17T05:49:26Z

@fanquake

@practicalswift can you drop your boost changes here, and rebase on #18820? We'll get the macOS issues sorted out.

That has been done :) Ready for final review? :)

fanquake

Left a few comments. I also wrapped up these changes into a msan dockerfile for my own use. Will check that we can use a special case list rather than patching random.cpp

ci/test/00_setup_env_native_msan.sh

src/random.cpp

ci/test/00_setup_env_native_msan.sh

practicalswift · 2020-06-23T07:37:48Z

@MarcoFalke @fanquake Thanks a lot for great and detailed feedback!

All feedback should now be addressed:

Dropped --disable-asm.
Now building with ZeroMQ (dropped NO_ZMQ=1 and --disable-zmq).
Bumped to clang-9 and llvm-9.
No longer patching src/random.cpp with MemorySanitizer annotations.
Removed unneccessary bash export of LIBCXX_DIR.

The current version of this PR only touches only .travis.yml and ci/test/.

Should be ready for final review :)

…lized memory

maflcko · 2020-06-24T22:07:20Z

ACK 870f0cd

maflcko

nit (for a follow-up)

maflcko · 2020-06-27T14:09:20Z

ci/test/00_setup_env_native_msan.sh

+export BDB_PREFIX="${BASE_ROOT_DIR}/db4"
+
+export CONTAINER_NAME="ci_native_msan"
+export PACKAGES="clang-9 llvm-9 cmake"


Suggested change

export PACKAGES="clang-9 llvm-9 cmake"

export PACKAGES="clang-9 llvm-9 cmake" # When changing the clang version, also adjust ci/test/04_install.sh

…etect use of uninitialized memory 870f0cd build: Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory (practicalswift) Pull request description: Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory. First UBSan, then ASan followed by TSan... and now: yes, the wait is over -- **MSan is finally here!** :) Some historical context: * 2017: Continuous compilation with Clang Thread Safety analysis enabled (bitcoin#10866, bitcoin#10923) * 2018: Continuous testing with trapping on signed integer overflows (`-ftrapv`) (bitcoin#12686) * 2018: Continuous testing of use of locale dependent functions (bitcoin#13041) * 2018: Continuous testing of format strings (bitcoin#13705) * 2018: Continuous compilation with MSVC `TreatWarningAsError` (bitcoin#14151) * 2018: Continuous testing under UndefinedBehaviorSanitizer – UBSan (bitcoin#14252, bitcoin#14673, bitcoin#17006) * 2018: Continuous testing under AddressSanitizer – ASan (bitcoin#14794, bitcoin#17205, bitcoin#17674) * 2018: Continuous testing under ThreadSanitizer – TSan (bitcoin#14829) * 2019: Continuous testing in an unsigned char environment (`-funsigned-char`) (bitcoin#15134) * 2019: Continuous compile-time testing of assumptions we're making (bitcoin#15391) * 2019: Continuous testing of fuzz test cases under Valgrind (bitcoin#17633, bitcoin#18159, bitcoin#18166) * 2020: Finally... MemorySanitizer – MSAN! :) What is the next step? What tools should we add to CI to keep bugs from entering `master`? :) ACKs for top commit: MarcoFalke: ACK 870f0cd Tree-SHA512: 38327c8b75679d97d469fe42e704cacd1217447a5a603701dd8a58ee50b3be2c10248f8d68a479ed081c0c4b254589d3081c9183f991640b06ef689061f75578

…MSAN job 3566353 ci: remove compiled-but-unused BDB from MSAN job (fanquake) Pull request description: Self-compiled BDB was added to this job as opposed to using depends BDB [due to linking issues](bitcoin/bitcoin#18288 (comment)), however the compiled BDB is not actually used. Remove it for now, given we don't actually lose any coverage (note that BDB is also not currently used in the naitve MSAN fuzz job or for [OSS Fuzz](https://github.com/google/oss-fuzz/blob/master/projects/bitcoin-core/build.sh#L32) builds). In future, we can use depends BDB, however introducing it now will cause false positives, which can be fixed by upgrading the versions of Clang / LLVM we use, however upgrading to those newer versions causes other issues, which appear in standard library code, and require more involved suppressing, which can be solved in a follow up or another PR i.e #23008. Top commit has no ACKs. Tree-SHA512: 9e8fdd95246cafa27cda7bcf0641b428d4573f6748ecdf07cc6205a64351db22ba383ec943e88a69df3694ccb9f125d994b64345a4e44fb6fea4a014504760d1

fanquake added Build system Tests labels Mar 7, 2020

practicalswift force-pushed the msan-in-travis branch 3 times, most recently from 356a2d1 to 9bcded4 Compare March 7, 2020 21:08

practicalswift force-pushed the msan-in-travis branch from 9bcded4 to da9e57f Compare March 7, 2020 22:46

practicalswift force-pushed the msan-in-travis branch from da9e57f to a4a25a0 Compare March 7, 2020 23:17

practicalswift mentioned this pull request Mar 9, 2020

Erlay: bandwidth-efficient transaction relay protocol #18261

Closed

maflcko reviewed Mar 10, 2020

View reviewed changes

ci/test/00_setup_env_native_nowallet_msan.sh Outdated Show resolved Hide resolved

ci/test/00_setup_env_native_nowallet_msan.sh Outdated Show resolved Hide resolved

.travis.yml Outdated Show resolved Hide resolved

depends/packages/boost.mk Outdated Show resolved Hide resolved

practicalswift force-pushed the msan-in-travis branch 2 times, most recently from 5dc1c78 to cd120cf Compare March 10, 2020 15:19

practicalswift mentioned this pull request Mar 10, 2020

depends: Make it possible to build Boost dependency using a clang toolset (./b2 toolset=clang) #18308

Closed

This was referenced Mar 10, 2020

"make depends" builds boost with g++ regardless of environment variable settings #15914

Closed

ci: Enable all functional tests in valgrind #18304

Merged

This was referenced Mar 12, 2020

Document how to run bitcoin unittests + functional tests with sanitizers #17834

Closed

MemorySanitizer complains about unitialized variable in RPC method names #17620

Closed

Sjors mentioned this pull request Mar 12, 2020

Suppress false positive warning about uninitialized entropy buffers #17627

Closed

practicalswift force-pushed the msan-in-travis branch from fa4ab54 to cd120cf Compare March 12, 2020 22:57

practicalswift mentioned this pull request Mar 13, 2020

test: Bump timeouts to avoid valgrind failures #18345

Merged

practicalswift force-pushed the msan-in-travis branch from 501a971 to ae5e947 Compare June 1, 2020 17:39

DrahtBot mentioned this pull request Jun 6, 2020

ci: Run ci configs on cirrus #19179

Merged

practicalswift mentioned this pull request Jun 10, 2020

Use of uninitialized memory when validating successfully deserialized CPubKey #19235

Closed

This was referenced Jun 13, 2020

build: macOS toolchain simplification and bump #19240

Merged

depends: Allow building with system clang #17919

Merged

fjahr reviewed Jun 16, 2020

View reviewed changes

contrib/install_db4.sh Outdated Show resolved Hide resolved

fanquake reviewed Jun 17, 2020

View reviewed changes

ci/test/00_setup_env_native_msan.sh Outdated Show resolved Hide resolved

ci/test/00_setup_env_native_msan.sh Outdated Show resolved Hide resolved

ci/test/00_setup_env_native_msan.sh Outdated Show resolved Hide resolved

src/random.cpp Outdated Show resolved Hide resolved

maflcko reviewed Jun 17, 2020

View reviewed changes

ci/test/00_setup_env_native_msan.sh Outdated Show resolved Hide resolved

practicalswift force-pushed the msan-in-travis branch 2 times, most recently from 801f7ad to 5e3927d Compare June 18, 2020 08:44

DrahtBot mentioned this pull request Jun 19, 2020

ci: Run asan ci config on cirrus #19321

Merged

DrahtBot added the Needs rebase label Jun 19, 2020

practicalswift force-pushed the msan-in-travis branch from 4756859 to dff0ad2 Compare June 21, 2020 21:39

DrahtBot removed the Needs rebase label Jun 21, 2020

practicalswift force-pushed the msan-in-travis branch from ccc42e3 to 7ab25b4 Compare June 23, 2020 06:09

build: Add MemorySanitizer (MSan) in Travis to detect use of uninitia…

870f0cd

…lized memory

practicalswift force-pushed the msan-in-travis branch from 7ab25b4 to 870f0cd Compare June 23, 2020 09:10

DrahtBot mentioned this pull request Jul 1, 2020

ci: Run tsan ci config on cirrus #19424

Merged

maflcko reviewed Jul 2, 2020

View reviewed changes

maflcko merged commit 7027c67 into bitcoin:master Jul 2, 2020

practicalswift mentioned this pull request Oct 1, 2020

signet: Fix uninitialized read in validation #20035

Merged

practicalswift deleted the msan-in-travis branch April 10, 2021 19:42

fanquake mentioned this pull request Mar 10, 2022

ci: remove compiled-but-unused BDB from MSAN job #24522

Merged

bitcoin locked as resolved and limited conversation to collaborators Aug 16, 2022

	export PACKAGES="clang-9 llvm-9 cmake"
	export PACKAGES="clang-9 llvm-9 cmake" # When changing the clang version, also adjust ci/test/04_install.sh

build: Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory #18288

build: Add MemorySanitizer (MSan) in Travis to detect use of uninitialized memory #18288

Uh oh!

Conversation

practicalswift commented Mar 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

maflcko commented Mar 7, 2020

Uh oh!

practicalswift commented Mar 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

practicalswift commented Mar 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gmaxwell commented Mar 8, 2020

Uh oh!

practicalswift commented Mar 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gmaxwell commented Mar 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

practicalswift commented Mar 10, 2020

Uh oh!

maflcko left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

practicalswift commented Mar 10, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Sjors commented Mar 12, 2020

Uh oh!

practicalswift commented Mar 12, 2020

Uh oh!

jonatack commented Jun 15, 2020

Uh oh!

fjahr left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

practicalswift commented Jun 17, 2020

Uh oh!

fanquake left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

practicalswift commented Jun 23, 2020

Uh oh!

maflcko commented Jun 24, 2020

Uh oh!

maflcko left a comment

Choose a reason for hiding this comment

Uh oh!

maflcko Jun 27, 2020

Choose a reason for hiding this comment

Uh oh!

Uh oh!

practicalswift commented Mar 7, 2020 •

edited

Loading

practicalswift commented Mar 7, 2020 •

edited

Loading

practicalswift commented Mar 7, 2020 •

edited

Loading

practicalswift commented Mar 8, 2020 •

edited

Loading

gmaxwell commented Mar 8, 2020 •

edited

Loading

practicalswift commented Mar 10, 2020 •

edited

Loading