-
Notifications
You must be signed in to change notification settings - Fork 37.8k
ci: detect outbound internet traffic generated while running tests #31349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/31349. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
54a6482
to
3ec89f4
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
Nice. Conecpt ACK! |
Concept ACK i'm slightly worried this may generate false positive. As is, this detects traffic on the entire (virtual) machine while running the tests. Are there no other daemons running on the CI instance that could interfere with this? |
3ec89f4
to
3c4b203
Compare
@laanwj, Right! And Another source of false positive could be if somebody from the outside initiates communication to the VM to which it responds. E.g. an outsider tries to connect to the VM to which it responds with an outbound packet e.g. TCP RST. At least that should be obvious from the error log, showing the incoming packet first (I just pushed a slight change for that). Maybe also the traffic-from-another-daemon could be obvious - e.g. if there is traffic to |
3c4b203
to
67c6bce
Compare
67c6bce
to
1592a7d
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
Exactly. For all we know, the CI VM is firewalled off sufficiently that this can't happen, but we don't know.
Ah yes, as long as it's only some extra logging, having a manual factor in this is fine. It only becomes critical if network traffic would cause a CI failure. i'm not aware of a straightforward way to "log network traffic of this process and subproceses only". Yes, it could be done with a linux network namespace, but that's a lot of hassle.
Seeing this, it might already be namespaced. Though a process namespace doesn't necessarily mean the network namespace is isolated. |
1592a7d
to
071e43f
Compare
My intention here is to fail the CI because otherwise the log will be buried in the CI output and nobody will notice it. It follows that if this fails randomly with false positives when one would have to investigate it manually for arbitrary PRs which is highly highly highly undesirable. |
Ran this on my CI runner which has 8.8.8.8 configured as DNS server for docker. https://cirrus-ci.com/task/5500763260059648?logs=ci#L1137
Edit: My understanding is as follows: The DNS requests normally go to a local DNS resolver which then asks an upstream resolver. The upstream resolver (possibly your ISP) indirectly learns that you are running Bitcoin Core tests, even if there was no direct communication over a non-loopback interface. Edit 2: full tcpdump output here:
|
This comment was marked as abuse.
This comment was marked as abuse.
Concept ACK. Per https://bitcoin-irc.chaincode.com/bitcoin-core-dev/2024-11-26#1069602: "it turns out the owners of 1.2.3.4, 11.22.33.44 and 8.8.8.8, if they would bother, would know the IP address of every dev who runs the functional tests at home." |
071e43f
to
8799018
Compare
🚧 At least one of the CI tasks failed. HintsTry to run the tests locally, according to the documentation. However, a CI failure may still
Leave a comment here, if you need help tracking down a confusing failure. |
8799018
to
803ed46
Compare
About the false positives - I think it is worth trying this in its current mode where any detected traffic is assumed to have originated from the tests and fails the CI. If this ever fails the CI for another reason (false positive), then it would be easy to turn this into a "report in the logs only but don't fail" by removing the About the DNS traffic - I did |
3333d3f ci: Only pass documented env vars (MarcoFalke) Pull request description: The CI currently inherits almost all env vars from the host. This was problematic in the past and causing non-determinism, e.g. the fix in commit fa12558. It is still problematic today, see e.g. #31349 (comment), or #32935 This fixes #32935 by only passing env vars documented in `./ci/test/00_setup_env.sh`. Implementation-wise, instead of cramming the python code into the `python -c ""` statement, just start a fresh py file, which is easier to handle. ACKs for top commit: willcl-ark: ACK 3333d3f Tree-SHA512: f922e481a844128d7fbf773563278a3992c178ead60a3050eceb9ded2aad979afc815a5cbdb9f68494493c5d8d942cdd1111c21e32a5746d19505b87745cb84a
4652f75
to
f400e0b
Compare
|
f400e0b
to
9dcbf6c
Compare
9dcbf6c
to
a9ac49c
Compare
This PR originally contained a few fixes of tests that generated network traffic + a CI change to catch such future cases in CI. Then the tests fixes were moved to #31646 and merged. Then the activity here waned. Now there is a new case of network traffic generated by a test which went in
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK a9ac49c
Thanks for the fix! Note that this fix may (I'm not sure) mask an integer overflow issue #32345 (comment) that seems like it is a real bug.
The new tcpdump mechanism to detect unexpected traffic is pretty simple and seems very nice. Another idea could be to have unit test setup use CreateSock
to throw errors if code attempts a remote connection to make errors a little easier to catch locally. But tcpdump solution is more general so seems like a good approach regardless.
src/test/node_init_tests.cpp
Outdated
@@ -32,6 +33,9 @@ class TestInit : public interfaces::Init | |||
|
|||
BOOST_AUTO_TEST_CASE(init_test) | |||
{ | |||
const auto create_sock_orig = CreateSock; | |||
CreateSock = [](int, int, int) { return std::make_unique<ZeroSock>(); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In commit "ci: detect outbound internet traffic generated while running tests" (f400e0b)
Thanks for the fix! I think instead of manipulating CreateSock for this specific test it would be probably better to add gArgs.ForceSetArg("-natpmp", "0");
to BasicTestingSetup::BasicTestingSetup
. Reasons:
- It would work generally for all tests, not just this one test
- This would make unit tests more consistent with functional tests
- CreateSock seems like a band-aid more than a cure. I think the problem here is the test trying to use the network, not just being able to use it after it tries.
If you prefer current approach, though, it seems ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about setting -natpmp
to off, but decided would be better to test with the default arguments as that more closely matches the real world. That argument seems weak.
band-aid more than a cure
I agree. Changed to what you suggest above.
a9ac49c
to
778675a
Compare
|
The test calls: `AppInitMain()` -> `StartMapPort()` -> `StartThreadMapPort()` -> `ThreadMapPort()` -> `ProcessPCP()` -> `PCPRequestPortMap()` -> `CreateSock()` and on the returned value from `CreateSock()` it calls the `Connect()` method. Thus, change `BasicTestingSetup::BasicTestingSetup()` to set `-natpmp` to 0. This way `node_init_tests/init_test` or other tests will not do network activity due to `ThreadMapPort()`.
778675a
to
151edfa
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code review ACK 151edfa. Just rebased since last review and replaced node_init_tests fix CreateSock fix with natpmp=0 fix
if [ -n "$out" ] ; then | ||
echo "Error: outbound TCP or UDP packets on the non loopback interface generated during $test_name tests:" >&2 | ||
tcpdump -n -r "$f" tcp or udp | ||
if [ -z "$INTERNET_TRAFFIC_EXPECTED" ] ; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This INTERNET_TRAFFIC_EXPECTED
seems to be unused?
Prevent generating outbound traffic on a non-loopback interface during tests.
[1111:1111::1]:53
:previous releases, depends DEBUG
TSan, depends, gui
multiprocess, i686, DEBUG
no wallet, libbitcoinkernel
02_run_container.sh
:--cap-add NET_RAW
exit 1
line added to03_test_script.sh
in this PR.Summary of each VM wrt the new check:
32-bit CentOS, dash, gui
(Cirrus CI)ARM, unit tests, no functional tests
(Cirrus CI)ASan + LSan + UBSan + integer, no depends, USDT
(GitHub)MSan, depends
(Cirrus CI)TSan, depends, gui
(Cirrus CI)Win64 native fuzz, VS 2022
(GitHub)Win64 native, VS 2022
(GitHub)Win64-cross
(Cirrus CI)fuzzer,address,undefined,integer, no depends
(Cirrus CI)lint
(Cirrus CI)macOS 14 native, arm64, fuzz
(GitHub)macOS 14 native, arm64, no depends, sqlite only, gui
(GitHub)macOS-cross, gui, no tests
(Cirrus CI)multiprocess, i686, DEBUG
(Cirrus CI)no wallet, libbitcoinkernel
(Cirrus CI)previous releases, depends DEBUG
(Cirrus CI)test each commit
(GitHub)tidy
(Cirrus CI)✔️: test is run under tcpdump (it works!)
-: test is not run in that VM
1:
tcpdump: en0: You don't have permission to capture on that device
(does not work)2: tests are run but directly from
.github/workflows/ci.yml
instead of fromci/test/03_test_script.sh
(does not work)Resolves #31339