ci: run USDT interface tests in the CI #25528
This makes sure NOT to hook into other bitcoind binaries run in parallel in the test framework. We only want to trace the intended binary. In interface_usdt_utxocache.py: while testing the utxocache flush with pruning, bitcoind is restarted and we need to hook into the new PID again.

The USDT interface exposes process internals via the tracepoints. This means the USDT interface tests somewhat awkwardly depend on these internals. If internals change, the tests have to adapt to that change. Previously, the USDT interface tests weren't run in the CI, so changes could break the USDT interface tests without being noticed (e.g. bitcoin#25486).

In fa13375 a 'self.rescan_utxos()' call was added in the 'generate()' function of the test framework. 'rescan_utxos()' causes the UTXO cache to be flushed. 'generate()' is used in the USDT interface tests for the 'utxocache:flush' tracepoint. As the UTXO cache is now flushed more often, the number of flushes the tests expect needs to be adapted. Also, the UTXO cache now has a different size when being flushed. The utxocache tracepoint is tested by shutting the node down and pruning blocks, to test the 'for_prune' argument.

Changes:
- A list 'expected_flushes' is now used which contains 'mode', 'for_prune', and 'size' for each expected flush.
- When a flush happens, the expected flush is removed from the list. This list is checked to be empty (unchanged).
- Previously, shutting down caused these two flushes:
  UTXOCacheFlush(duration=*, mode=ALWAYS, size=104, memory=*, for_prune=False)
  UTXOCacheFlush(duration=*, mode=ALWAYS, size=0, memory=*, for_prune=False)
  Now it causes these flushes:
  UTXOCacheFlush(duration=*, mode=ALWAYS, size=2, memory=*, for_prune=False)
  UTXOCacheFlush(duration=*, mode=ALWAYS, size=0, memory=*, for_prune=False)
  The 104 UTXOs flushed previously were mainly coinbase UTXOs generated in previous tests and the test setup. These are now already flushed.
- In the 'for_prune' test we previously hooked into the tracepoint before mining blocks. This changed to only get notified about the tracepoint being triggered for the prune. Here, the UTXO cache is already empty, as it has just been flushed in 'generate()'.
  old: UTXOCacheFlush(duration=*, mode=NONE, size=350, memory=*, for_prune=True)
  new: UTXOCacheFlush(duration=*, mode=NONE, size=0, memory=*, for_prune=True)
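The expected-flush bookkeeping described above can be sketched as follows. This is a simplified illustration, not the actual test code: the event dicts and the handler are hypothetical stand-ins for the data a BCC callback would read from the 'utxocache:flush' tracepoint.

```python
# Simplified sketch of the expected-flush bookkeeping. Each entry records
# the 'mode', 'for_prune', and 'size' of a flush the test expects to see.
expected_flushes = [
    {"mode": "ALWAYS", "for_prune": False, "size": 2},
    {"mode": "ALWAYS", "for_prune": False, "size": 0},
]

def handle_flush_event(event, expected):
    """Tick off the first expected flush matching this event.

    Returns True if the event was expected, False otherwise."""
    for exp in expected:
        if all(event[key] == exp[key] for key in ("mode", "for_prune", "size")):
            expected.remove(exp)
            return True
    return False

# Simulate the two flushes seen during shutdown.
for event in [{"mode": "ALWAYS", "for_prune": False, "size": 2},
              {"mode": "ALWAYS", "for_prune": False, "size": 0}]:
    assert handle_flush_event(event, expected_flushes)

# As in the test, the list must be empty once all expected flushes happened.
assert expected_flushes == []
```

Checking that the list ends up empty catches both missed and unexpected flushes in one place.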
I tried to experiment with this, but I ran into an error, see https://cirrus-ci.com/task/6300898480619520?logs=ci#L2025. The only change was switching to debian:bookworm, see maflcko/bitcoin-core-with-ci@803f365#diff-b37d362747ecdb36dbfcbe2b8aab779f06404eee471ad835b58355e353d4fe79R9. Any ideas?

Interesting. This seems to be unrelated to the tests and more about the tracepoint macros in general. Probably a problem with the Systemtap package. I'll also do some poking. edit: the variadic macro arg was indeed not present in 4.5 (which we used previously): https://sourceware.org/git/?p=systemtap.git;a=commitdiff;h=ecab2afea46099b4e7dfd551462689224afdbe3a. I think it also has something to do with clang.
Tried running a depends build on bookworm, but no success so far: https://cirrus-ci.com/task/5255417889554432?logs=ci#L570

In any case it would fail because it wouldn't be possible to install the kernel headers in docker: https://cirrus-ci.com/task/5345925198512128?logs=ci#L3567 ?
One way to get the host's kernel headers into the docker container is to mount them. I've mounted
which is in
Makes sense. That doesn't seem to be worth the effort. An alternative to using a private PPA would be to bump the OS version in

To preserve CI resources, I am thinking about combining this with another task. Since a wallet would be good (to also cover the coin selection traces) and a recent-ish OS is required, we could re-use one of the sanitizer builds or the i686 build?

It worked. This commit should be passing: https://cirrus-ci.com/build/4705943798677504

Sorry, I lied. This one should pass: https://cirrus-ci.com/build/4898814673813504
Our CI tasks are run by CirrusCI in Docker containers in a Google Compute Engine based Kubernetes environment. These containers have limited capabilities - especially CAP_SYS_ADMIN is missing. See bitcoin#23296 (comment). We need elevated privileges to hook into the USDT tracepoints. We use a CirrusCI "compute_engine_instance" (a VM, not a container) where we have the required privileges.

The ubuntu-minimal-2204-lts image was chosen, with debian-11 being an alternative. Both pack an outdated 'bpfcc-tools' package (v0.18.0) from 2020. This version prints warnings to stderr during BPF bytecode compilation, which causes our functional test runner to fail. This is fixed in newer versions. Until debian-12 or a newer Ubuntu release is available as an image in GCE (https://cloud.google.com/compute/docs/images/os-details), we use a third-party and untrusted PPA that releases up-to-date versions of the package. The official iovisor (authors of BCC) PPA is outdated too. An alternative would be to compile BCC from source in the CI.

Co-authored-by: MacroFake <falke.marco@gmail.com>
As discussed offline with MarcoFake, we're using the

cr ACK cc7335e
# Images can be found here: https://cloud.google.com/compute/docs/images/os-details
compute_engine_instance:
  image_project: ubuntu-os-cloud
  image: family/ubuntu-2204-lts  # when upgrading, check if we can drop "ADD_UNTRUSTED_BPFCC_PPA"
note to the future: I was able to change this to family/ubuntu-2110, but the GCE docs only include LTS versions. So if 2210 has the updated package, we can probably switch to that: https://packages.ubuntu.com/kinetic/bpfcc-tools
Concept ACK
# packages. Meanwhile, use an untrusted PPA to install an up-to-date version of the bpfcc-tools
# package.
# TODO: drop this once we can use newer images in GCE
CI_EXEC add-apt-repository ppa:hadret/bpfcc
Using add-apt-repository requires some packages to be installed.
As mentioned in bitcoin#26571, the task running the USDT interface tests fails when run in docker. cc7335e in bitcoin#25528 added that the tests are run in a **VM** in Cirrus CI. Running them locally in docker containers might not work:

- We use [bcc] as tracing toolkit, which requires the kernel headers to compile the BPF bytecode. As docker containers use the host's kernel and don't run their own, there is a potential for mismatches between the kernel headers available in the container and the host kernel. This results in a failure loading the BPF bytecode.
- Privileges are required to load the BPF bytecode into the kernel. Normally, the docker containers aren't run with these.
- We currently use an untrusted third-party PPA to install the bpfcc-tools package on Ubuntu 22.04. Using this on a local dev system could be a security risk.

To not hinder the ASan + LSan + UBSan part of the CI task, the USDT tests are disabled on non-CirrusCI runs.

[bcc]: https://github.com/iovisor/bcc
2811f40 ci: only run USDT interface tests on CirrusCI (0xb10c)

Pull request description: As mentioned in #26571, the task running the USDT interface tests fails when run in docker. cc7335e in #25528 added that the tests are run in a **VM** in Cirrus CI. Running them locally in docker containers might not work: - We use [bcc] as tracing toolkit, which requires the kernel headers to compile the BPF bytecode. As docker containers use the host's kernel and don't run their own, there is a potential for mismatches between the kernel headers available in the container and the host kernel. This results in a failure loading the BPF bytecode. - Privileges are required to load the BPF bytecode into the kernel. Normally, the docker containers aren't run with these. - We currently use an untrusted third-party PPA to install the bpfcc-tools package on Ubuntu 22.04. Using this on a local dev system could be a security risk. To not hinder the ASan + LSan + UBSan part of the CI task, the USDT tests are disabled on non-CirrusCI runs. [bcc]: https://github.com/iovisor/bcc

Top commit has no ACKs.

Tree-SHA512: 7c159b59630b36d5ed927b501fa0ad6f54ff3d98d0902f975cc8122b4147a7f828175807d40a470a85f0f3b6eeb79307d3465a287dbd2f3d75b803769ad0b6e7
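The "disabled on non-CirrusCI runs" gating can be sketched as below. This is a hedged illustration, not necessarily the exact check used: Cirrus CI does set the CIRRUS_CI environment variable in its tasks, but the helper name and its use here are hypothetical.

```python
# Hypothetical sketch: only enable the USDT interface tests when the CI
# run is on Cirrus CI (where the task runs in a privileged VM), and skip
# them on local docker runs where BPF loading would likely fail.
import os

def usdt_tests_enabled(environ=None):
    """Return True only when running under Cirrus CI."""
    if environ is None:
        environ = os.environ
    return environ.get("CIRRUS_CI") == "true"

assert usdt_tests_enabled({"CIRRUS_CI": "true"})
assert not usdt_tests_enabled({})  # local docker run: USDT tests are skipped
```

Keeping the check environment-based means the same CI script can serve both the VM task and local container runs without extra flags.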
Summary:
```
This adds functional tests for the USDT tracepoints added in #22006 and #22902. This partially fixes #23296. The tests are probably skipped on most systems as these tests require:
- a Linux system with a kernel that supports BPF (and available kernel headers)
- that Bitcoin Core is compiled with tracepoints for USDT support (default when compiled with depends)
- bcc installed
- that the tests are run with a privileged user that is able to e.g. do BPF syscalls and load BPF maps
```
Backport of [[bitcoin/bitcoin#24358 | core#24358]] and [[bitcoin/bitcoin#25528 | core#25528]] for the test fixes. Depends on D12737.

Note that:
- We don't run it on CI (yet)
- We need to prune more blocks to trigger the expected flush event as our block files are smaller
- The 3 tests succeed on my machine, but the runner returns an error because compilation warnings are printed to stderr

Test Plan: With tracing enabled:
sudo ./test/functional/test_runner.py interface_usdt_net
sudo ./test/functional/test_runner.py interface_usdt_utxocache
sudo ./test/functional/test_runner.py interface_usdt_validation
ninja check-functional
Check the tests are skipped as expected when not run as root.

Reviewers: #bitcoin_abc, PiRK
Reviewed By: #bitcoin_abc, PiRK
Subscribers: PiRK
Differential Revision: https://reviews.bitcoinabc.org/D12739
@@ -15,5 +15,5 @@ export RUN_FUNCTIONAL_TESTS=false
 export RUN_FUZZ_TESTS=false
 export RUN_TIDY=true
 export GOAL="install"
-export BITCOIN_CONFIG="CC=clang CXX=clang++ --with-incompatible-bdb --disable-hardening CFLAGS='-O0 -g0' CXXFLAGS='-O0 -g0'"
+export BITCOIN_CONFIG="CC=clang CXX=clang++ --enable-c++20 --with-incompatible-bdb --disable-hardening CFLAGS='-O0 -g0' CXXFLAGS='-O0 -g0'"
What was the point of adding --enable-c++20 here? Asking in the context of #26763 (comment).
It wasn't added in this pull, see the diff in ci/test/00_setup_env_native_asan.sh. It was added to check that compiling with it works.
Could another CI task use it instead of the "tidy" one? (fwiw, we have a native Windows build using C++20)
Sure, any Linux task that builds all targets (including fuzz and gui) should do. For example, multiprocess, asan, or tsan.
It wasn't added in this pull, see the diff in ci/test/00_setup_env_native_asan.sh.

Suggesting to revert the move of --enable-c++20 between CI tasks in #26770.
afc6052 ci: Move `--enable-c++20` from "tidy" task back to "ASan..." one (Hennadii Stepanov)

Pull request description: This PR partially reverts cc7335e from bitcoin/bitcoin#25528. C++20 has introduced some new headers, and it is premature to consider them when using the IWYU tool. Required for bitcoin/bitcoin#26763 and bitcoin/bitcoin#26766.

Related discussions:
- bitcoin/bitcoin#25528 (comment)
- bitcoin/bitcoin#26763 (comment)

ACKs for top commit: MarcoFalke: review only ACK afc6052

Tree-SHA512: 9c1d3317d0f7a94d46d1e442da1a91669cd9ed1e62a41579edc96e1f5447551b536d32eeb222caf277e85228482753be545e0a874208cb24f9a8491fce3b6d9f
This changes a CI task to run the previously not-run test/functional/interface_usdt_*.py functional tests (added in #24358).

This task is run as a CirrusCI compute_engine_instance VM, as hooking into the tracepoints is not possible in CirrusCI docker containers (#23296 (comment)). We use an unofficial and untrusted PPA for the bpfcc-tools package in the CI, as the Ubuntu jammy and Debian bullseye packages are outdated. We hope to use an official package when new Ubuntu/Debian releases are available for use with Google Compute Engine.

We make sure to hook into bitcoind binaries in the USDT interface tests via their PID, instead of their path. This makes sure multiple functional tests running in parallel don't interfere with each other.

The utxocache USDT interface test is adapted to a change of the functional test framework that wasn't detected as the tests weren't run in the CI. As the tracepoints expose internals, it can happen that we need to adapt the interface tests when internals change. This is a bit awkward, and if it happens too frequently, we should consider generalizing the tests a bit more. For now it's fine, I think.
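The PID-based hooking can be sketched as below. The helper name and return shape are hypothetical; with bcc, the resulting keyword arguments would correspond to constructing a `bcc.USDT` context for one specific process.

```python
# Illustrative helper (hypothetical, not the actual test code): build the
# arguments for attaching a USDT tracing context. Attaching by PID traces
# only the one intended bitcoind process; attaching by binary path would
# also pick up events from other bitcoind processes started in parallel
# by the test framework. After a restart (as in the pruning test), this
# must be called again with the node's new PID.
def usdt_context_kwargs(pid=None, path=None):
    if pid is not None:
        return {"pid": pid}    # trace only this process
    return {"path": path}      # would match every process running the binary

assert usdt_context_kwargs(pid=4242) == {"pid": 4242}
assert usdt_context_kwargs(path="/usr/local/bin/bitcoind") == {
    "path": "/usr/local/bin/bitcoind"}
```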
See the individual commit messages for more details on the changes.
Fixes #24782
Fixes #23296
I'd like to hear from reviewers:
- Are reviewers OK with using the hadret/bpfcc PPA for now? There is a clear plan for when to drop it, and as is, it could only impact the newly added VM task.
- Adding a new task increases CI runtime and costs. Should an existing container CI task be ported to a VM and reused instead? Yes, see ci: run USDT interface tests in the CI #25528 (comment)