Set an upper bound for the number of timeline events gathered #6089
Codecov Report

```
@@            Coverage Diff             @@
##           master    #6089      +/-   ##
==========================================
+ Coverage   82.80%   83.14%   +0.33%
==========================================
  Files         476      477       +1
  Lines       19661    19692      +31
==========================================
+ Hits        16281    16372      +91
+ Misses       3380     3320      -60
```
Benchmarks: execution time 2025-07-29 11:39:38, comparing candidate commit 422ea7e in the PR branch. Found 0 performance improvements and 0 performance regressions; performance is the same for 1272 metrics, 51 unstable metrics.
I'm not a fan of the test event code added to `packages/dd-trace/src/profiling/profilers/events.js`. Is there really no good way to make this test without littering the source code with code only meant for testing?
@watson I did actually think about this quite a lot before committing to this implementation (and discussed it with folks on my team, but I'm happy to discuss it with you as well). I considered writing a test that would generate events through legitimate, currently captured operations: either filesystem operations on a temporary file, or a bunch of DNS lookups. My main issue with that approach is that these involve IO operations (specifically, IO operations executed on the libuv thread pool) and I was worried that the test could become flaky, especially when run on the traditionally under-resourced macOS platform. Having a way to directly inject events, gated behind an env var, felt like the lesser evil. Some other alternatives I considered were:
Co-authored-by: Thomas Watson <thomas.watson@datadoghq.com>
@watson I managed to replace the integration test with a unit test that needs no extra testing support in shipped code.
```js
// current sample too to be the dropped one, with the equal likelihood as
// any other sample.
const replacementIndex = Math.floor(Math.random() * (this.maxSamples + 1)) - 1
if (replacementIndex !== -1) {
```
Do I understand the algorithm correctly that `replacementIndex` will only ever be `-1` if `maxSamples` is `0`? If so, wouldn't it be simpler to make the wrapping `else` block into an `else if (this.maxSamples !== 0)` block and then get rid of this `if` block completely? Then I think you could also simplify the random algorithm to `rnd(max)`, right?
No, that's not the intent. The idea is as follows: I have m samples in the array already, which is the maximum (`maxSamples` in the code). I get one more, so now I have m+1 samples and need to discard one to get back to m. I could just throw away a sample from the array, but that favorably treats the new sample: it is never considered for discarding. It is fairer if the new sample is also a discard candidate, so the algorithm doesn't have a bias towards newer samples (I want a random distribution over the entire sampled period). So I need a random number in [0, m+1) instead of [0, m). If the generated random number is in [0, m), I throw away the sample already in the array at that index. If the number is m, though, I throw away the newly arrived sample.

I found it more elegant to implement this logic by subtracting 1 from the generated random value, so I get a number in [-1, m) and can compare it to -1 instead of m. An alternative but equivalent solution would be:

```js
const replacementIndex = Math.floor(Math.random() * (this.maxSamples + 1))
if (replacementIndex !== this.maxSamples) {
```
nit: hmm, that does not look right: the last added event will have a probability of being kept of `1 - 1 / (maxSamples + 1)`, whereas for a true random sampling it should be `maxSamples / n` (where n is the total number of events).

What you need here is reservoir sampling (where `n` is the total number of processed events, including the current one):

```js
const replacementIndex = Math.floor(Math.random() * n)
if (replacementIndex < this.maxSamples) {
```
You're right, of course; not sure what I was thinking with my bespoke probability calculation. Fixed.
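The fix the thread converges on is classic reservoir sampling: every event seen so far, including the newest one, survives with probability `maxSamples / n`. A minimal, self-contained sketch of that technique (not the actual dd-trace-js source; the class and field names here are invented for illustration):

```javascript
// Illustrative reservoir sampler: keeps at most maxSamples events,
// each observed event surviving with equal probability.
class Reservoir {
  constructor (maxSamples) {
    this.maxSamples = maxSamples
    this.samples = []
    this.seen = 0 // total events observed, including discarded ones
  }

  add (event) {
    this.seen++
    if (this.samples.length < this.maxSamples) {
      // Reservoir not yet full; keep the event unconditionally.
      this.samples.push(event)
      return
    }
    // Pick a uniform index in [0, seen); if it lands inside the
    // reservoir, the event there is replaced by the new one.
    const replacementIndex = Math.floor(Math.random() * this.seen)
    if (replacementIndex < this.maxSamples) {
      this.samples[replacementIndex] = event
    }
    // Otherwise the new event itself is the one discarded.
  }
}
```

After `n` calls to `add`, each event has been kept with probability `maxSamples / n`, which is exactly the uniform distribution over the sampled period the discussion above asks for.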
Co-authored-by: Thomas Watson <thomas.watson@datadoghq.com>
What does this PR do?
Establishes an upper limit on the number of non-CPU timeline events that can be gathered within a single upload period.
Motivation
Gathering an unbounded number of events is a bad idea: a busy process could accumulate them without limit within an upload period, with memory usage to match.
Additional Notes
The upper limit is rather conservative and is heuristically derived from the CPU sampling rate, the upload period, and the number of libuv worker threads (and available parallelism). E.g. if the CPU sampling period is 10ms and we upload every 60s, that means at most 6000 CPU samples. We'd like to be able to observe oversubscription of the libuv thread pool: with 4 libuv threads, we want to observe at least 6000*5 events (events executing on each of the 4 threads, plus one waiting, for detecting oversubscription). Finally, another 6000 events are allowed for GC events, which gives us

uploadPeriod / samplingPeriod * (threads + 2)

as the maximum number of events we wish to sample. With our defaults of a 65s upload period, 99Hz sampling rate, and 4 libuv threads we'll collect at most 38613 events (when more are gathered, events get randomly discarded). Because we already employ sampling, I believe this upper bound is not very limiting in ordinary usage; its purpose is to prevent runaway memory usage in pathological cases.

Jira: PROF-12127
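As a sanity check on the arithmetic above, here is the formula evaluated with the stated defaults. This is a sketch with made-up constant names, not the tracer's configuration keys; note that the rough calculation yields 38610 rather than the 38613 quoted in the description, so the tracer's exact effective parameters evidently differ slightly:

```javascript
// Illustrative evaluation of uploadPeriod / samplingPeriod * (threads + 2).
// Constant names are invented for this example.
const uploadPeriodSeconds = 65 // default upload period
const samplingRateHz = 99      // default CPU sampling rate (1/samplingPeriod)
const libuvThreads = 4         // default libuv thread pool size

// threads + 2: one slot per libuv thread, +1 for a waiting task
// (to detect oversubscription), +1 for GC events.
const maxEvents = uploadPeriodSeconds * samplingRateHz * (libuvThreads + 2)
console.log(maxEvents) // → 38610
```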