
Conversation

Contributor

@szegedi szegedi commented Jul 14, 2025

What does this PR do?

Establishes an upper limit on the number of non-CPU timeline events that can be gathered within a single upload period.

Motivation

It's a bad idea for it to be unbounded.

Additional Notes

The upper limit is rather conservative and is heuristically derived from the CPU sampling rate, the upload period, and the number of libuv worker threads (and available parallelism). For example, if the CPU sampling period is 10ms and we upload every 60s, that means at most 6000 CPU samples. We'd like to be able to observe oversubscription of the libuv thread pool. If there are 4 libuv threads, we'd like to be able to observe at least 6000*5 events (events executing on at least 4 threads, plus 1 waiting, so oversubscription can be detected). Finally, another 6000 events are allowed for GC events, which gives us uploadPeriod / samplingPeriod * (threads + 2) as the maximum number of events we wish to sample. With our defaults of a 65s upload period, a 99Hz sampling rate, and 4 libuv threads we'll collect at most 38613 events (when more are gathered, events get randomly discarded). Because we already employ sampling, I believe this upper bound is not very limiting in ordinary usage – its purpose is to prevent pathological memory usage runoff.
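As a rough illustration of the arithmetic above (a minimal sketch; the function and parameter names are made up for this example and are not taken from the implementation):

// Sketch of the cap heuristic described above; names are hypothetical.
function maxTimelineEvents ({ uploadPeriodMillis, samplingPeriodMillis, libuvThreads }) {
  // CPU samples that fit into one upload period
  const samplesPerPeriod = uploadPeriodMillis / samplingPeriodMillis
  // one "lane" per libuv thread, +1 waiting (to detect oversubscription), +1 for GC events
  return Math.floor(samplesPerPeriod * (libuvThreads + 2))
}

// Defaults quoted above: 65s upload period, 99Hz sampling, 4 libuv threads.
// Yields roughly the ~38.6k cap mentioned in the description (the exact number
// depends on how the 99Hz sampling period is rounded).
maxTimelineEvents({ uploadPeriodMillis: 65000, samplingPeriodMillis: 1000 / 99, libuvThreads: 4 })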

Jira: PROF-12127


codecov bot commented Jul 14, 2025

Codecov Report

❌ Patch coverage is 98.03922% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.14%. Comparing base (0852a41) to head (422ea7e).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|------|------|------|
| ...ackages/dd-trace/src/profiling/profilers/events.js | 96.77% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6089      +/-   ##
==========================================
+ Coverage   82.80%   83.14%   +0.33%     
==========================================
  Files         476      477       +1     
  Lines       19661    19692      +31     
==========================================
+ Hits        16281    16372      +91     
+ Misses       3380     3320      -60     



github-actions bot commented Jul 14, 2025

Overall package size

Self size: 11.12 MB
Deduped: 110.7 MB
No deduping: 111.09 MB

Dependency sizes

| name | version | self size | total size |
|------|---------|-----------|------------|
| @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB |
| @datadog/native-appsec | 10.0.1 | 20.3 MB | 20.3 MB |
| @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB |
| @datadog/pprof | 5.9.0 | 9.77 MB | 10.14 MB |
| @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB |
| protobufjs | 7.5.3 | 2.95 MB | 5.6 MB |
| @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB |
| @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB |
| @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB |
| jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB |
| import-in-the-middle | 1.14.2 | 122.36 kB | 850.93 kB |
| lru-cache | 10.4.3 | 804.3 kB | 804.3 kB |
| source-map | 0.7.4 | 226 kB | 226 kB |
| opentracing | 0.14.7 | 194.81 kB | 194.81 kB |
| pprof-format | 2.1.0 | 111.69 kB | 111.69 kB |
| @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB |
| lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB |
| ignore | 7.0.5 | 63.38 kB | 63.38 kB |
| istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB |
| rfdc | 1.4.1 | 27.15 kB | 27.15 kB |
| dc-polyfill | 0.1.10 | 26.73 kB | 26.73 kB |
| @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB |
| tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB |
| shell-quote | 1.8.3 | 23.74 kB | 23.74 kB |
| limiter | 1.1.5 | 23.17 kB | 23.17 kB |
| retry | 0.13.1 | 18.85 kB | 18.85 kB |
| semifies | 1.0.0 | 15.84 kB | 15.84 kB |
| jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB |
| crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB |
| ttl-set | 1.0.0 | 4.61 kB | 9.69 kB |
| mutexify | 1.4.0 | 5.71 kB | 8.74 kB |
| path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB |
| koalas | 1.0.2 | 6.47 kB | 6.47 kB |
| module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

@szegedi szegedi force-pushed the szegedi/timeline-event-max branch 2 times, most recently from e18818a to 16c0275 on July 14, 2025 14:59

pr-commenter bot commented Jul 14, 2025

Benchmarks

Benchmark execution time: 2025-07-29 11:39:38

Comparing candidate commit 422ea7e in PR branch szegedi/timeline-event-max with baseline commit 0852a41 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1272 metrics, 51 unstable metrics.

@szegedi szegedi force-pushed the szegedi/timeline-event-max branch 2 times, most recently from 6c0b312 to 672e321 on July 14, 2025 15:21
@szegedi szegedi force-pushed the szegedi/timeline-event-max branch from 672e321 to 23f4c49 on July 14, 2025 15:38
@szegedi szegedi marked this pull request as ready for review July 14, 2025 15:49
@szegedi szegedi requested a review from a team as a code owner July 14, 2025 15:49
@szegedi szegedi enabled auto-merge (squash) July 14, 2025 15:49
@szegedi szegedi disabled auto-merge July 16, 2025 13:57
Collaborator

@watson watson left a comment


I'm not a fan of the test event code added to packages/dd-trace/src/profiling/profilers/events.js. Is there really no good way to make this test without littering the source code with code only meant for testing?

Contributor Author

szegedi commented Jul 21, 2025

I'm not a fan of the test event code added to packages/dd-trace/src/profiling/profilers/events.js. Is there really no good way to make this test without littering the source code with code only meant for testing?

@watson I did actually think about this quite a lot before committing to this implementation (and discussed it with folks on my team, but I'm happy to discuss it with you as well). I considered writing a test that generates events through legitimate, currently captured operations: either filesystem operations on a temporary file, or a bunch of DNS lookups. My main issue with that approach is that they involve IO operations (specifically, IO operations executed on the libuv thread pool) and I was worried that the test could become flaky, especially when run on the traditionally under-resourced macOS platform. Having a way to directly inject events, gated behind an env var, felt like the lesser evil.
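For illustration, a minimal sketch of what such env-var-gated event injection could look like (the class name, env var name, and hook are hypothetical, not the code actually added in this PR):

// Hypothetical sketch only; none of these names come from dd-trace-js.
class TimelineEventCollector {
  constructor () {
    this.entries = []
    if (process.env.DD_PROFILING_TIMELINE_TEST_EVENTS === 'true') {
      // Test-only hook: lets a test push synthetic events directly instead of
      // having to trigger real filesystem or DNS work on the libuv thread pool.
      this.injectTestEvent = (event) => this.entries.push(event)
    }
  }
}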

Some other alternatives I considered were:

  • make a diagnostic channel the main way of adding events even internally and have everything work off of it, so it wouldn't be a test-only feature; but it felt a bit convoluted to add a pub-sub channel into the control flow where it isn't needed.
  • give the test events their own type instead of allowing any kind of event to be injected; but that would have been even more code, and since the functionality is gated behind an env var anyway, it wouldn't increase safety.

szegedi and others added 2 commits July 21, 2025 16:32
Co-authored-by: Thomas Watson <thomas.watson@datadoghq.com>
@szegedi szegedi force-pushed the szegedi/timeline-event-max branch from 899a38f to 62d7f75 on July 22, 2025 12:31
@szegedi szegedi requested a review from watson July 22, 2025 14:14
@szegedi
Copy link
Contributor Author

szegedi commented Jul 22, 2025

@watson I managed to replace the integration test with a unit test that needs no extra testing support in shipped code.

// current sample too to be the dropped one, with equal likelihood as
// any other sample.
const replacementIndex = Math.floor(Math.random() * (this.maxSamples + 1)) - 1
if (replacementIndex !== -1) {
Collaborator


Do I understand the algorithm correctly, that replacementIndex will only ever be -1 if maxSamples is 0? If so, wouldn't it be simpler to turn the wrapping else block into an else if (this.maxSamples !== 0) block and then get rid of this if block completely?

Then I think you could also simplify the random algorithm to rnd(max), right?

Contributor Author

@szegedi szegedi Jul 23, 2025


No, that's not the intent. The idea is as follows: I already have m samples in the array, which is the maximum (maxSamples in the code). I get one more, so now I have m+1 samples and I need to discard one to get back to m samples. I could just throw away a sample from the array, but that treats the new sample favorably – it is never considered for discarding. It is fairer if the new sample is also a discard candidate, so the algorithm doesn't have a bias towards newer samples (I want a random distribution over the entire sampled period). So I need a random number in [0, m+1) instead of [0, m). If the generated random number falls in [0, m), I throw away the sample already in the array at that index; if the number is m, I throw away the newly arrived sample.

I found it more elegant to implement this logic by subtracting 1 from the generated random value: that gives a number in [-1, m), so I can compare against -1 instead of m. An alternative but equivalent solution would be:

const replacementIndex = Math.floor(Math.random() * (this.maxSamples + 1))
if (replacementIndex !== this.maxSamples) {

Collaborator


nit: hmm, that does not look right:
the last added event would have a probability of being kept of 1 - 1/(maxSamples + 1), whereas for true random sampling it should be maxSamples/n (where n is the total number of events).
What you need here is reservoir sampling (where n is the total number of processed events, including the current one):

const replacementIndex = Math.floor(Math.random() * n)
if (replacementIndex < this.maxSamples) {
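For reference, a self-contained sketch of the reservoir sampling approach suggested here ("Algorithm R"); the class and field names are illustrative, not the actual events.js implementation:

class EventReservoir {
  constructor (maxSamples) {
    this.maxSamples = maxSamples
    this.samples = []
    this.seen = 0 // total events observed so far, including dropped ones
  }

  add (event) {
    this.seen++
    if (this.samples.length < this.maxSamples) {
      this.samples.push(event)
      return
    }
    // Replace a stored sample with probability maxSamples / seen, otherwise drop
    // the new event; every event seen so far stays in the reservoir with the
    // same probability maxSamples / seen.
    const replacementIndex = Math.floor(Math.random() * this.seen)
    if (replacementIndex < this.maxSamples) {
      this.samples[replacementIndex] = event
    }
  }
}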

Contributor Author


you're right, of course; not sure what I was thinking with my bespoke probability calculation. Fixed.


datadog-datadog-prod-us1 bot commented Jul 24, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: 422ea7e

@szegedi szegedi requested a review from nsavoire July 24, 2025 15:50
@szegedi szegedi merged commit 33e1c64 into master Jul 29, 2025
688 checks passed
@szegedi szegedi deleted the szegedi/timeline-event-max branch July 29, 2025 13:09
dd-octo-sts bot pushed a commit that referenced this pull request Jul 30, 2025
Co-authored-by: Thomas Watson <thomas.watson@datadoghq.com>
@dd-octo-sts dd-octo-sts bot mentioned this pull request Jul 30, 2025
BridgeAR pushed a commit that referenced this pull request Aug 6, 2025
Co-authored-by: Thomas Watson <thomas.watson@datadoghq.com>
tlhunter pushed a commit that referenced this pull request Aug 22, 2025
Co-authored-by: Thomas Watson <thomas.watson@datadoghq.com>