Skip to content

Conversation

jcstorms1
Copy link
Contributor

@jcstorms1 jcstorms1 commented Jul 9, 2025

What does this PR do?

Adds a plugin for the @azure/service-bus

  • Instruments the sendMessages method
  • Injects context into the message for distributed tracing

Adds instrumentation for the Service Bus Queue trigger in the Azure Functions Plugin

  • Extracts context from the message for proper parenting
image

Motivation

Customers expect full distributed tracing for Azure Functions. Unfortunately Azure function's output bindings did not allow us to inject context into the message. This led to the addition of the Azure Service Bus library.

Plugin Checklist

Additional Notes

Adds azureservicebusemulator and azuresqledge services to run Azure Service Bus locally and in CI for tests

@jcstorms1 jcstorms1 requested review from a team as code owners July 9, 2025 19:08
Copy link

github-actions bot commented Jul 9, 2025

Overall package size

Self size: 9.63 MB
Deduped: 109.35 MB
No deduping: 109.74 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------| | @datadog/libdatadog | 0.7.0 | 35.02 MB | 35.02 MB | | @datadog/native-appsec | 10.0.0 | 20.3 MB | 20.31 MB | | @datadog/native-iast-taint-tracking | 4.0.0 | 11.72 MB | 11.73 MB | | @datadog/pprof | 5.9.0 | 9.77 MB | 10.14 MB | | @opentelemetry/core | 1.30.1 | 908.66 kB | 7.16 MB | | protobufjs | 7.5.3 | 2.95 MB | 5.6 MB | | @datadog/wasm-js-rewriter | 4.0.1 | 2.85 MB | 3.58 MB | | @datadog/native-metrics | 3.1.1 | 1.02 MB | 1.43 MB | | @opentelemetry/api | 1.8.0 | 1.21 MB | 1.21 MB | | jsonpath-plus | 10.3.0 | 617.18 kB | 1.08 MB | | import-in-the-middle | 1.14.2 | 122.36 kB | 850.93 kB | | lru-cache | 10.4.3 | 804.3 kB | 804.3 kB | | source-map | 0.7.4 | 226 kB | 226 kB | | opentracing | 0.14.7 | 194.81 kB | 194.81 kB | | limiter | 3.0.0 | 157.92 kB | 157.92 kB | | pprof-format | 2.1.0 | 111.69 kB | 111.69 kB | | @datadog/sketches-js | 2.1.1 | 109.9 kB | 109.9 kB | | lodash.sortby | 4.7.0 | 75.76 kB | 75.76 kB | | ignore | 7.0.5 | 63.38 kB | 63.38 kB | | istanbul-lib-coverage | 3.2.2 | 34.37 kB | 34.37 kB | | rfdc | 1.4.1 | 27.15 kB | 27.15 kB | | @isaacs/ttlcache | 1.4.1 | 25.2 kB | 25.2 kB | | dc-polyfill | 0.1.9 | 25.11 kB | 25.11 kB | | tlhunter-sorted-set | 0.1.0 | 24.94 kB | 24.94 kB | | shell-quote | 1.8.3 | 23.74 kB | 23.74 kB | | retry | 0.13.1 | 18.85 kB | 18.85 kB | | semifies | 1.0.0 | 15.84 kB | 15.84 kB | | jest-docblock | 29.7.0 | 8.99 kB | 12.76 kB | | crypto-randomuuid | 1.0.0 | 11.18 kB | 11.18 kB | | ttl-set | 1.0.0 | 4.61 kB | 9.69 kB | | mutexify | 1.4.0 | 5.71 kB | 8.74 kB | | path-to-regexp | 0.1.12 | 6.6 kB | 6.6 kB | | koalas | 1.0.2 | 6.47 kB | 6.47 kB | | module-details-from-path | 1.0.4 | 3.96 kB | 3.96 kB |

🤖 This report was automatically generated by heaviest-objects-in-the-universe

Copy link

codecov bot commented Jul 9, 2025

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.

Project coverage is 81.84%. Comparing base (47d3c63) to head (483cef4).
Report is 67 commits behind head on master.

Files with missing lines Patch % Lines
...d-trace/src/service-naming/schemas/v0/messaging.js 0.00% 2 Missing ⚠️
...ages/datadog-instrumentations/src/helpers/hooks.js 0.00% 1 Missing ⚠️
packages/dd-trace/src/plugins/index.js 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6067      +/-   ##
==========================================
- Coverage   81.86%   81.84%   -0.02%     
==========================================
  Files         473      473              
  Lines       19444    19468      +24     
==========================================
+ Hits        15917    15934      +17     
- Misses       3527     3534       +7     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pr-commenter
Copy link

pr-commenter bot commented Jul 9, 2025

Benchmarks

Benchmark execution time: 2025-07-11 19:33:53

Comparing candidate commit 483cef4 in PR branch storms/instrument-service-bus with baseline commit 47d3c63 in branch master.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1176 metrics, 52 unstable metrics.

span.setTag('messaging.system', 'servicebus')
span.setTag('messaging.destination.name', triggerEntity)
span.setTag('resource.name', 'serviceBusQueueTrigger')
span.setTag('span.kind', 'consumer')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

span.kind can be imported from ext/tags as well

SPAN_KIND: 'span.kind',

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing as your other comments. Other plugins are not using this file. Not sure why though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer strings, especially for tests where I want the expectation to be as obvious as possible without having to search for the underlying value.

const ServiceBusClient = obj.ServiceBusClient
shimmer.wrap(ServiceBusClient.prototype, 'createSender', createSender => function (queueName) {
const sender = createSender.apply(this, arguments)
shimmer.wrap(sender._sender, 'send', send => function (msg) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When patching MessageSender.send are spans created in the following scenarios?

  • Synchronous message send (no await)
  • ServiceBusSender.sendMessages with a list of messages instead of a single message
  • ServiceBusSender.scheduleMessages

Copy link
Contributor Author

@jcstorms1 jcstorms1 Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Azure service bus does not support synchronous message sending
  • A list did not work, because it was converted into a batch class and encoded
           // Case 2: Batch message
           batch = messages;
       }
       else {
           // Case 3: Array of messages
           batch = await this.createMessageBatch(options);
           for (const message of messages) {
               throwIfNotValidServiceBusMessage(message, errorInvalidMessageTypeSingleOrArray);
               if (!batch.tryAddMessage(message, options)) {
  • Did not try it, but based on the code I doubt it will work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Batching and scheduling is non-trivial. We should revisit this later. I'll make cards for us to track them.

@jcstorms1 jcstorms1 marked this pull request as draft July 10, 2025 14:27
@duncanpharvey duncanpharvey self-requested a review July 10, 2025 20:43
@duncanpharvey duncanpharvey dismissed their stale review July 10, 2025 20:44

Will re-review following passing tests

@jcstorms1 jcstorms1 marked this pull request as ready for review July 11, 2025 15:36
Copy link
Member

@rochdev rochdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few comments but no dealbreakers, just a few things that could be improved.

One thing that I noticed also is that these plugins only have integration tests. These are usually only used to test ESM support right now. Was that done because only ESM is supported? Otherwise the tests could be moved to regular plugin tests which would run a lot faster.

const sender = createSender.apply(this, arguments)
shimmer.wrap(sender._sender, 'send', send => function (msg) {
const ctx = { sender, msg }
return producerStartCh.runStores(ctx, () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a blocker, but for a standard promise instrumentation, it would have been possible to use tracingChannel.tracePromise() to replace this whole block.

span.setTag('messaging.system', 'servicebus')
span.setTag('messaging.destination.name', triggerEntity)
span.setTag('resource.name', 'serviceBusQueueTrigger')
span.setTag('span.kind', 'consumer')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally prefer strings, especially for tests where I want the expectation to be as obvious as possible without having to search for the underlying value.

@@ -35,11 +36,19 @@ class AzureFunctionsPlugin extends TracingPlugin {
'aas.function.trigger': mapTriggerTag(methodName)
}
}, false)
if (triggerMap[methodName] === 'ServiceBus') {
const triggerEntity = invocationContext.options.trigger.queueName || invocationContext.options.trigger.topicName
span.setTag('messaging.message_id', invocationContext.triggerMetadata.messageId)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more efficient to prepare the meta object before startSpan depending on the trigger, and then pass that directly.

For example:

const meta = getMetaForTrigger(ctx)

const span = this.startSpan(this.operationName(), {
  childOf,
  service: this.serviceName(),
  type: 'serverless',
  meta
}, false)

function getMetaForTrigger({ functionName, methodName, invocationContext }) {
  const meta = {
    'aas.function.name': functionName,
    'aas.function.trigger': mapTriggerTag(methodName)
  }

  if (triggerMap[methodName] === 'ServiceBus') {
    ...
  }

  return meta
}

@@ -35,11 +36,19 @@ class AzureFunctionsPlugin extends TracingPlugin {
'aas.function.trigger': mapTriggerTag(methodName)
}
}, false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated to this PR, but passing ctx here instead of false would remove the need to explicitly assign currentStore and parentStore to the context at the end of the function.

@@ -75,4 +89,13 @@ function mapTriggerTag (methodName) {
return triggerMap[methodName] || 'Unknown'
}

function extractTraceContext (tracer, ctx) {
switch (String(triggerMap[ctx.methodName])) {
case 'Http':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of scope of this PR but in the future I think it might be worth it to split things as multiple plugins instead of having an ever-growing switch.

Comment on lines 23 to 25
sandbox = await createSandbox([
`@azure/functions@${version}`, 'azure-functions-core-tools@4', '@azure/service-bus@7.9.2'], false,
['./packages/datadog-plugin-azure-functions/test/integration-test/fixtures/*'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
sandbox = await createSandbox([
`@azure/functions@${version}`, 'azure-functions-core-tools@4', '@azure/service-bus@7.9.2'], false,
['./packages/datadog-plugin-azure-functions/test/integration-test/fixtures/*'])
sandbox = await createSandbox([
`@azure/functions@${version}`,
'azure-functions-core-tools@4',
'@azure/service-bus@7.9.2'
],
false,
['./packages/datadog-plugin-azure-functions/test/integration-test/fixtures/*'])

assert.strictEqual(payload[0][0].parent_id, payload[1][1].span_id)
assert.strictEqual(payload[1][0].span_id, payload[1][1].parent_id)
})
}).timeout(50000)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 50 seconds? This seems very specific.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked @duncanpharvey about this before. He said it takes a while to start up the azure function runtime and 50 seconds gave enough cushion for the tests to pass reliably.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I guess I was just curious why 50 seconds and not a minute at that point 😄

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 50 second timeout was arbitrary

@jcstorms1
Copy link
Contributor Author

/merge

@dd-devflow-routing-codex
Copy link

dd-devflow-routing-codex bot commented Jul 14, 2025

View all feedbacks in Devflow UI.

2025-07-14 13:09:57 UTC ℹ️ Start processing command /merge


2025-07-14 13:10:01 UTC ℹ️ MergeQueue: queue is disabled

Added to the queue but the mergequeue is not enabled for now.


2025-07-14 13:17:49 UTC ⚠️ MergeQueue: This merge request was unqueued

jordan.storms@datadoghq.com unqueued this merge request

@jcstorms1
Copy link
Contributor Author

/remove

@dd-devflow-routing-codex
Copy link

dd-devflow-routing-codex bot commented Jul 14, 2025

View all feedbacks in Devflow UI.

2025-07-14 13:17:38 UTC ℹ️ Start processing command /remove


2025-07-14 13:17:40 UTC ℹ️ Devflow: /remove

@jcstorms1
Copy link
Contributor Author

/merge

@dd-devflow-routing-codex
Copy link

dd-devflow-routing-codex bot commented Jul 14, 2025

View all feedbacks in Devflow UI.

2025-07-14 15:10:45 UTC ℹ️ Start processing command /merge


2025-07-14 15:10:56 UTC ℹ️ MergeQueue: queue is disabled

Added to the queue but the mergequeue is not enabled for now.


2025-07-14 15:15:45 UTC ⚠️ MergeQueue: This merge request was unqueued

jordan.storms@datadoghq.com unqueued this merge request since it was merged manually

@jcstorms1
Copy link
Contributor Author

/remove

@dd-devflow-routing-codex
Copy link

dd-devflow-routing-codex bot commented Jul 14, 2025

View all feedbacks in Devflow UI.

2025-07-14 15:15:26 UTC ℹ️ Start processing command /remove


2025-07-14 15:15:34 UTC ℹ️ Devflow: /remove

@jcstorms1 jcstorms1 merged commit c2a1f8c into master Jul 14, 2025
682 of 683 checks passed
@jcstorms1 jcstorms1 deleted the storms/instrument-service-bus branch July 14, 2025 15:15
ghost pushed a commit that referenced this pull request Jul 22, 2025
* initial

* plugin loading

* full distributed trace

* all span attributes set correctly

* add consumer tests

* update test

* finalize service bus test

* remove qpid from service bus test

* instrument and test topics

* remove localhost from connectionstring

* lint

* try removing localhost

* revert localhost removal

* try to use newer node versions

* revert node versioning for integration tests

* fix azure-functions test

* fix sql server & lint

* try actual name of sql service

* clean up azure functions

* formatting

* lint
@ghost ghost mentioned this pull request Jul 22, 2025
watson pushed a commit that referenced this pull request Jul 24, 2025
* initial

* plugin loading

* full distributed trace

* all span attributes set correctly

* add consumer tests

* update test

* finalize service bus test

* remove qpid from service bus test

* instrument and test topics

* remove localhost from connectionstring

* lint

* try removing localhost

* revert localhost removal

* try to use newer node versions

* revert node versioning for integration tests

* fix azure-functions test

* fix sql server & lint

* try actual name of sql service

* clean up azure functions

* formatting

* lint
tlhunter pushed a commit that referenced this pull request Aug 22, 2025
* initial

* plugin loading

* full distributed trace

* all span attributes set correctly

* add consumer tests

* update test

* finalize service bus test

* remove qpid from service bus test

* instrument and test topics

* remove localhost from connectionstring

* lint

* try removing localhost

* revert localhost removal

* try to use newer node versions

* revert node versioning for integration tests

* fix azure-functions test

* fix sql server & lint

* try actual name of sql service

* clean up azure functions

* formatting

* lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants