Skip to content

Conversation

AlexanderEllis
Copy link
Contributor

@AlexanderEllis AlexanderEllis commented Mar 10, 2022

Commit Message: Add OpenTelemetry Tracer Extension for exporting OTLP traces
Additional Description: This adds a new OpenTelemetry Tracer that exports OTLP traces via Envoy's async gRPC mechanism. It includes traceparent header parsing, buffering, and basic stats.
Risk Level: Medium (new extension that is not enabled)
Testing: Added unit tests for the new tracer and performed manual testing with the OpenTelemetry Collector
Docs Changes: TODO
Release Notes: opentelemetry: add OpenTelemetry tracer extension
Platform Specific Features: N/A
Part of #9958

Possible fix for #20641

Alex Ellis added 16 commits March 9, 2022 19:25
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
@repokitteh-read-only
Copy link

Hi @AlexanderEllis, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #20281 was opened by AlexanderEllis.

see: more, trace.

@repokitteh-read-only
Copy link

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #20281 was opened by AlexanderEllis.

see: more, trace.

@repokitteh-read-only repokitteh-read-only bot added api deps Approval required for changes to Envoy's external dependencies labels Mar 10, 2022
@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @lizan
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).
envoyproxy/dependency-shepherds assignee is @phlax

🐱

Caused by: #20281 was opened by AlexanderEllis.

see: more, trace.

@moderation
Copy link
Contributor

Fantastic to see this! As part of this change can you update to the latest version of OpenTelemetry. We are currently on version 0.11.0 - the latest is 0.14.0

https://github.com/envoyproxy/envoy/blob/main/api/bazel/repository_locations.bzl#L99-L109

Alex Ellis added 5 commits March 10, 2022 17:36
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <alexellis@google.com>
Signed-off-by: Alex Ellis <ellisonjtk@gmail.com>
@yanavlasov
Copy link
Contributor

/retest

@repokitteh-read-only
Copy link

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #20281 (comment) was created by @yanavlasov.

see: more, trace.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hurray!

@repokitteh-read-only repokitteh-read-only bot removed api deps Approval required for changes to Envoy's external dependencies labels Jun 22, 2022
Copy link
Member

@phlax phlax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thanks so much @AlexanderEllis for staying the journey

landing cos i was asked, and im drunk enuf 8/

cc @moderation

@phlax phlax merged commit f7a8db0 into envoyproxy:main Jun 22, 2022
@K-Prabha
Copy link

🎉 Thank you @AlexanderEllis for the persistence and thank you community for helping this push thru. Much appreciated

@phlax
Copy link
Member

phlax commented Jun 22, 2022

eek @AlexanderEllis looks like i broke main landing this - i reckon minor hiccup - any ideas on how we can fix forward ?

unknown file: Failure
C++ exception with description "Protobuf message (type opentelemetry.proto.collector.logs.v1.ExportLogsServiceRequest reason INVALID_ARGUMENT:(resource_logs.instrumentation_library_logs[0]) logs: Cannot find field.) has unknown fields" thrown in the test body.
test/extensions/access_loggers/open_telemetry/grpc_access_log_impl_test.cc:51: Failure
Actual function call count doesn't match EXPECT_CALL(*async_client, startRaw(_, _, _, _))...
         Expected: to be called once
           Actual: never called - unsatisfied and active

i guess some change in config

https://dev.azure.com/cncf/envoy/_build/results?buildId=111237&view=logs&j=8c169225-0ae8-53bd-947f-07cb81846cb5&t=6b0ace90-28b2-51d9-2951-b1fd35e67dec&l=1233

phlax added a commit to phlax/envoy that referenced this pull request Jun 22, 2022
…OTLP traces (envoyproxy#20281)"

This reverts commit f7a8db0.

Signed-off-by: Ryan Northey <ryan@synca.io>
@AlexanderEllis
Copy link
Contributor Author

AlexanderEllis commented Jun 22, 2022

Ah thanks for calling that out @phlax . I see there's a PR to revert already, which SGTM.

As I mentioned there, I believe I have a quick forward fix in #21842 if that's easier — looks like it was a new test (as fresh as this morning, heh) with a field name that the updated OTel library had changed (and I had updated the other tests in this PR).

Also happy to wait and revert the revert if you want too 👍

@phlax
Copy link
Member

phlax commented Jun 22, 2022

ill let you and @ggreenway figure it out - defo preferable to fix forward imho - opened the revert as landing broke main

@moderation
Copy link
Contributor

@AlexanderEllis fantastic to see this land. I've been doing some testing sending from a bunch of Envoy's to Uptrace. On the OpenTelemetry Collector I have debug logging installed and it doesn't look like the traces are carrying any information. Similarly on the Uptrace side the spans are empty.

Resource SchemaURL: 
ScopeSpans #0
ScopeSpans SchemaURL: 
InstrumentationScope  
Span #0
    Trace ID       : bec0620d406d5a2c63bede10ee2e29d6
    Parent ID      : 0cc2228c878e7a98
    ID             : d29c47da16a3c7f4
    Name           : router sync_service egress
    Kind           : SPAN_KIND_CLIENT
    Start time     : 2022-06-24 16:18:05.707695875 +0000 UTC
    End time       : 2022-06-24 16:18:06.370910412 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 
Span #1
    Trace ID       : bec0620d406d5a2c63bede10ee2e29d6
    Parent ID      : 
    ID             : 0cc2228c878e7a98
    Name           : egress xxxxx.xxxx.xx:9006
    Kind           : SPAN_KIND_CLIENT
    Start time     : 2022-06-24 16:18:05.707584509 +0000 UTC
    End time       : 2022-06-24 16:18:06.370936071 +0000 UTC
    Status code    : STATUS_CODE_UNSET
    Status message : 

In contrast my old Lightstep setup carried a bunch of info
image

Any idea on what is happening? Am I missing a config?

@AlexanderEllis
Copy link
Contributor Author

AlexanderEllis commented Jun 24, 2022

Nothing wrong on your end! I'll be following this up with a few more PRs to flesh out the tracer, one of which will be the attribute setting correctly in the span (stubbed out span.setTag for now, but adding that should allow these attributes to be set correctly in http_tracer_impl.cc). I'll also add an example setup with the collector and a few docs as well.

Should hopefully be smooth(er) sailing now that the main PR is merged :)

Edit: for posterity and a PSA, I fleshed out a few of the TODOs in #9958 (comment)

Amila-Rukshan pushed a commit to Amila-Rukshan/envoy that referenced this pull request Jun 28, 2022
…ces (envoyproxy#20281)

Signed-off-by: Alex Ellis <ellisonjtk@gmail.com>
Signed-off-by: Amila Senadheera <amila.15@cse.mrt.ac.lk>
@AlexanderEllis AlexanderEllis deleted the add-opentelemetry-tracer branch July 7, 2022 01:01
@nareddyt
Copy link
Contributor

nareddyt commented Dec 14, 2022

https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/trace/v3/opentelemetry.proto mentions the following

This extension is work-in-progress. Functionality is incomplete and it is not intended for production use. This extension has an unknown security posture and should only be used in deployments where both the downstream and upstream are trusted.

Are there any tangible steps to get this extension ready for production use? Is it a matter of adhering to Envoy's extension guidelines, or does it also involve changes to the OpenTelemetry libraries?

https://github.com/envoyproxy/envoy/blob/main/EXTENSION_POLICY.md#extension-stability-and-security-posture

@htuch
Copy link
Member

htuch commented Jan 6, 2023

@itamarkam @yanavlasov can you address #20281 (comment)? Thanks.

// Create an Tracers::OpenTelemetry::Span class that will contain the OTel span.
Span new_span = Span(config, operation_name, start_time, time_source_, *this);
new_span.setSampled(tracing_decision.traced);
uint64_t trace_id_high = random_.random();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to make it "less random"? i.e.
hex timestamp (32b) + hex ms (16b) + hex instance's IP (32b) + some internal counter?

@itamarkam
Copy link
Contributor

itamark

@AlexanderEllis Any thoughts here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.