feat: add initial opentelemetry tracing to big query HTTP requests#4126
feat: add initial opentelemetry tracing to big query HTTP requests#4126
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the BigQuery client library by adding initial OpenTelemetry tracing capabilities to its HTTP requests. The changes enable the client to automatically generate and enrich OpenTelemetry spans for each HTTP call, providing deeper insights into the performance and behavior of interactions with the BigQuery service. This feature is conditionally applied, ensuring that tracing is only active when explicitly enabled. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces OpenTelemetry tracing for BigQuery HTTP requests, enhancing observability. However, the current implementation has security concerns, as it leaks potentially sensitive information (PII, identifiers, and query details) into telemetry span attributes via full URLs and raw error messages. Furthermore, a critical bug allows spans to be ended multiple times, potentially corrupting telemetry data, and a separate issue in HttpBigQueryRpc.java leads to attribute loss. Addressing these security vulnerabilities and bugs, along with improving adherence to OpenTelemetry semantic conventions and completing TODO items, is crucial for accurate and secure telemetry.
...d-bigquery/src/main/java/com/google/cloud/bigquery/spi/v2/HttpTracingRequestInitializer.java
Outdated
Show resolved
Hide resolved
...d-bigquery/src/main/java/com/google/cloud/bigquery/spi/v2/HttpTracingRequestInitializer.java
Outdated
Show resolved
Hide resolved
| public static final AttributeKey<String> HTTP_REQUEST_METHOD = | ||
| AttributeKey.stringKey("http.request.method"); | ||
| public static final AttributeKey<String> URL_FULL = AttributeKey.stringKey("url.full"); | ||
| public static final AttributeKey<String> URL_TEMPLATE = AttributeKey.stringKey("url.template"); | ||
| public static final AttributeKey<String> URL_DOMAIN = AttributeKey.stringKey("url.domain"); | ||
| public static final AttributeKey<Long> HTTP_RESPONSE_STATUS_CODE = | ||
| AttributeKey.longKey("http.response.status_code"); | ||
| public static final AttributeKey<Long> HTTP_REQUEST_RESEND_COUNT = | ||
| AttributeKey.longKey("http.request.resend_count"); | ||
| public static final AttributeKey<Long> HTTP_REQUEST_BODY_SIZE = | ||
| AttributeKey.longKey("http.request.body.size"); | ||
| public static final AttributeKey<Long> HTTP_RESPONSE_BODY_SIZE = | ||
| AttributeKey.longKey("http.response.body.size"); |
There was a problem hiding this comment.
The OpenTelemetry attribute keys being defined here are based on an older version of the semantic conventions. The conventions have been stabilized and it's recommended to use the new key names to ensure compatibility with standard OpenTelemetry tooling.
For example:
http.request.methodshould behttp.methodhttp.response.status_codeshould behttp.status_codehttp.request.body.sizeshould behttp.request_content_lengthhttp.response.body.sizeshould behttp.response_content_length
Consider adding a dependency on io.opentelemetry:opentelemetry-semconv and using the constants from io.opentelemetry.semconv.trace.attributes.SemanticAttributes to stay aligned with the specification. If you prefer to avoid the extra dependency, please update the string keys manually.
public static final AttributeKey<String> HTTP_REQUEST_METHOD =
AttributeKey.stringKey("http.method");
public static final AttributeKey<String> URL_FULL = AttributeKey.stringKey("url.full");
public static final AttributeKey<String> URL_TEMPLATE = AttributeKey.stringKey("url.template");
public static final AttributeKey<String> URL_DOMAIN = AttributeKey.stringKey("url.domain");
public static final AttributeKey<Long> HTTP_RESPONSE_STATUS_CODE =
AttributeKey.longKey("http.status_code");
public static final AttributeKey<Long> HTTP_REQUEST_RESEND_COUNT =
AttributeKey.longKey("http.request.resend_count");
public static final AttributeKey<Long> HTTP_REQUEST_BODY_SIZE =
AttributeKey.longKey("http.request_content_length");
public static final AttributeKey<Long> HTTP_RESPONSE_BODY_SIZE =
AttributeKey.longKey("http.response_content_length");There was a problem hiding this comment.
FYI I doubled, checked and it looks like what we're using is correct:
https://opentelemetry.io/docs/specs/semconv/http/http-spans/
...d-bigquery/src/main/java/com/google/cloud/bigquery/spi/v2/HttpTracingRequestInitializer.java
Outdated
Show resolved
Hide resolved
...loud-bigquery/src/main/java/com/google/cloud/bigquery/telemetry/BigQueryTelemetryTracer.java
Show resolved
Hide resolved
|
|
||
| // Common GCP Attributes | ||
| public static final AttributeKey<String> GCP_CLIENT_SERVICE = | ||
| AttributeKey.stringKey("gcp.client.service"); |
There was a problem hiding this comment.
It's OK to hardcode these keys in Bigquery for now, we may want to use them from Gax once they are stabilized.
| * HttpRequestInitializer that wraps a delegate initializer, intercepts all HTTP requests, adds | ||
| * OpenTelemetry tracing and then invokes delegate interceptor. | ||
| */ | ||
| @InternalApi |
There was a problem hiding this comment.
Let's mark all new classes as @BetaApi so we can easily change them before GA.
|
|
||
| String resolvedBigQueryRootUrl = options.getResolvedApiaryHost("bigquery"); | ||
|
|
||
| if (options.isOpenTelemetryTracingEnabled() && options.getOpenTelemetryTracer() != null) { |
There was a problem hiding this comment.
I think we need to put more thoughts about how to enable this feature.
This seems to be reusing the existing options which may or may not be a good practice. Because 1. Existing customers will also automatically get new Spans which they may not want. 2. Customers will get two different Spans which are not related at this moment.
Can we understand more about the use cases of the existing Spans?
There was a problem hiding this comment.
+1 to Blake's concerns. I'm leaning against reusing the existing options for the reasons he listed. Would it be possible to have something like isHttpTelemetryTracingEnabled() to focus on the network-level tracing?
| } | ||
|
|
||
| @VisibleForTesting | ||
| static String sanitizeUrlFull(String url) { |
There was a problem hiding this comment.
This seems a lot of logic and could easily make mistakes. Did Rust do similar things?
There was a problem hiding this comment.
this is Rusts's logic, so in function I would say is comparable. I will try to come up with a simpler/idiomatic way to scrub the query params
There was a problem hiding this comment.
can you use (a clone of) request.getUrl() instead of string manip?
| } | ||
|
|
||
| private static void addExceptionToSpan(IOException e, Span span) { | ||
| span.recordException(e); |
There was a problem hiding this comment.
What exactly does recordException do? Does it automatically creates a few attributes? It seems like a good practice, but I want to make sure it does not conflict with existing attributes.
Same question for setStatus below.
There was a problem hiding this comment.
recordException adds an event type "exception" to a span (does not affect attributes). See screenshot.
For setStatus, it sets the overall status of the span (also not an attribute), default is Unset. See screenshot.
Let me know if you want me to flag either of these for confirmation from Wes.
westarle
left a comment
There was a problem hiding this comment.
Just a couple q's about lifecycle of initializers; I had imagined a more "direct" instrumentation around execute().
| addExceptionToSpan(e, span); | ||
| throw e; | ||
| } finally { | ||
| span.end(); |
There was a problem hiding this comment.
can this happen twice if the request is retried?
|
|
||
| Span span = createHttpTraceSpan(httpMethod, url, host, port); | ||
|
|
||
| HttpResponseInterceptor originalInterceptor = request.getResponseInterceptor(); |
There was a problem hiding this comment.
are exactly one of these guaranteed to run to close the span?
| } | ||
|
|
||
| @VisibleForTesting | ||
| static String sanitizeUrlFull(String url) { |
There was a problem hiding this comment.
can you use (a clone of) request.getUrl() instead of string manip?
| addExceptionToSpan(e, span); | ||
| throw e; | ||
| } finally { | ||
| span.end(); |
There was a problem hiding this comment.
Will this result in ending the span before a retry can occur?
| String host = request.getUrl().getHost(); | ||
| Integer port = request.getUrl().getPort(); | ||
|
|
||
| Span span = createHttpTraceSpan(httpMethod, url, host, port); |
There was a problem hiding this comment.
Since the span is started here during initialize(), what happens if an exception occurs in the client before the request actually executes? Is there a risk that the response handlers are never reached and span.end() is never called, leading to a span leak?
| } catch (Exception ex) { | ||
| // Ignore | ||
| } | ||
| span.setAttribute(BigQueryTelemetryTracer.STATUS_MESSAGE, errorMessage); |
There was a problem hiding this comment.
Are there any data privacy concerns with exporting the error message? I assume BigQuery error messages could potentially contain sensitive details like SQL query snippets, table names, or row data.
|
|
||
| String resolvedBigQueryRootUrl = options.getResolvedApiaryHost("bigquery"); | ||
|
|
||
| if (options.isOpenTelemetryTracingEnabled() && options.getOpenTelemetryTracer() != null) { |
There was a problem hiding this comment.
+1 to Blake's concerns. I'm leaning against reusing the existing options for the reasons he listed. Would it be possible to have something like isHttpTelemetryTracingEnabled() to focus on the network-level tracing?
This feature adds the ability to enable open telemetry tracing on all HTTP requests.
It reuses the existing client options setting enableOpenTelemetryTracing to enable it, and then wraps the existing HttpRequestInitializer to intercept and add tracing.
This PR only contains the initial basic general/http attributes. Separate PRs will contain additional attributes.
Tested via sample test program and validated attributes show up in cloud trace:
https://screenshot.googleplex.com/AQJp4Nbb6oVbgAk