Skip to content
Merged
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
package datadog.trace.bootstrap.instrumentation.decorator;

import static datadog.trace.api.cache.RadixTreeCache.PORTS;
import static datadog.trace.api.cache.RadixTreeCache.UNSET_PORT;
import static datadog.trace.bootstrap.instrumentation.java.net.HostNameResolver.hostName;

Expand Down Expand Up @@ -153,9 +152,8 @@ public AgentSpan setPeerPort(AgentSpan span, String port) {

public AgentSpan setPeerPort(AgentSpan span, int port) {
if (port > UNSET_PORT) {
span.setTag(Tags.PEER_PORT, PORTS.get(port));
span.setTag(Tags.PEER_PORT, port);
}

return span;
}

Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
package datadog.trace.instrumentation.mongo;

import static datadog.trace.api.Functions.UTF8_ENCODE;
import static datadog.trace.api.cache.RadixTreeCache.PORTS;
import static datadog.trace.bootstrap.instrumentation.api.AgentTracer.activateSpan;
import static datadog.trace.bootstrap.instrumentation.api.AgentTracer.activeSpan;
import static datadog.trace.bootstrap.instrumentation.api.AgentTracer.closeActive;
Expand Down Expand Up @@ -151,7 +150,7 @@ public void commandStarted(final CommandStartedEvent event) {
// may do a DNS lookup
ServerAddress serverAddress = event.getConnectionDescription().getServerAddress();
span.setTag(Tags.PEER_HOSTNAME, serverAddress.getHost())
.setTag(Tags.PEER_PORT, PORTS.get(serverAddress.getPort()))
.setTag(Tags.PEER_PORT, serverAddress.getPort())
.setTag(
Tags.DB_OPERATION,
COMMAND_NAMES.computeIfAbsent(event.getCommandName(), UTF8_ENCODE));
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1092,14 +1092,24 @@ public void processTagsAndBaggage(
samplingPriority != PrioritySampling.UNSET ? samplingPriority : getSamplingPriority(),
measured,
topLevel,
httpStatusCode == 0 ? null : HTTP_STATUSES.get(httpStatusCode),
httpStatusCode == 0 ? null : shortStatusCodeToString(httpStatusCode),
Comment thread
mcculls marked this conversation as resolved.
// Get origin from rootSpan.context
getOrigin(),
longRunningVersion,
ProcessTags.getTagsForSerialization()));
}
}

private UTF8BytesString shortStatusCodeToString(short httpStatusCode) {
try {
return HTTP_STATUSES.get(httpStatusCode);
} catch (Throwable t) {
// RadixTreeCache seems to have issues on semeru11 - NPE on AtomicReferenceArray cas
// skip the cache in those cases and just create a String
return UTF8BytesString.create(Short.toString(httpStatusCode));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Will this create too many allocations.

Copy link
Copy Markdown
Contributor Author

@amarziali amarziali Dec 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not supposed to fail the cache acccess. Note that this method is on the hot path when finish is called on the root span -> write is called. If this throw we might bubble the application code IMO. Better to allocate something than throw

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the failure constant on Semeru? Or is it sporadic?
If it is constant, we should probably detect the failure and then just bypass the cache there after.

I've started debating using UTF8BytesString for values, since we now have the UTF8 cache. I also have some evidence that some caches are doing more harm than good now.

Long term, I'm hoping that TagMap will provide a more elegant solution to these problems, but unfortunately, I'm not there yet.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well I think more that there are no failures in the real life but only happens in CI. The stacktrace is very weird since it sounds like implying that either unsafe is null (sounds impossible) either the array is itself is null (that sounds not possible as well unless the class is unsafely allocated). In CI it passes on a retry run on a fresh JVM that looks indicating that something is gets corrupted on the used jvm? I put this catch in a defensive way - I can remove if needed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How many semeru 11 hosts can be impacted ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps access to Unsafe was blocked, that could explain how the reference to it ended up being null.

But we can just solve this a different way, I'm experimenting with TagMap.Entry cache which in theory could eliminate the need for these other caches.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The NPE comes from a VarHandle used to access the array backing the AtomicReferenceArray. The array is assigned in the AtomicReferenceArray constructor to a final field, so this could be a J9 JIT issue.

Given this causes sporadic CI failures in various tests I'm ok with the above workaround until we refactor this to use a TagMap.Entry cache or similar.

}
}

void injectW3CBaggageTags(Map<String, String> baggageItemsWithPropagationTags) {
List<String> baggageTagKeys = Config.get().getTraceBaggageTagKeys();
if (baggageTagKeys.isEmpty()) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,9 +18,6 @@ public final class RadixTreeCache<T> {
16, 32, TO_STRING, 200, 201, 301, 307, 400, 401, 403, 404, 500, 502, 503);

public static final int UNSET_PORT = 0;
// should cover range [0, 2^16)
public static final RadixTreeCache<Integer> PORTS =
new RadixTreeCache<>(256, 256, Integer::valueOf, 80, 443, 8080);

private final int level1;
private final int level2;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -118,13 +118,6 @@ class RadixTreeCacheTest extends DDSpecification {
})
}

def "cache ports"() {
expect:
Integer.valueOf(port) == RadixTreeCache.PORTS.get(port)
where:
port << [0, 80, 443, 4444, 8080, 65535]
}

def "cache HTTP statuses"() {
expect:
Integer.toString(status) == RadixTreeCache.HTTP_STATUSES.get(status) as String
Expand Down