[mcp] get_span_details and get_trace_errors tools (Phase 2 Steps 5-6)#7859
[mcp] get_span_details and get_trace_errors tools (Phase 2 Steps 5-6)#7859yurishkuro merged 17 commits intomainfrom
Conversation
…s tools Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #7859 +/- ##
==========================================
+ Coverage 95.45% 95.47% +0.01%
==========================================
Files 301 303 +2
Lines 15855 16033 +178
==========================================
+ Hits 15134 15307 +173
- Misses 565 569 +4
- Partials 156 157 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…mplete in ADR Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
…ial results support, move tool registration to helper Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
|
@copilot address new review cmments |
…kups, use MergeTraces for multiple iterations Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
…rors directly, add attributesToMap helper, use hex.DecodeString, use errors.Join, add comprehensive tests Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
| return func(yield func(trace ptrace.Traces, err error) bool) { | ||
| currentTrace := ptrace.NewTraces() | ||
| currentTraceID := pcommon.NewTraceIDEmpty() | ||
| cont := true |
There was a problem hiding this comment.
if you made change to this code then presumably you encountered a condition where the code was not working. Add a test for that condition. Otherwise do not introduce changes if they are not needed.
There was a problem hiding this comment.
The existing TestGetSpanDetailsHandler_Handle_PartialResults test demonstrates why the cont flag is needed. Without it, the test fails with panic: "range function continued iteration after function for loop body returned false". This is an iterator protocol violation - after the consumer returns false from the for-range loop, we must not call yield again. The cont flag prevents calling the final yield when the consumer has already stopped. Commit: 918d844
There was a problem hiding this comment.
TestGetSpanDetailsHandler_Handle_PartialResults test is not in the jptrace package - I am asking for test in aggregator_test.go that demonstrates the need for cont variable
There was a problem hiding this comment.
Added test TestAggregateTraces_DoesNotYieldAfterConsumerStops that demonstrates the need for the cont variable. Without it, the test would panic with "runtime error: range function continued iteration after function for loop body returned false" when the final yield (line 48-50) is attempted after the consumer has already returned false. Commit: c9cd55e
…rs, remove 'metadata only' from comment, keep cont flag with test demonstrating iterator protocol violation Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
Metrics Comparison SummaryTotal changes across all snapshots: 0 Detailed changes per snapshotsummary_metrics_snapshot_cassandra📊 Metrics Diff SummaryTotal Changes: 0
summary_metrics_snapshot_cassandra📊 Metrics Diff SummaryTotal Changes: 0
summary_metrics_snapshot_cassandra📊 Metrics Diff SummaryTotal Changes: 0
summary_metrics_snapshot_cassandra📊 Metrics Diff SummaryTotal Changes: 0
|
|
@copilot TestGetSpanDetailsHandler_Handle_PartialResults test is not in the jptrace package - I am asking for test in aggregator_test.go that demonstrates the need for cont variable |
…tion Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com>
…aegertracing#7859) Part of jaegertracing#7827 ## Implementation Plan for Phase 2 Steps 5 and 6 - [x] Add types for `get_span_details` and `get_trace_errors` tools - [x] Implement `get_span_details` handler - [x] Implement `get_trace_errors` handler - [x] Add comprehensive tests (99.5% coverage) - [x] Register both tools in server.go - [x] Run `make fmt`, `make lint`, and `make test` - all pass ✅ - [x] Update ADR 002 document with green checkboxes - [x] Split types.go into separate files per handler - [x] Mark Phase 1 items as complete in ADR 002 - [x] Address PR feedback (Rounds 1-9) ## Summary Successfully implemented Phase 2 steps 5 and 6 from ADR 002 with all PR feedback addressed: ### Tools Implemented 1. **get_span_details**: Fetches full OTLP span details (no hard limit, recommends 20) 2. **get_trace_errors**: Retrieves all error spans from a trace ### Latest Changes (Round 9 Feedback) **Added Test for cont Variable** (Comment #2678945689): - Added `TestAggregateTraces_DoesNotYieldAfterConsumerStops` in `aggregator_test.go` - Test demonstrates the bug that the `cont` variable prevents - Without `cont` flag, attempting to call the final `yield` after consumer returns `false` causes a panic: "runtime error: range function continued iteration after function for loop body returned false" - This is an iterator protocol violation - after consumer returns `false`, we must not call `yield` again - The test creates a scenario where traces are in separate batches and consumer stops after first trace, ensuring the final yield would occur after consumer has stopped - Test verifies only one yield occurs when consumer returns false ### Test Coverage - 99.5% coverage (maintained) - All tests passing ✅ - `make fmt` passes ✅ - `make lint` passes ✅ The implementation is production-ready with comprehensive error handling, clean code organization, excellent test coverage, and proper demonstration of why the `cont` variable is necessary in AggregateTraces. <!-- START COPILOT CODING AGENT SUFFIX --> <!-- START COPILOT ORIGINAL PROMPT --> <details> <summary>Original prompt</summary> > implement phase 2 steps 5 and 6 from ADR 002 as part of issue jaegertracing#7827. State Part of jaegertracing#7827 in PR description. Run `make fmt`, `make lint`, and `make test` to success everytime before pushing to remove branch. Make sure new code has test coverage, no less than 95%. Update the ADR document to mark done items with green checkbox emojis. </details> <!-- START COPILOT CODING AGENT TIPS --> --- 💬 We'd love your input! Share your thoughts on Copilot coding agent in our [2 minute survey](https://gh.io/copilot-coding-agent-survey). --------- Signed-off-by: Yuri Shkuro <github@ysh.us> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: yurishkuro <3523016+yurishkuro@users.noreply.github.com> Co-authored-by: Yuri Shkuro <github@ysh.us>
) ## Which problem is this PR solving? When an agent calls `search_traces`, the response includes `service_count` (an integer) but not the actual service names. The agent knows a trace spans 5 services but has no idea which ones. To find out, it must call `get_trace_topology` or `get_span_details` for every trace individually, which defeats the purpose of a lightweight summary endpoint. The data is already computed internally. `buildTraceSummary` builds a `services` map to derive `service_count`, then discards the map keys. This has been the case since the function was introduced in #7858, where the map was built but only `len(services)` was surfaced. Subsequent changes (#7859, #7863, #7916, #8194) restructured the types and renamed fields but never revisited the service data gap. ## Short description of the changes - Added `Services []string` to `TraceSummary`, populated from the existing `services` map - Sorted alphabetically via `slices.Sort` for deterministic output across calls - Added multi-service test with three services in non-alphabetical order, unique span IDs, and proper parent-child relationships to verify sort correctness - Updated existing summary tests to assert on the new field - `ServiceCount` preserved for backward compatibility ## Use case An agent investigating a latency spike searches for slow traces. The summary now returns: ```json { "service_count": 3, "services": ["api-gateway", "payment", "user-service"] } ``` The agent can immediately see that the payment service is involved and drill into that trace, instead of blindly fetching topology for every result. ## How was this change tested? - `go test ./cmd/jaeger/internal/extension/jaegermcp/...` - all passing - `make lint` - 0 issues - `make fmt` - clean - `make test` - 3053 tests passing ## Checklist - [x] I have read https://github.com/jaegertracing/jaeger/blob/main/CONTRIBUTING_GUIDELINES.md - [x] I have signed all commits - [x] I have added unit tests for the new functionality - [x] I have run lint and test steps successfully: `make lint test` ## AI Usage in this PR (choose one) - [x] **Light**: AI provided minor assistance (formatting, simple suggestions) Signed-off-by: Roshan Singh <roshansingh7890@gmail.com> Signed-off-by: Roshan <rosh.s568@gmail.com>
Part of #7827
Implementation Plan for Phase 2 Steps 5 and 6
get_span_detailsandget_trace_errorstoolsget_span_detailshandlerget_trace_errorshandlermake fmt,make lint, andmake test- all pass ✅Summary
Successfully implemented Phase 2 steps 5 and 6 from ADR 002 with all PR feedback addressed:
Tools Implemented
Latest Changes (Round 9 Feedback)
Added Test for cont Variable (Comment #2678945689):
TestAggregateTraces_DoesNotYieldAfterConsumerStopsinaggregator_test.gocontvariable preventscontflag, attempting to call the finalyieldafter consumer returnsfalsecauses a panic: "runtime error: range function continued iteration after function for loop body returned false"false, we must not callyieldagainTest Coverage
make fmtpasses ✅make lintpasses ✅The implementation is production-ready with comprehensive error handling, clean code organization, excellent test coverage, and proper demonstration of why the
contvariable is necessary in AggregateTraces.Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.