Fix flaky metrics 7617 #7720
148ebf0 to ccd62cc
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@           Coverage Diff           @@
##             main    #7720   +/-   ##
=======================================
  Coverage   95.48%   95.48%
=======================================
  Files         316      316
  Lines       16732    16732
=======================================
  Hits        15977    15977
  Misses        590      590
  Partials      165      165
|
Metrics Comparison Summary
❌ ERROR: No summary files were generated. Expected at least 8 diff files from CI. This indicates a failure in the E2E test execution or metrics collection process. |
64b510e to a2b20b2
Signed-off-by: Chinmay Mehrotra <mehrotrachinmay6@gmail.com>
a2b20b2 to 3aadd9c
|
Can you please review the PR? |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time. |
Signed-off-by: Chinmay Mehrotra <mehrotrachinmay4@gmail.com>
300ac34 to b68c3a7
Pull request overview
This PR fixes flaky metrics comparison in CI by normalizing transient labels that change between test runs. The issue was caused by randomized namespaces in e2e tests and OpenTelemetry version labels appearing in metric labels, causing spurious differences in metric comparisons.
Changes:
- Added GLOBAL transient label patterns to normalize otel_scope_version, k8s_namespace_name, and namespace labels across all metrics
- Modified suppress_transient_labels function to apply GLOBAL patterns to all metrics regardless of metric name
- Removed outdated example comments from the configuration
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Chinmay Mehrotra <88617477+chinmay3012@users.noreply.github.com>
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
        'pattern': r'.*',
        'replacement': 'version'
    },
    'k8s_namespace_name': {
        'pattern': r'.*',
        'replacement': 'namespace'
    },
    'namespace': {
        'pattern': r'.*',
Using regex pattern .* with re.sub will also match the empty string at the end, producing duplicated replacements (e.g., versionversion / namespacenamespace). Use an anchored pattern like ^.*$, a .+ pattern, or pass count=1 to re.sub so the label is normalized to a single fixed value as intended.
Suggested change:
-        'pattern': r'.*',
-        'replacement': 'version'
-    },
-    'k8s_namespace_name': {
-        'pattern': r'.*',
-        'replacement': 'namespace'
-    },
-    'namespace': {
-        'pattern': r'.*',
+        'pattern': r'^.*$',
+        'replacement': 'version'
+    },
+    'k8s_namespace_name': {
+        'pattern': r'^.*$',
+        'replacement': 'namespace'
+    },
+    'namespace': {
+        'pattern': r'^.*$',
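The behavior described in this review comment can be reproduced with a few lines of standard-library Python. This is a minimal sketch: the sample value 0.63.0 is borrowed from the PR description, the variable names are made up for illustration, and the doubled output applies to Python 3.7 and later, where empty matches adjacent to a previous match are also replaced.

```python
import re

value = "0.63.0"  # example label value taken from the PR description

# Unanchored: '.*' consumes the whole value, then matches the empty string
# left at the end, so the replacement is emitted twice (Python 3.7+).
unanchored = re.sub(r".*", "version", value)

# The three alternatives named in the review comment.
anchored = re.sub(r"^.*$", "version", value)       # '^' cannot match at the end
non_empty = re.sub(r".+", "version", value)        # needs at least one character
single = re.sub(r".*", "version", value, count=1)  # stop after the first match

print(unanchored)                   # versionversion
print(anchored, non_empty, single)  # version version version
```

The alternatives differ for an empty label value: `^.*$` and `count=1` rewrite it to the replacement string, while `.+` leaves it empty. Which of these is preferable depends on whether empty labels should be normalized too.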
|
@yurishkuro I have responded to the comments and am open to review for any further changes or clarification. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. You may re-open it if you need more time. |
Which problem is this PR solving?
Resolves #7617
Description of the changes
Add global suppression for otel_scope_version (normalizing it to fixed string "version") to prevent spurious diffs when OpenTelemetry dependencies are upgraded.
Add global suppression for namespace and k8s_namespace_name (normalizing to "namespace") to handle randomized namespaces in e2e tests.
Fix logic in suppress_transient_labels to correctly apply these global patterns to every metric, regardless of metric name.
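The idea of global suppression can be sketched as follows. This is not the PR's actual code: the function name normalize_labels, the constant GLOBAL_PATTERNS, the per_metric argument, and the metric name in the usage line are all hypothetical. Only the label names, the replacement strings, and the pattern/replacement dictionary shape come from the diff and description above.

```python
import re

# Hypothetical config, mirroring the dictionary shape visible in the diff.
GLOBAL_PATTERNS = {
    "otel_scope_version": {"pattern": r"^.*$", "replacement": "version"},
    "k8s_namespace_name": {"pattern": r"^.*$", "replacement": "namespace"},
    "namespace": {"pattern": r"^.*$", "replacement": "namespace"},
}


def normalize_labels(metric_name, labels, per_metric=None):
    """Return a copy of labels with transient values replaced.

    Global rules apply to every metric; optional per-metric rules
    (an assumed extension point) override them for one metric name.
    """
    rules = dict(GLOBAL_PATTERNS)
    rules.update((per_metric or {}).get(metric_name, {}))

    normalized = {}
    for key, value in labels.items():
        rule = rules.get(key)
        if rule is None:
            normalized[key] = value  # stable label, keep as is
        else:
            normalized[key] = re.sub(rule["pattern"], rule["replacement"], value)
    return normalized


# Usage with a made-up metric name and labels.
print(normalize_labels(
    "http_requests_total",
    {"otel_scope_version": "0.63.0", "namespace": "test-ab12", "method": "GET"},
))
```

Applying the global rules before any per-metric lookup is what makes the normalization independent of the metric name, which is the behavior the description asks for.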
How was this change tested?
Manually created a checking script with dummy metric files containing different otel_scope_version values (e.g., 0.63.0 vs 0.64.0) and confirmed they are now reported as identical.
Verified that files with actual differences (e.g. different metric values or label keys) are still correctly flagged as different.
Verified that randomized namespace labels are correctly normalized and ignored in comparisons.
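A check along the lines described above might look like the sketch below. The checking script itself is not shown in this PR, so everything here is an assumption: samples are modeled as (name, labels, value) tuples rather than parsed from metric files, and the names TRANSIENT, canonical, and requests_total are invented for illustration.

```python
# Hypothetical mapping of transient label names to their fixed replacements.
TRANSIENT = {
    "otel_scope_version": "version",
    "k8s_namespace_name": "namespace",
    "namespace": "namespace",
}


def canonical(samples):
    """Return a set of samples with transient label values replaced."""
    result = set()
    for name, labels, value in samples:
        fixed = tuple(sorted((k, TRANSIENT.get(k, v)) for k, v in labels.items()))
        result.add((name, fixed, value))
    return result


# Dummy runs: a and b differ only in a transient label, c differs in value.
run_a = [("requests_total", {"otel_scope_version": "0.63.0", "namespace": "ns-1"}, 1.0)]
run_b = [("requests_total", {"otel_scope_version": "0.64.0", "namespace": "ns-2"}, 1.0)]
run_c = [("requests_total", {"otel_scope_version": "0.64.0", "namespace": "ns-2"}, 2.0)]

print(canonical(run_a) == canonical(run_b))  # True: only transient labels differ
print(canonical(run_a) == canonical(run_c))  # False: the metric value differs
```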