apollo_integration_tests: fix flaky test due to metric registration race#14253
apollo_integration_tests: fix flaky test due to metric registration race#14253AvivYossef-starkware wants to merge 1 commit into
Conversation
PR SummaryLow Risk Overview
Reviewed by Cursor Bugbot for commit a950914. Bugbot is set up for automated code reviews on this repo. Configure here. |
dorimedini-starkware
left a comment
There was a problem hiding this comment.
@dorimedini-starkware reviewed 1 file and all commit messages, and made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on AvivYossef-starkware).
crates/apollo_integration_tests/src/monitoring_utils.rs line 221 at r1 (raw file):
"Sequencer {sequencer_idx} has {reverted_count} reverted transactions" ); }
wouldn't it be better to have a retry loop until the metric is registered? i.e. catch the specific "not registered" error and loop?
asking because the assert_no_reverted_txs can hit the same issue, but unwrap_or(0) is a bad idea there (false positive)
Code quote:
pub async fn sequencer_num_accepted_txs(monitoring_client: &MonitoringClient) -> usize {
// If the sequencer accepted txs, sync should process them and update the respective metric.
// Return 0 if the metric isn't registered yet (race with StateSyncRunner startup).
monitoring_client
.get_metric::<usize>(STATE_SYNC_PROCESSED_TRANSACTIONS.get_name())
.await
.unwrap_or(0)
}
pub async fn assert_no_reverted_txs(monitoring_client: &MonitoringClient, sequencer_idx: usize) {
let reverted_count =
monitoring_client.get_metric::<usize>(REVERTED_TRANSACTIONS.get_name()).await.unwrap();
assert_eq!(
reverted_count, 0,
"Sequencer {sequencer_idx} has {reverted_count} reverted transactions"
);
}
AvivYossef-starkware
left a comment
There was a problem hiding this comment.
@AvivYossef-starkware made 1 comment.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on claude).
crates/apollo_integration_tests/src/monitoring_utils.rs line 221 at r1 (raw file):
Previously, dorimedini-starkware wrote…
wouldn't it be better to have a retry loop until the metric is registered? i.e. catch the specific "not registered" error and loop?
asking because theassert_no_reverted_txscan hit the same issue, butunwrap_or(0)is a bad idea there (false positive)
@claude plz fix it according to Dori`s comment
ea70038 to
c1adc9b
Compare
Fix race condition where tests query metrics before StateSyncRunner registers them: - sequencer_num_accepted_txs: Return 0 for MetricNotFound errors (safe since it's used in polling loops), panic on other unexpected errors - assert_no_reverted_txs: Add retry loop until the metric is registered, avoiding false positives https://claude.ai/code/session_014UmiFHenEiE4vNnh8qbGkZ
1bbe8de to
a950914
Compare
Summary
sequencer_num_accepted_txspanics whenSTATE_SYNC_PROCESSED_TRANSACTIONSmetric isn't registered yetContext
This fixes a race condition observed in PR #14250 where the integration test failed with:
The metric is registered asynchronously when
StateSyncRunner::start()is called. The test can query the metric before registration completes. Sincesequencer_num_accepted_txsis only used inrun_untilpolling loops, returning 0 allows the test to wait for proper metric registration.Test plan
Slack thread: https://starkwareindustries.slack.com/archives/C07E54Y3W3H/p1779980981138769?thread_ts=1779979080.466729&cid=C07E54Y3W3H
https://claude.ai/code/session_014UmiFHenEiE4vNnh8qbGkZ
Generated by Claude Code