Add CosmosEndToEndOperationLatencyPolicyConfig support to azure-cosmos-benchmark #48557
Draft
jeet1995 wants to merge 2 commits into Azure:main from
…s-benchmark

Add per-request E2E timeout policy and availability strategy configuration to the benchmark module. The policy is applied on Cosmos(Item|Query)RequestOptions, not at the client builder level, to avoid impacting metadata calls during startup.

Changes:
- TenantWorkloadConfig: Add endToEndTimeoutMs and availabilityStrategyEnabled JSON config fields with getters and applyField support
- AsyncBenchmark: Build CosmosEndToEndOperationLatencyPolicyConfig once in the base class constructor, expose as e2ePolicyConfig for subclasses
- AsyncReadBenchmark: Create CosmosItemRequestOptions with E2E policy, pass to readItem() (was using the 3-arg overload without options)
- AsyncWriteBenchmark: Create CosmosItemRequestOptions with E2E policy, replace null options in createItem()
- AsyncQueryBenchmark: Set E2E policy on CosmosQueryRequestOptions immediately after construction (applies to all query operation types)

Usage in workload config JSON:
  "endToEndTimeoutMs": 3000,
  "availabilityStrategyEnabled": true

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1f8d550 to 821e851
…DX schema

DistributionSummary metrics (payload sizes, request sizes) were writing fields named Mean/Max/P50/P90/P95/P99, but the ADX PerfTimeSeries table expects MeanMs/MaxMs/P50Ms/P90Ms/P95Ms/P99Ms. Also adds a Value field using totalAmount() so the Grafana dashboard can display bytes/s.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add CosmosEndToEndOperationLatencyPolicyConfig support to azure-cosmos-benchmark
Motivation
Walmart reported that CosmosEndToEndOperationLatencyPolicyConfig was not being adhered to during Gateway V2 benchmarking under constrained compute (1-core/2GB, 60K-80K OPM). Our theory is that CPU/memory pressure on small VMs causes the Reactor timeout operator to not fire within the configured E2E budget.

To validate this theory, we need the benchmark module to support configuring an E2E timeout policy and availability strategy on a per-request basis, so we can run soak tests (8 hours) and observe timeout behavior under sustained load.
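The theory above can be illustrated with a JDK-only analogy. This is a minimal sketch, not Reactor and not the SDK: it assumes the timeout is delivered by a timer thread that can itself be kept busy, and models that with a single-threaded ScheduledExecutorService. The class and method names (LateTimeoutSketch, measureLatenessMs) are hypothetical and do not appear in the PR.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration: a timeout can only fire once its timer thread is free. */
final class LateTimeoutSketch {

    /** Returns how many milliseconds past its deadline the timeout task actually ran. */
    static long measureLatenessMs(long busyMs, long timeoutMs) {
        // One timer thread stands in for a scheduler on a 1-core machine (assumption).
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            final long start = System.nanoTime();
            // Occupy the only timer thread with CPU-bound work.
            timer.execute(() -> spin(busyMs));
            // The "timeout" is due after timeoutMs, but it has to wait for the thread.
            ScheduledFuture<Long> fired = timer.schedule(
                () -> (System.nanoTime() - start) / 1_000_000L,
                timeoutMs, TimeUnit.MILLISECONDS);
            return fired.get() - timeoutMs;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            timer.shutdownNow();
        }
    }

    /** Busy-waits for roughly the given number of milliseconds. */
    private static void spin(long ms) {
        long end = System.nanoTime() + ms * 1_000_000L;
        while (System.nanoTime() < end) {
            // intentionally empty: burn CPU
        }
    }

    public static void main(String[] args) {
        System.out.println("lateness with idle timer thread (ms): " + measureLatenessMs(0, 50));
        System.out.println("lateness with busy timer thread (ms): " + measureLatenessMs(300, 50));
    }
}
```

If the benchmark shows a similar pattern, observed E2E latency would exceed the configured budget by roughly the time the scheduler was starved, which is what the soak tests are meant to confirm or rule out.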
What Changed
Per-request E2E timeout policy: applied on Cosmos(Item|Query)RequestOptions, not at the client builder level. This is intentional: a client-level E2E policy would apply to metadata calls (database/container reads at startup), which could cause failures during benchmark initialization.

| File | Change |
| --- | --- |
| TenantWorkloadConfig.java | Adds endToEndTimeoutMs (Integer) and availabilityStrategyEnabled (Boolean) JSON config fields, getters, and applyField support |
| AsyncBenchmark.java | Builds CosmosEndToEndOperationLatencyPolicyConfig once in constructor, exposes as e2ePolicyConfig for subclasses |
| AsyncReadBenchmark.java | Creates CosmosItemRequestOptions with E2E policy, passes 4-arg readItem(id, pk, options, type) (was using 3-arg overload) |
| AsyncWriteBenchmark.java | Creates CosmosItemRequestOptions with E2E policy, replaces null options in createItem() |
| AsyncQueryBenchmark.java | Sets E2E policy on CosmosQueryRequestOptions after construction; applies to all query operation types |

Usage
In the workload config JSON (tenantDefaults or per-tenant):

    { "endToEndTimeoutMs": 3000, "availabilityStrategyEnabled": true }

When endToEndTimeoutMs is not set (or null), no E2E policy is applied and existing behavior is unchanged.

Part of Gateway V2 Performance Benchmarking Infrastructure: this is the first of several benchmark module enhancements to support proactive perf validation matching Walmart's production configuration (3s E2E timeout, Session consistency, availability strategy enabled).
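The opt-in behavior described under Usage can be sketched with plain JDK types. This is a hedged illustration only: E2eSettingsSketch is a hypothetical stand-in, not the PR's TenantWorkloadConfig, and it assumes that a missing availabilityStrategyEnabled is treated as false and that the availability strategy is only meaningful when a timeout is configured.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical stand-in for the two new workload settings; not the real TenantWorkloadConfig. */
final class E2eSettingsSketch {
    private final Integer endToEndTimeoutMs;           // null means "not configured"
    private final Boolean availabilityStrategyEnabled; // null treated as false (assumption)

    E2eSettingsSketch(Integer endToEndTimeoutMs, Boolean availabilityStrategyEnabled) {
        this.endToEndTimeoutMs = endToEndTimeoutMs;
        this.availabilityStrategyEnabled = availabilityStrategyEnabled;
    }

    /** Empty when no timeout is configured, so callers leave request options untouched. */
    Optional<Duration> endToEndBudget() {
        return Optional.ofNullable(endToEndTimeoutMs)
                       .map(ms -> Duration.ofMillis(ms.longValue()));
    }

    /** Assumption: the strategy only applies when there is an E2E policy to attach it to. */
    boolean useAvailabilityStrategy() {
        return endToEndBudget().isPresent() && Boolean.TRUE.equals(availabilityStrategyEnabled);
    }

    public static void main(String[] args) {
        E2eSettingsSketch configured = new E2eSettingsSketch(3000, true);
        E2eSettingsSketch unset = new E2eSettingsSketch(null, true);
        System.out.println("configured budget: " + configured.endToEndBudget());
        System.out.println("unset budget: " + unset.endToEndBudget());
    }
}
```

In the actual benchmark, a present budget would presumably be turned into a CosmosEndToEndOperationLatencyPolicyConfig once and set on each request's options, while an empty one would skip that step entirely.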