Add CosmosEndToEndOperationLatencyPolicyConfig support to azure-cosmos-benchmark#48557

Draft
jeet1995 wants to merge 2 commits into Azure:main from jeet1995:users/abhmohanty/gw2-perf-benchmark

Add CosmosEndToEndOperationLatencyPolicyConfig support to azure-cosmos-benchmark

Motivation

Walmart reported that CosmosEndToEndOperationLatencyPolicyConfig was not being adhered to during Gateway V2 benchmarking under constrained compute (1-core/2GB, 60K-80K OPM). Our theory is that CPU/memory pressure on small VMs keeps the Reactor timeout operator from firing within the configured E2E budget.

To validate this theory, we need the benchmark module to support configuring E2E timeout policy + availability strategy on a per-request basis, so we can run soak tests (8 hours) and observe timeout behavior under sustained load.

What Changed

Per-request E2E timeout policy — applied on Cosmos(Item|Query)RequestOptions, not at the client builder level. This is intentional: client-level E2E policy would apply to metadata calls (database/container reads at startup), which could cause failures during benchmark initialization.

| File | Change |
|---|---|
| TenantWorkloadConfig.java | Added endToEndTimeoutMs (Integer) and availabilityStrategyEnabled (Boolean) JSON config fields, getters, and applyField support |
| AsyncBenchmark.java | Base class builds CosmosEndToEndOperationLatencyPolicyConfig once in constructor, exposes as e2ePolicyConfig for subclasses |
| AsyncReadBenchmark.java | Creates CosmosItemRequestOptions with E2E policy, passes 4-arg readItem(id, pk, options, type) (was using 3-arg overload) |
| AsyncWriteBenchmark.java | Creates CosmosItemRequestOptions with E2E policy, replaces null options in createItem() |
| AsyncQueryBenchmark.java | Sets E2E policy on CosmosQueryRequestOptions after construction (applies to all query operation types) |

Usage

In the workload config JSON (tenantDefaults or per-tenant):

{
  "endToEndTimeoutMs": 3000,
  "availabilityStrategyEnabled": true
}

When endToEndTimeoutMs is not set (or null), no E2E policy is applied — existing behavior is unchanged.
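
The null handling described above can be sketched roughly as follows. This is not the PR's code: E2ePolicySketch, Policy, and buildPolicy are hypothetical stand-ins that use only the JDK, and treating a missing availabilityStrategyEnabled as false is an assumption. In the benchmark itself the result would be a CosmosEndToEndOperationLatencyPolicyConfig set on the request options.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical stand-in for illustration; not the azure-cosmos policy type.
final class E2ePolicySketch {

    /** Immutable holder for the two settings read from the workload config. */
    static final class Policy {
        final Duration timeout;
        final boolean availabilityStrategyEnabled;

        Policy(Duration timeout, boolean availabilityStrategyEnabled) {
            this.timeout = timeout;
            this.availabilityStrategyEnabled = availabilityStrategyEnabled;
        }
    }

    /**
     * Returns an empty Optional when endToEndTimeoutMs is null, so callers
     * leave the request options untouched and existing behavior is preserved.
     */
    static Optional<Policy> buildPolicy(Integer endToEndTimeoutMs,
                                        Boolean availabilityStrategyEnabled) {
        if (endToEndTimeoutMs == null) {
            return Optional.empty();
        }
        // Assumption: an unset availabilityStrategyEnabled is treated as false.
        boolean strategy = Boolean.TRUE.equals(availabilityStrategyEnabled);
        return Optional.of(new Policy(Duration.ofMillis(endToEndTimeoutMs), strategy));
    }

    public static void main(String[] args) {
        // No timeout configured: no policy is produced.
        System.out.println(buildPolicy(null, true).isPresent());
        // Timeout configured: the policy carries the budget and the strategy flag.
        Policy p = buildPolicy(3000, true).get();
        System.out.println(p.timeout.toMillis() + " " + p.availabilityStrategyEnabled);
    }
}
```

Building the value once and reusing it per request mirrors the base-class approach in the change list; the Optional simply makes the "not set" case explicit at the call site.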

Part of

Gateway V2 Performance Benchmarking Infrastructure — this is the first of several benchmark module enhancements to support proactive perf validation matching Walmart's production configuration (3s E2E timeout, Session consistency, availability strategy enabled).

…s-benchmark

Add per-request E2E timeout policy and availability strategy configuration
to the benchmark module. The policy is applied on Cosmos(Item|Query)RequestOptions,
not at the client builder level, to avoid impacting metadata calls during startup.

Changes:
- TenantWorkloadConfig: Add endToEndTimeoutMs and availabilityStrategyEnabled
  JSON config fields with getters and applyField support
- AsyncBenchmark: Build CosmosEndToEndOperationLatencyPolicyConfig once in the
  base class constructor, expose as e2ePolicyConfig for subclasses
- AsyncReadBenchmark: Create CosmosItemRequestOptions with E2E policy, pass
  to readItem() (was using the 3-arg overload without options)
- AsyncWriteBenchmark: Create CosmosItemRequestOptions with E2E policy, replace
  null options in createItem()
- AsyncQueryBenchmark: Set E2E policy on CosmosQueryRequestOptions immediately
  after construction (applies to all query operation types)

Usage in workload config JSON:
  "endToEndTimeoutMs": 3000,
  "availabilityStrategyEnabled": true

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jeet1995 force-pushed the users/abhmohanty/gw2-perf-benchmark branch from 1f8d550 to 821e851 on March 25, 2026 19:29
…DX schema

DistributionSummary metrics (payload sizes, request sizes) were writing
fields named Mean/Max/P50/P90/P95/P99, but the ADX PerfTimeSeries table
expects MeanMs/MaxMs/P50Ms/P90Ms/P95Ms/P99Ms. Also adds a Value field
using totalAmount() so the Grafana dashboard can display bytes/s.
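
A rough sketch of the renaming, assuming the statistics are already available as a name-to-value map. DistributionFieldNames and toAdxFields are hypothetical names; the actual reporter code in this commit may be structured differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the benchmark's reporter class.
final class DistributionFieldNames {

    /**
     * Appends "Ms" to each statistic name (Mean -> MeanMs, P99 -> P99Ms, ...)
     * and adds a Value entry carrying the summary's total amount.
     */
    static Map<String, Double> toAdxFields(Map<String, Double> stats, double totalAmount) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : stats.entrySet()) {
            out.put(e.getKey() + "Ms", e.getValue());
        }
        out.put("Value", totalAmount);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Double> stats = new LinkedHashMap<>();
        stats.put("Mean", 512.0);
        stats.put("P99", 2048.0);
        System.out.println(toAdxFields(stats, 1_000_000.0).keySet());
    }
}
```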

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>