Refactor SLO workload to use environment variables by polRk · Pull Request #802 · ydb-platform/ydb-python-sdk

polRk · 2026-04-02T10:59:53Z

Migrate from CLI subcommands (table-run, topic-run) to environment-based configuration (WORKLOAD_NAME, YDB_ENDPOINT, etc.)
Simplify argument parsing: remove subcommand structure, add env var injection
Update metrics collection to always be enabled (remove otlp_endpoint checks)
Replace quantile-estimator with hdrhistogram
Update workflow matrix from include to sdk naming
Upgrade ydb-slo-action from commit hash to v2 tag
Add pyrightconfig.json for type checking
Update Dockerfile comments to reflect new env-var-based interface

Pull request type

Please check the type of change your PR introduces:

What is the current behavior?

Issue Number: N/A

What is the new behavior?

Other information

Copilot

Pull request overview

This PR refactors the Python SLO workload runner under tests/slo/ to use environment-variable-driven configuration (workload selection, YDB connection, duration, OTLP exporter settings) instead of CLI subcommands, and updates metrics and CI workflows accordingly.

Changes:

Replace subcommand-based orchestration with a single “run all” flow selected via WORKLOAD_NAME / env vars.
Rework metrics initialization to use standard OpenTelemetry env vars and switch latency quantiles to HDR histogram-based calculation.
Update SLO GitHub workflows and Docker image interface to match the new env-var-based runner.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
tests/slo/src/runners/topic_runner.py	Update metrics factory call to new signature.
tests/slo/src/runners/table_runner.py	Update metrics factory call to new signature.
tests/slo/src/root_runner.py	Replace subcommand runner with `run_all()` flow selected via `WORKLOAD_NAME`.
tests/slo/src/pyrightconfig.json	Add Pyright configuration for the SLO runner code.
tests/slo/src/options.py	Simplify CLI flags; inject connection/duration from environment variables.
tests/slo/src/jobs/table_jobs.py	Switch to `WORKLOAD_NAME` constant.
tests/slo/src/jobs/base.py	Always start metrics sender job (no CLI OTLP gating).
tests/slo/src/jobs/async_topic_jobs.py	Always start async metrics sender task (no CLI OTLP gating).
tests/slo/src/core/metrics.py	Move to standard OTel env vars and implement HDR-based latency quantiles.
tests/slo/src/main.py	Switch entrypoint to call `run_all()`.
tests/slo/requirements.txt	Replace `quantile-estimator` with `hdrhistogram`.
tests/slo/Dockerfile	Update documentation/comments and provide a default CMD for tuning flags.
pyproject.toml	Configure `tool.ty` extra paths for SLO sources.
.github/workflows/slo.yml	Refactor workload matrix and switch to `ydb-slo-action` v2 init/report.
.github/workflows/slo-report.yml	Switch report action reference to v2 and remove label-removal job.

Comments suppressed due to low confidence (1)

.github/workflows/slo.yml:53

concurrency.group references matrix.workload, but the matrix is now defined under matrix.sdk (with name/command). This will make the workflow fail to evaluate at runtime. Update the concurrency group expression to use the new matrix key (e.g., matrix.sdk.name).

    concurrency:
      group: slo-${{ github.ref }}-${{ matrix.workload }}
      cancel-in-progress: true

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

vgvoleg

In this PR you've deleted async jobs, which is out of scope; you've reorganized arguments, which is out of scope; you've deleted an ability to run jobs without monitoring job; you've added configs for tools we are not using...
Please make PR as simple as it needed to solve your task.

Copilot

Pull request overview

Copilot reviewed 19 out of 19 changed files in this pull request and generated 7 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+def create_metrics(args) -> BaseMetrics:
    """
    Factory used by SLO runners.

-    Metrics are enabled if either:
-    - OTLP_ENDPOINT env var is set, or
-    - `--otlp-endpoint` is provided (and non-empty)
-
-    If endpoint is empty, metrics are disabled (DummyMetrics).
+    Uses args.otlp_endpoint, args.workload_name, args.workload_ref from parsed CLI arguments.
+    If the endpoint is empty, returns a no-op DummyMetrics.
    """
-    endpoint = (environ.get("OTLP_ENDPOINT") or (otlp_endpoint or "")).strip()
+    endpoint = (args.otlp_endpoint or "").strip()
+
    if not endpoint:
-        logger.info("Creating dummy metrics (metrics disabled)")
+        logger.info("OTLP endpoint not configured — metrics disabled")
        return DummyMetrics()

-    logger.info("Creating OTLP metrics exporter to Prometheus: %s", endpoint)
+    environ.setdefault("OTEL_EXPORTER_OTLP_PROTOCOL", "http/protobuf")
+    environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint
+
+    logger.info("Creating OTLP metrics exporter (endpoint: %s)", endpoint)


+        driver.stop(timeout=args.shutdown_time)
+
+


-        default="http://localhost:9090/api/v1/otlp/v1/metrics",
-        type=str,
-        help="Full Prometheus OTLP metrics endpoint (e.g. http://ydb-prometheus:9090/api/v1/otlp/v1/metrics). Empty to disable.",
+    Every flag supports a fallback chain: CLI arg > environment variable > hardcoded default.


-      - if: always()
-        name: Upload logs
-        uses: actions/upload-artifact@v4
+          workload_baseline_image: ydb-app-current


+FROM python:3.11-slim

 ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1

-WORKDIR /src
-COPY . /src
-
-# Install runtime deps into an isolated venv so we can copy it into the final stage.
-RUN python -m venv /opt/venv \
-    && /opt/venv/bin/python -m pip install --no-cache-dir --upgrade pip \
-    && /opt/venv/bin/pip install --no-cache-dir . \
-    && /opt/venv/bin/pip install --no-cache-dir -r tests/slo/requirements.txt
+RUN apt-get update && apt-get install -y --no-install-recommends gcc libc6-dev && rm -rf /var/lib/apt/lists/*

+WORKDIR /src

-FROM python:3.11-slim
+# 1. YDB SDK
+COPY setup.py pyproject.toml README.md requirements.txt ./
+COPY ydb/ ydb/
+RUN pip install --no-cache-dir .

-ENV PYTHONDONTWRITEBYTECODE=1 \
-    PYTHONUNBUFFERED=1 \
-    PATH="/opt/venv/bin:${PATH}"
+# 2. SLO deps
+COPY tests/slo/requirements.txt tests/slo/requirements.txt
+RUN pip install --no-cache-dir -r tests/slo/requirements.txt


+        # Latency gauges (fed from HDR histograms via push())
+        self._latency_p50 = self._meter.create_gauge(
+            name="sdk_operation_latency_p50_seconds",
+            unit="s",
+            description="P50 operation latency in seconds.",
        )
+        self._latency_p95 = self._meter.create_gauge(
+            name="sdk_operation_latency_p95_seconds",
+            unit="s",
+            description="P95 operation latency in seconds.",
+        )
+        self._latency_p99 = self._meter.create_gauge(
+            name="sdk_operation_latency_p99_seconds",
+            unit="s",
+            description="P99 operation latency in seconds.",
+        )


vgvoleg

поправлены не все замечания

+ перетасовка в опциях ломает топик сло на дефолтных параметрах, у нас теперь не матчатся read/write threads и topic partitions.

Switch to the new SLO Action's one-process-one-container model and push client-side latency aggregates instead of histograms. - metrics.py: compute p50/p95/p99 per push window via HdrHistogram and emit them as gauges; reset the histogram after each push so the values represent the last window only. Read ref/workload from the new WORKLOAD_REF/WORKLOAD_NAME env vars (with legacy REF/WORKLOAD fallback). Resolve the OTLP endpoint from the standard OTEL_EXPORTER_OTLP_* env vars before falling back to --otlp-endpoint. - docker-entrypoint.sh: tiny wrapper invoked by ydb-slo-action v2 that picks the *-create / *-run subcommand from WORKLOAD_NAME and forwards tuning flags from workload_current_command to *-run. - Dockerfile: install the entrypoint script; keep gcc/libc6-dev for the hdrhistogram sdist build. - slo.yml: use ydb-platform/ydb-slo-action/init@v2 with the workload matrix, fix workload_baseline_image (was pointing at current). - slo-report.yml: align with the Java SDK workflow — drop the JS-based label removal in favour of `gh pr edit --remove-label`. - requirements.txt: pin hdrhistogram for the percentile computation.

The v2 SLO action runs both current and baseline images with the same command. The baseline checkout is on main, which still has the old `python ./tests/slo/src` entrypoint that expects a subcommand as the first arg — so passing tuning flags directly (e.g. --read-rps 1000) fails with 'argument subcommand: invalid choice'. Mirror the Java SDK setup: build the baseline image with the current PR's Dockerfile + tests/slo runner against the baseline ydb library. That way the runner-side contract (entrypoint script, metrics format, env vars) is identical for both images and only the SDK under test differs. Also fix the inline report/label-removal jobs to use pull_request context (github.run_id, github.event.pull_request.number) instead of workflow_run fields that are undefined for this trigger.

github-actions · 2026-05-29T09:02:04Z

🌋 SLO Test Results

🟢 2 workload(s) tested — All thresholds passed

Commit: 18ae9eb · View run

Workload	Thresholds	Duration	Report
sync-table	🟢 OK	10m 2s	📄 Report
sync-query	🟢 OK	10m 13s	📄 Report

Generated by ydb-slo-action

Copilot AI review requested due to automatic review settings April 2, 2026 10:59

Copilot started reviewing on behalf of polRk April 2, 2026 11:01 View session

polRk added the SLO label Apr 2, 2026

Copilot AI reviewed Apr 2, 2026

View reviewed changes

Comment thread tests/slo/src/core/metrics.py Outdated

Comment thread tests/slo/src/root_runner.py

Comment thread tests/slo/requirements.txt Outdated

Comment thread tests/slo/src/jobs/async_topic_jobs.py

github-actions Bot removed the SLO label Apr 2, 2026

polRk added the SLO label Apr 2, 2026

github-actions Bot removed the SLO label Apr 2, 2026

polRk added the SLO label Apr 2, 2026

github-actions Bot removed the SLO label Apr 2, 2026

vgvoleg requested changes Apr 3, 2026

View reviewed changes

polRk added the SLO label Apr 13, 2026

github-actions Bot removed the SLO label Apr 13, 2026

polRk requested a review from vgvoleg April 13, 2026 10:04

vgvoleg force-pushed the slo-v2 branch from c073769 to 49579c8 Compare April 22, 2026 07:44

vgvoleg added the SLO label Apr 22, 2026

github-actions Bot removed the SLO label Apr 22, 2026

vgvoleg requested changes Apr 22, 2026

View reviewed changes

polRk force-pushed the slo-v2 branch 2 times, most recently from 5f7b4b3 to e9f1f7e Compare April 25, 2026 13:19

polRk requested review from Copilot and vgvoleg April 25, 2026 13:32

Copilot started reviewing on behalf of polRk April 25, 2026 13:33 View session

Copilot AI reviewed Apr 25, 2026

View reviewed changes

vgvoleg added the SLO label Apr 27, 2026

github-actions Bot removed the SLO label Apr 27, 2026

polRk added SLO labels Apr 27, 2026

github-actions Bot removed the SLO label Apr 27, 2026

vgvoleg requested changes Apr 27, 2026

View reviewed changes

polRk force-pushed the slo-v2 branch from e8338f4 to 753a7d9 Compare May 27, 2026 11:50

polRk force-pushed the slo-v2 branch 2 times, most recently from 3ce18f6 to 2d4e027 Compare May 27, 2026 12:59

polRk requested review from Copilot and vgvoleg May 27, 2026 13:59

Copilot started reviewing on behalf of polRk May 27, 2026 13:59 View session

This comment was marked as low quality.

Sign in to view

polRk added the SLO label May 27, 2026

github-actions Bot removed the SLO label May 27, 2026

polRk added the SLO label May 28, 2026

github-actions Bot removed the SLO label May 28, 2026

polRk added the SLO label May 28, 2026

github-actions Bot removed the SLO label May 28, 2026

polRk added 2 commits May 28, 2026 22:57

inline slo post actions

8d60d4d

polRk force-pushed the slo-v2 branch from 1a683b1 to dd7fec7 Compare May 28, 2026 19:57

polRk force-pushed the slo-v2 branch from dd7fec7 to 3963d63 Compare May 28, 2026 20:16

vgvoleg added the SLO label May 29, 2026

github-actions Bot removed the SLO label May 29, 2026

vgvoleg approved these changes May 29, 2026

View reviewed changes

vgvoleg merged commit 3766f64 into main May 29, 2026
43 of 44 checks passed

vgvoleg deleted the slo-v2 branch May 29, 2026 10:33

Conversation

polRk commented Apr 2, 2026

Pull request type

What is the current behavior?

What is the new behavior?

Other information

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vgvoleg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

vgvoleg left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

This comment was marked as low quality.

Uh oh!

github-actions Bot commented May 29, 2026

🌋 SLO Test Results

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vgvoleg left a comment •

edited

Loading