
# feat: [BE] Add pipeline run stats endpoint with caching #177

@Mbeaulne

## Description

Add a new `GET /api/experimental/pipeline_runs/search_stats` endpoint that returns aggregate pipeline run counts grouped by status. It supports the same `filter_query` format used by the existing `GET /api/pipeline_runs/` list endpoint, so the stats cards on the homepage stay in sync with whatever filters the user has applied.

The goal is to return counts of pipeline runs broken down by status, making it easy to surface how many runs are running, succeeded, failed, etc. for a given filter. The exact semantics of what constitutes a "run status" for counting purposes are left to the implementer's discretion.

## Filtering

The endpoint accepts `filter_query`, the same JSON predicate format already used by the list endpoint. The `"me"` placeholder in filter predicates is resolved server-side from the authenticated session, consistent with how the existing list endpoint handles it. No additional `current_user` param is needed; it is the UI's responsibility to set the `filter_query` appropriately.
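
As a minimal sketch of that resolution step, assuming the predicates arrive as plain dicts; the function name and traversal here are illustrative, not the existing implementation:

```python
from typing import Any

CREATED_BY_KEY = "system/pipeline_run.created_by"  # key name taken from the table below

def resolve_me_placeholder(predicate: dict[str, Any], session_user: str) -> dict[str, Any]:
    """Recursively replace the "me" placeholder with the authenticated user's id."""
    if "and" in predicate:
        return {"and": [resolve_me_placeholder(p, session_user) for p in predicate["and"]]}
    if "value_equals" in predicate:
        inner = dict(predicate["value_equals"])
        if inner.get("key") == CREATED_BY_KEY and inner.get("value") == "me":
            inner["value"] = session_user
        return {"value_equals": inner}
    # Other predicate types (time_range, value_contains, key_exists) carry no placeholder.
    return predicate
```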

| Param | Type | Description |
| --- | --- | --- |
| `filter_query` | `string \| null` | JSON filter query (same format as the list endpoint) |
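
For illustration, a client-side sketch of calling the endpoint; the host and the use of `requests` are assumptions, and the parameter name matches the table above:

```python
import json

import requests  # third-party HTTP client, assumed available

BASE_URL = "https://example.com"  # hypothetical host

# The filter_query JSON is passed URL-encoded as a query parameter,
# same as for the existing list endpoint.
filter_query = json.dumps(
    {"value_equals": {"key": "system/pipeline_run.created_by", "value": "me"}}
)
resp = requests.get(
    f"{BASE_URL}/api/experimental/pipeline_runs/search_stats",
    params={"filter_query": filter_query},
)
resp.raise_for_status()
print(resp.json())
```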

## Supported filter predicates

These are the filters currently used on the homepage:

| Filter | Backend key | Predicate type | Example |
| --- | --- | --- | --- |
| Created by | `system/pipeline_run.created_by` | `value_equals` | `{"value_equals": {"key": "system/pipeline_run.created_by", "value": "me"}}` |
| Pipeline name | `system/pipeline_run.name` | `value_contains` | `{"value_contains": {"key": "system/pipeline_run.name", "value_substring": "training"}}` |
| Date range | `system/pipeline_run.date.created_at` | `time_range` | `{"time_range": {"key": "system/pipeline_run.date.created_at", "start_time": "...", "end_time": "..."}}` |
| Annotations | Custom annotation keys | `key_exists` / `value_contains` | `{"key_exists": {"key": "my_annotation"}}` |

Predicates are combined with `"and"` logic:

```json
{
  "and": [
    { "value_equals": { "key": "system/pipeline_run.created_by", "value": "me" } },
    { "time_range": { "key": "system/pipeline_run.date.created_at", "start_time": "2026-03-01T00:00:00Z", "end_time": "2026-03-31T23:59:59Z" } }
  ]
}
```

## Examples

- No filters → stats for all runs in the system
- `created_by: "me"` → stats for my runs
- `created_by: "me"` + date range → stats for my runs in the last 7 days (sketched below)
- `pipeline_name` contains `"training"` → stats for all "training" runs
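
As a rough sketch, the "my runs in the last 7 days" case could be assembled like this; how the UI actually computes the range is an assumption:

```python
import json
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)
filter_query = json.dumps({
    "and": [
        {"value_equals": {"key": "system/pipeline_run.created_by", "value": "me"}},
        {"time_range": {
            "key": "system/pipeline_run.date.created_at",
            "start_time": (now - timedelta(days=7)).isoformat(),
            "end_time": now.isoformat(),
        }},
    ]
})
```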

## Response shape

```json
{
  "total_runs": 1250,
  "container_execution_status_stats": {
    "RUNNING": 3,
    "SUCCEEDED": 1100,
    "FAILED": 87,
    "CANCELLED": 50,
    "PENDING": 10
  },
  "cached_at": "2026-03-19T12:00:00Z"
}
```
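
A minimal sketch of the corresponding response model, assuming the backend uses Pydantic; the class name is hypothetical and the fields mirror the JSON above:

```python
from datetime import datetime

from pydantic import BaseModel

class PipelineRunSearchStats(BaseModel):
    """Aggregate counts returned by GET /api/experimental/pipeline_runs/search_stats."""

    total_runs: int
    # Keys are container execution status names, e.g. "RUNNING", "SUCCEEDED".
    container_execution_status_stats: dict[str, int]
    # When the cached aggregate was computed, so the UI can show staleness.
    cached_at: datetime
```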

## Performance approach

Doing a full table scan on every request is not acceptable at scale. Some form of caching should be used; whether that is server-side in-memory caching, HTTP caching headers, or another mechanism is left to the implementer's discretion. The important thing is that repeated identical queries do not hit the database on every request.
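
For example, one possible shape for the in-memory option, a per-filter TTL cache; the TTL value, cache structure, and `run_stats_query` helper are all assumptions, not a decided design:

```python
import time
from threading import Lock

_CACHE_TTL_SECONDS = 60  # assumed staleness budget, not a decided value
_cache: dict[str, tuple[float, dict]] = {}  # cache key -> (computed_at, stats)
_cache_lock = Lock()

def run_stats_query(filter_query: str | None) -> dict:
    """Stand-in for the real aggregate query (see the SQL sketch below)."""
    raise NotImplementedError

def get_search_stats(filter_query: str | None, user: str) -> dict:
    # If filter_query contains the "me" placeholder, the resolved user must be
    # part of the cache key, otherwise users would see each other's results.
    key = f"{user}:{filter_query or ''}"
    now = time.time()
    with _cache_lock:
        hit = _cache.get(key)
        if hit is not None and now - hit[0] < _CACHE_TTL_SECONDS:
            return hit[1]  # repeated identical queries skip the database
    stats = run_stats_query(filter_query)
    with _cache_lock:
        _cache[key] = (now, stats)
    return stats
```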

- No impact on existing list queries: this is a completely separate endpoint.
- The existing `filter_query_sql.py` already builds WHERE clauses from filter predicates; reuse those same clauses for the stats query (sketched below).
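
A hedged sketch of that aggregate query, assuming a SQLAlchemy setup (suggested but not confirmed by the `*_sql.py` file names), a `PipelineRun` model with a `container_execution_status` column inferred from the response shape, and a hypothetical `build_where_clauses` helper standing in for whatever `filter_query_sql.py` exposes:

```python
from datetime import datetime, timezone

from sqlalchemy import func, select

def run_stats_query(session, filter_query: str | None) -> dict:
    # build_where_clauses is a stand-in for the existing filter_query_sql.py helper.
    where_clauses = build_where_clauses(filter_query)
    # One GROUP BY pass gives per-status counts; total_runs is their sum,
    # so no second query is needed.
    stmt = (
        select(PipelineRun.container_execution_status, func.count())
        .where(*where_clauses)
        .group_by(PipelineRun.container_execution_status)
    )
    status_stats = {status: count for status, count in session.execute(stmt).all()}
    return {
        "total_runs": sum(status_stats.values()),
        "container_execution_status_stats": status_stats,
        "cached_at": datetime.now(timezone.utc).isoformat(),
    }
```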

## Relevant files

- `api_server_sql.py`: add the new `stats()` method
- `api_router.py`: register the route
- `filter_query_sql.py`: reuse the existing WHERE clause builder for the stats query
- `filter_query_models.py`: existing filter predicate models (no changes needed)
- `backend_types_sql.py`: existing `PipelineRun` model; the existing indexes support this query
