Skip to content

Add first-class data release semantics to the policyengine-api-v2 simulation gateway #434

@MaxGhenis

Description

@MaxGhenis

Problem

policyengine-api-v2 already has a useful model-version routing layer, but its public contract is still model-only.

Today the simulation gateway:

  • resolves country + version to a versioned Modal app
  • defaults omitted versions to a mutable latest registry entry
  • returns the resolved model version in submit responses

But it does not have first-class data-release semantics. Dataset identity is still whatever data path the caller passes through or whatever default the worker uses. That means the API can tell a client which model package version ran, but not which immutable model+data bundle actually produced the result.

Relevant code paths:

  • gateway request/response models and endpoints: projects/policyengine-api-simulation/src/modal/gateway/models.py, .../gateway/endpoints.py
  • worker execution path: projects/policyengine-api-simulation/src/modal/simulation.py
  • versioned Modal app naming/install behavior: projects/policyengine-api-simulation/src/modal/app.py
  • mutable registry updates for latest: projects/policyengine-api-simulation/src/modal/utils/update_version_registry.py
  • checked-in dependency spec is open-ended while runtime installs exact env-driven versions: projects/policyengine-api-simulation/pyproject.toml

Desired contract

The simulation API should route and report an immutable execution bundle, not just a model package version.

At minimum that bundle should include:

  • country model package version
  • country data package version
  • resolved dataset artifact or manifest revision
  • bundle or manifest identifier suitable for caching and replay

latest can remain a convenience alias for discovery, but the API response for an executed job should always contain the resolved immutable bundle.

What should change

  1. Add first-class data-release semantics to the gateway request/response contract.
  2. Resolve a full model+data bundle before execution, not just a model version.
  3. Return the resolved bundle in submit and status responses.
  4. Make registries store or resolve immutable bundle identities rather than only mutable latest model-version pointers.
  5. Keep latest as a moving alias if needed, but never let it be the only provenance attached to a completed job.
  6. Align deployment/runtime configuration so checked-in specs and runtime-installed versions tell the same bundle story.

Acceptance criteria

  • Gateway requests can specify data release identity directly or via a higher-level bundle reference.
  • Submit/status responses include the resolved model/data bundle used for execution.
  • Worker execution uses the resolved data release rather than implicit defaults.
  • Cache and dedupe semantics are bundle-aware rather than model-version-only.
  • The mutable latest registry remains optional discovery sugar and is not the only provenance available for executed simulations.
  • Deployment/runtime metadata can reconstruct the exact execution bundle for a given deployed version.

Upstream dependencies

This should consume the data-release contracts from:

And it should stay aligned with:

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions