Structured Database Seeding

Initium's seed subcommand applies structured, repeatable data provisioning to your database from YAML or JSON seed spec files. It replaces ad-hoc shell scripts with a declarative approach that supports idempotency, referential integrity, and a tracking table to prevent duplicate application.

Seed spec files are MiniJinja templates: they are rendered with environment variables before parsing, enabling conditional phases, loops, and dynamic configuration.

Supported Databases

Driver	Connection URL format
`postgres`	`postgres://user:pass@host:5432/dbname`
`mysql`	`mysql://user:pass@host:3306/dbname`
`sqlite`	`/path/to/database.db` or `:memory:` for tests

Quick Start

# Apply seeds from a YAML spec file
initium seed --spec /seeds/seed.yaml

# Apply seeds with reset (delete + re-seed)
initium seed --spec /seeds/seed.yaml --reset

# With JSON log output
initium seed --spec /seeds/seed.yaml --json

Seed Spec Schema

The seed spec file defines the complete seeding plan. Both YAML and JSON formats are supported (file extension determines parser). The spec file is a MiniJinja template rendered with environment variables before parsing.

Database connection

Two connection styles are supported. Choose one — they cannot be combined.

URL-based (existing behavior):

database:
  driver: postgres
  url: "postgres://user:pass@host:5432/dbname"
  # or: url_env: DATABASE_URL

Structured fields (no URL encoding needed):

database:
  driver: postgres
  host: pg.example.com
  port: 5432 # Optional. Default: 5432 (postgres), 3306 (mysql)
  user: netbird
  password: "{{ env.DB_PASSWORD }}" # Special chars just work — no URL encoding
  name: mydb
  options: # Optional. Driver-specific connection parameters
    sslmode: disable

Structured config builds the connection using driver-native APIs, bypassing URL parsing entirely. Passwords with @, %, :, or other URL-reserved characters work without encoding.

Note: Structured config is not supported for SQLite — use url instead.

Full schema

database:
  driver: postgres # Required. One of: postgres, mysql, sqlite
  # --- URL-based connection (pick one style) ---
  url: "postgres://..." # Direct database URL
  url_env: DATABASE_URL # Or: name of env var containing the URL
  # --- Structured connection (alternative to url/url_env) ---
  host: pg.example.com # Database host
  port: 5432 # Optional. Default per driver
  user: myuser # Database user
  password: "secret" # Database password (special chars OK)
  name: mydb # Database name
  options: # Optional. Driver-specific parameters
    sslmode: disable
  default_database: postgres # Optional. Bootstrap database for create_if_missing
  # --- Common ---
  tracking_table: initium_seed # Default: "initium_seed"

phases:
  - name: setup # Required. Phase name.
    order: 1 # Optional. Execution order (default: 0).
    database: reporting # Optional. Database to target/create.
    schema: analytics # Optional. Schema to target/create.
    create_if_missing: true # Optional. Create database/schema if missing.
    timeout: 30s # Optional. Default wait timeout (e.g. 30s, 1m; default: 30s).
    wait_for: # Optional. Objects to wait for before seeding.
      - type: table # One of: table, view, schema, database.
        name: users
        timeout: 60s # Optional. Per-object timeout override.
    seed_sets: # Optional. Seed sets to apply in this phase.
      - name: initial_data
        order: 1 # Optional. Controls execution order across seed sets.
        mode: once # Optional. "once" (default) or "reconcile".
        tables:
          - table: config
            order: 1 # Optional. Controls execution order within a seed set.
            unique_key: [
              email,
            ] # Optional. Columns used for duplicate detection.
            auto_id: # Optional. Auto-generated ID configuration.
              column: id # Column name for the auto-generated ID.
              id_type: integer # ID type (default: integer).
            rows:
              - _ref: row_alias # Optional. Internal reference name for this row.
                key: app_name
                value: "{{ env.APP_NAME }}"

Field reference

Field	Type	Required	Description
`database.driver`	string	Yes	Database driver: `postgres`, `mysql`, or `sqlite`
`database.url`	string	No	Direct database connection URL (cannot combine with structured fields)
`database.url_env`	string	No	Environment variable containing the database URL
`database.host`	string	No	Database host (structured config; cannot combine with url/url_env)
`database.port`	integer	No	Database port (default: 5432 for postgres, 3306 for mysql)
`database.user`	string	No	Database user (structured config)
`database.password`	string	No	Database password — special characters work without encoding
`database.name`	string	No	Database name (structured config)
`database.options`	map[string]string	No	Driver-specific connection parameters (e.g. `sslmode: disable`)
`database.default_database`	string	No	Database to connect to during `create_if_missing` bootstrap. Default: `postgres` for PostgreSQL, none for MySQL.
`database.tracking_table`	string	No	Name of the seed tracking table (default: `initium_seed`)
`phases[].name`	string	Yes	Unique phase name
`phases[].order`	integer	No	Execution order (lower first, default: 0)
`phases[].database`	string	No	Target database name (for create/switch)
`phases[].schema`	string	No	Target schema name (for create/switch)
`phases[].create_if_missing`	boolean	No	Create the database/schema if it does not exist (default: false)
`phases[].timeout`	string	No	Default wait timeout (e.g. `30s`, `1m`, `1m30s`; default: `30s`)
`phases[].wait_for[].type`	string	Yes	Object type: `table`, `view`, `schema`, or `database`
`phases[].wait_for[].name`	string	Yes	Object name to wait for
`phases[].wait_for[].timeout`	string	No	Per-object timeout override (e.g. `60s`, `2m`, `1m30s`)
`phases[].seed_sets[].name`	string	Yes	Unique name for the seed set (used in tracking)
`phases[].seed_sets[].order`	integer	No	Execution order (lower values first, default: 0)
`phases[].seed_sets[].mode`	string	No	Seed mode: `once` (default) or `reconcile`
`phases[].seed_sets[].tables[].table`	string	Yes	Target database table name
`phases[].seed_sets[].tables[].order`	integer	No	Execution order within the seed set (default: 0)
`phases[].seed_sets[].tables[].unique_key`	string[]	No	Columns for duplicate detection
`phases[].seed_sets[].tables[].auto_id.column`	string	No	Auto-generated ID column name
`phases[].seed_sets[].tables[].auto_id.id_type`	string	No	ID type (default: `integer`)
`phases[].seed_sets[].tables[].rows[]._ref`	string	No	Internal reference name for cross-table references

Wait-for object support by driver

Object type	SQLite	PostgreSQL	MySQL
`table`	✅	✅	✅
`view`	✅	✅	✅
`schema`	❌	✅	✅*
`database`	❌	✅	✅*

* In MySQL, schema and database are synonymous.

Create-if-missing support by driver

Operation	SQLite	PostgreSQL	MySQL
`CREATE DATABASE`	❌	✅	✅
`CREATE SCHEMA`	❌	✅	✅*

* In MySQL, CREATE SCHEMA maps to CREATE DATABASE.

SQLite does not support separate databases or schemas — each file is a database.

Database connection resolution

If structured fields (host, port, user, password, name) are provided, the connection is built using driver-native APIs — no URL is needed.

Otherwise, the database URL is resolved in this order:

database.url_env — environment variable name containing the URL
database.url — direct URL in the spec file
DATABASE_URL — fallback environment variable

Structured fields and URL-based fields (url/url_env) are mutually exclusive — specifying both is a validation error.

Features

MiniJinja Templating

All seed spec files are rendered as MiniJinja templates before parsing. Environment variables are available as {{ env.VAR_NAME }}. This enables:

Dynamic values: {{ env.APP_VERSION }}
Conditional phases: {% if env.ENABLE_ANALYTICS %}...{% endif %}
Generated rows: {% for i in range(10) %}...{% endfor %}
Lenient mode: missing env vars render as empty strings (no errors)

database:
  driver: {{ env.DB_DRIVER }}
  url_env: DATABASE_URL
phases:
  - name: config
    seed_sets:
      - name: app_config
        tables:
          - table: config
            rows:
{% for key in ["app_name", "version", "env"] %}
              - key: {{ key }}
                value: "{{ env['APP_' ~ key | upper] }}"
{% endfor %}

Note: The @ref: syntax for cross-table references is processed at execution time (after template rendering), so it works seamlessly with MiniJinja templates.

Idempotency via Tracking Table

Initium creates a tracking table (default: initium_seed) that records which seed sets have been applied. On subsequent runs, already-applied seed sets are skipped automatically.

┌──────────────────────────────────────┐
│           initium_seed               │
├──────────┬───────────────────────────┤
│ seed_set │ applied_at                │
├──────────┼───────────────────────────┤
│ users    │ 2025-01-15T10:30:00Z      │
│ config   │ 2025-01-15T10:30:01Z      │
└──────────┴───────────────────────────┘

Duplicate Detection via Unique Keys

When unique_key is specified on a table, each row is checked against existing data before insertion. Rows matching the unique key are skipped, preventing duplicate inserts even within the same seed set.

tables:
  - table: users
    unique_key: [email]
    rows:
      - name: Alice
        email: alice@example.com # Skipped if email already exists

Auto-Generated IDs and Cross-Table References

Use auto_id to let the database generate IDs, and _ref + @ref: to reference generated values in other tables:

phases:
  - name: data
    seed_sets:
      - name: with_refs
        tables:
          - table: departments
            order: 1
            auto_id:
              column: id
            rows:
              - _ref: dept_eng # Name this row for later reference
                name: Engineering

          - table: employees
            order: 2
            rows:
              - name: Alice
                email: alice@example.com
                department_id: "@ref:dept_eng.id" # Resolves to the generated ID

Environment Variable Substitution

Use $env:VAR_NAME or MiniJinja {{ env.VAR_NAME }} to inject values from environment variables at runtime. This is ideal for credentials loaded from Kubernetes secrets:

rows:
  - username: "$env:ADMIN_USERNAME"
    password_hash: "{{ env.ADMIN_PASSWORD_HASH }}"

Reconcile Mode

By default, seed sets are applied once and never modified (mode: once). Reconcile mode makes seeding declarative: the rendered spec becomes the source of truth, and initium reconciles the database to match it whenever the rendered spec changes.

If the rendered spec has not changed since the last run (content hash match), initium treats the seed set as already reconciled and skips it. Out-of-band database changes are not corrected until a spec change triggers reconciliation again.

Enable reconcile mode per seed set:

seed_sets:
  - name: departments
    mode: reconcile # "once" (default) or "reconcile"
    tables:
      - table: departments
        unique_key: [name] # Required for reconcile mode
        rows:
          - name: Engineering
          - name: Sales

Or override all seed sets for a single run:

initium seed --spec /seeds/seed.yaml --reconcile-all

How it works:

On each run, initium computes a content hash of the rendered seed set (after template/env expansion).
If the hash matches the stored hash, the seed set is skipped (no-op).
If the hash differs, initium reconciles row by row:
- New rows (in spec but not in DB) are inserted.
- Changed rows (different values for same unique key) are updated.
- Removed rows (in DB but not in spec) are deleted.

Ignoring columns: Some columns should be set on initial insert but never overwritten during reconciliation (e.g., timestamps, random tokens, or values managed by database triggers). Use ignore_columns to exclude them:

tables:
  - table: users
    unique_key: [email]
    ignore_columns: [created_at, api_token]
    rows:
      - email: alice@example.com
        name: Alice
        created_at: "2026-01-01"
        api_token: "$env:ALICE_TOKEN"

Ignored columns are:

Included in the initial INSERT (the row is written with all columns).
Excluded from change detection (changing an ignored column's value in the spec does not trigger an update).
Excluded from UPDATE statements (manual or trigger-managed changes in the database are preserved).
Excluded from the content hash (so they don't affect the fast-path skip).

ignore_columns cannot overlap with unique_key.

Requirements:

Every table in a reconciled seed set must have a unique_key. Without it, there is no way to identify which rows correspond to which spec entries.
Environment variable changes trigger reconciliation (resolved values are compared, not raw templates).

Row tracking: Initium creates a companion table ({tracking_table}_rows, e.g., initium_seed_rows) that stores the resolved values of each seeded row. This enables change detection and orphan deletion.

Dry-run mode: Preview what reconciliation would do without modifying the database:

initium seed --spec /seeds/seed.yaml --dry-run

This logs insert/update/delete counts per table without executing any changes.

Reset Mode

Use --reset to delete all data from seeded tables and remove tracking entries before re-applying. Tables are deleted in reverse order to respect foreign key constraints:

initium seed --spec /seeds/seed.yaml --reset

Ordering

Both seed sets and tables within seed sets support explicit ordering via the order field. Lower values execute first (default: 0). This ensures parent tables are seeded before dependent tables.

Transaction Safety

Each seed set is applied within a database transaction. If any row fails to insert, the entire seed set is rolled back, preventing partial data application.

Kubernetes Usage

Credentials via Environment Variables (from Secrets)

apiVersion: v1
kind: Pod
spec:
  initContainers:
    - name: seed-data
      image: ghcr.io/kitstream/initium:latest
      args: ["seed", "--spec", "/seeds/seed.yaml"]
      env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: url
        - name: ADMIN_USERNAME
          valueFrom:
            secretKeyRef:
              name: admin-credentials
              key: username
        - name: ADMIN_PASSWORD_HASH
          valueFrom:
            secretKeyRef:
              name: admin-credentials
              key: password-hash
      volumeMounts:
        - name: seed-specs
          mountPath: /seeds
          readOnly: true
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        readOnlyRootFilesystem: true
        allowPrivilegeEscalation: false
        capabilities:
          drop: [ALL]
  volumes:
    - name: seed-specs
      configMap:
        name: seed-specs

CLI Reference

Flag	Default	Description
`--spec`	(required)	Path to seed spec file (YAML or JSON)
`--reset`	`false`	Delete existing data and re-apply seeds
`--dry-run`	`false`	Preview changes without modifying the database
`--reconcile-all`	`false`	Override all seed sets to reconcile mode for this run
`--json`	`false`	Enable JSON log output

Failure Modes

Scenario	Behavior
Invalid spec file	Fails with parse error before connecting to database
Invalid MiniJinja template	Fails with template syntax error before parsing YAML
Database unreachable	Fails with connection error
Unsupported driver	Fails with descriptive error listing supported drivers
Missing env var for URL	Fails with error naming the missing variable
Missing env var in `$env:`	Fails with error naming the missing variable
Unresolved `@ref:`	Fails with error naming the missing reference
Row insertion failure	Entire seed set rolled back via transaction
Duplicate row (with unique_key)	Row silently skipped
Already-applied seed set	Seed set silently skipped
Wait-for object timeout	Fails with structured timeout error naming the object
Unsupported object type for driver	Fails immediately with driver-specific error
CREATE DATABASE on SQLite	Fails with "not supported" error
CREATE SCHEMA on SQLite	Fails with "not supported" error

Examples

See the examples/seed/ directory:

basic-seed.yaml — PostgreSQL with departments and employees, cross-table references
structured-config-seed.yaml — PostgreSQL with structured connection config (no URL encoding)
sqlite-seed.yaml — SQLite configuration table seeding
env-credentials-seed.yaml — MySQL with credentials from Kubernetes secrets
phased-seed.yaml — Multi-phase PostgreSQL seeding with wait-for, create-if-missing, and MiniJinja templating

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Structured Database Seeding

Supported Databases

Quick Start

Seed Spec Schema

Database connection

Full schema

Field reference

Wait-for object support by driver

Create-if-missing support by driver

Database connection resolution

Features

MiniJinja Templating

Idempotency via Tracking Table

Duplicate Detection via Unique Keys

Auto-Generated IDs and Cross-Table References

Environment Variable Substitution

Reconcile Mode

Reset Mode

Ordering

Transaction Safety

Kubernetes Usage

Credentials via Environment Variables (from Secrets)

CLI Reference

Failure Modes

Examples

FilesExpand file tree

seeding.md

Latest commit

History

seeding.md

File metadata and controls

Structured Database Seeding

Supported Databases

Quick Start

Seed Spec Schema

Database connection

Full schema

Field reference

Wait-for object support by driver

Create-if-missing support by driver

Database connection resolution

Features

MiniJinja Templating

Idempotency via Tracking Table

Duplicate Detection via Unique Keys

Auto-Generated IDs and Cross-Table References

Environment Variable Substitution

Reconcile Mode

Reset Mode

Ordering

Transaction Safety

Kubernetes Usage

Credentials via Environment Variables (from Secrets)

CLI Reference

Failure Modes

Examples