diff --git a/README.md b/README.md index 943f8ab..2996a46 100644 --- a/README.md +++ b/README.md @@ -18,6 +18,8 @@ pnpm dev:down ## Prod +Production releases and server updates are documented in [docs/deployment.md](docs/deployment.md). + Create the local production environment file once: ```bash diff --git a/docs/adr/0002-single-host-compose-image-release.md b/docs/adr/0002-single-host-compose-image-release.md index dc72b74..df90d9e 100644 --- a/docs/adr/0002-single-host-compose-image-release.md +++ b/docs/adr/0002-single-host-compose-image-release.md @@ -4,4 +4,4 @@ Central production runs on one Linux host with Docker Compose as the deployment The core production stack is Cockpit, Backend, PostgreSQL, migrations, and an Nginx gateway exposed through Tailscale. Assistant, voice, STT, TTS, and LLM services are excluded from the baseline until they are reliable enough to ship as an optional profile. Major SemVer releases signal incompatible changes that may require planned downtime. -Deployments coordinate with backend work through a DB-backed maintenance mode, a deploy advisory lock, and bounded task draining before migrations run. This keeps old services serving while images are pulled, stops new mutating/background work before schema changes, and only restarts the app stack after migrations pass. +Future deployment hardening may add DB-backed maintenance mode, a deploy advisory lock, and bounded task draining before migrations run. The current `central-update` script pulls images, starts PostgreSQL, optionally backs up the database, runs migrations, starts Backend/Cockpit/Gateway, and checks health. 
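The update sequence that the ADR text above attributes to `central-update` can be sketched as a dry run. This is not the script's source, which this diff does not show. The function name `plan_update` is hypothetical, the `docker compose` invocations are assumed from the service names and file names listed in `docs/deployment.md`, and the backup and health-check mechanisms are left as comments because the source does not specify them.

```bash
#!/usr/bin/env bash
# Dry-run sketch: prints the commands an update might run, in order.
# Hypothetical illustration; the real central-update script may differ.
set -euo pipefail

plan_update() {
  local version="${1:-stable}"   # assumed default, mirroring CENTRAL_VERSION
  local backup="${2:-no}"        # "yes" includes the optional backup step
  local compose="docker compose --env-file .env.prod --file docker-compose.prod.yml"

  echo "CENTRAL_VERSION=${version}"
  echo "${compose} pull"
  echo "${compose} up --detach --wait i12e-postgres"
  if [ "${backup}" = "yes" ]; then
    echo "# back up PostgreSQL here (mechanism not specified in the source)"
  fi
  echo "${compose} run --rm i12e-postgres-migrate"
  echo "${compose} up --detach service-backend app-cockpit i12e-gateway"
  echo "# check gateway and backend health here (endpoints not specified in the source)"
  echo "${compose} ps"
}

plan_update "v1.2.3" "yes"
```

Printing the plan instead of executing it keeps the ordering visible: migrations run only after PostgreSQL is up, and the application services start only after migrations.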
diff --git a/docs/architecture.md b/docs/architecture.md index 0569372..a2d87bf 100644 --- a/docs/architecture.md +++ b/docs/architecture.md @@ -5,8 +5,8 @@ The repository is organized as a multi-project Nx workspace: - `apps/*`: user-facing applications (currently `apps/cockpit`) -- `services/*`: backend runtime services (`services/backend`, `services/assistant`) -- `i12e/*`: infrastructure and orchestration projects (`postgres`, `orchestrator`) +- `services/*`: backend runtime services (`backend`, `assistant`, `stt`, `tts`, `llm`) +- `i12e/*`: infrastructure and orchestration projects (`postgres`, `orchestrator`, `gateway`) - `libs/*`: shared reusable libraries (currently `ts-log` for cross-cutting TypeScript logging) - `docs/*`: cross-cutting repository documentation @@ -15,14 +15,14 @@ The repository is organized as a multi-project Nx workspace: ### Cockpit (`apps/cockpit`) - TanStack Start + React frontend. -- Fetches data on the cockpit server via TanStack Start loaders/server functions, then sends the result to the browser. -- Cockpit reaches the backend service over HTTP for weather (`/api/v1/weather/current` and `/api/v1/weather/forecast`). -- Cockpit reaches assistant-service over HTTP (`POST /api/v1/assistant/turn` and `POST /api/v1/assistant/turn/stream`). +- Fetches backend data on the cockpit server via TanStack Start server functions, then sends the result to the browser. +- Backend exposes weather current and forecast endpoints; Cockpit currently calls `/api/v1/weather/current`. +- Cockpit has assistant client code for `POST /api/v1/assistant/turn` and `POST /api/v1/assistant/turn/stream`, but the default UI does not start turns while browser VAD is disabled. - Configuration: - Runtime backend base URL is configured via `BACKEND_BASE_URL` with `VITE_BACKEND_API_BASE_URL` as a browser/build-time fallback. 
- Runtime assistant service base URL is configured via `ASSISTANT_SERVICE_BASE_URL` with `VITE_ASSISTANT_API_BASE_URL` as a browser/build-time fallback. - If neither backend variable is set, cockpit defaults to `http://localhost:3010` for local orchestrator-driven development. -- If neither assistant variable is set, cockpit defaults to `http://localhost:3020` for local orchestrator-driven development. +- If neither assistant variable is set, cockpit defaults to `http://localhost:3020`, matching the standalone assistant container run target. ### Backend Service (`services/backend`) @@ -48,7 +48,7 @@ The repository is organized as a multi-project Nx workspace: - `llm-proxy` mode to call an external LLM while keeping mock STT / TTS boundaries. - `openai` mode to use OpenAI-native STT / LLM / TTS endpoints. - `proxy` mode to call external STT / LLM / TTS runtimes. -- The orchestrated dev and prod stacks use `proxy` mode by default, backed by faster-whisper STT, Qwen3-TTS voice cloning, and an Ollama-based LLM wrapper. +- The standalone service supports `proxy` mode with faster-whisper STT, Qwen3-TTS voice cloning, and an Ollama-based LLM wrapper. The orchestrator compose definitions for those support services are currently commented out, so the active dev/prod stack does not start assistant services by default. ### Persistence (`i12e/postgres`) @@ -57,8 +57,8 @@ The repository is organized as a multi-project Nx workspace: ### Orchestration (`i12e/orchestrator`) -- Docker Compose project used to start the full local stack. -- Separate environment files define dev and prod default port mappings and assistant model settings. +- Docker Compose project used to start the active local stack. +- Separate environment files define dev and prod default port mappings. Assistant model settings remain in env templates for the commented assistant support services. 
- Production releases are code-free on the server: CI publishes a tested core image set to GHCR plus a deploy bundle with `docker-compose.prod.yml`, `central-update`, and `.env.prod.example`. - The production baseline runs PostgreSQL, a migration job, Backend, Cockpit, and an Nginx gateway on one Docker Compose host. Assistant, voice, STT, TTS, and LLM services are optional future production profiles, not part of the baseline. @@ -82,27 +82,31 @@ The repository is organized as a multi-project Nx workspace: ### Weather 1. Browser requests cockpit. -2. Cockpit server requests the backend weather API. -3. The backend weather domain checks the PostgreSQL cache first. -4. If the cache is stale or missing, the backend fetches fresh data from Open-Meteo. -5. Fresh responses are returned immediately; persistence writes happen asynchronously. -6. Cockpit serializes widget data to the client and can refresh through server functions without exposing backend directly to the browser. +2. Cockpit widget calls a TanStack Start server function. +3. The server function requests the backend weather API. +4. The backend weather domain checks the PostgreSQL cache first. +5. If the cache is stale or missing, the backend fetches fresh data from Open-Meteo. +6. Fresh responses are returned immediately; persistence writes happen asynchronously. +7. Cockpit serializes widget data to the client and can refresh through server functions without exposing backend directly to the browser. ### Finance Cash -1. Browser opens `/finance/cash`. +1. Browser opens `/finance/transactions`. 2. Cockpit server functions call backend finance APIs. 3. Backend persists manual income and expense transactions in PostgreSQL. -4. Monthly summaries are computed from transactions filtered by transaction date. +4. Summaries are computed from transactions filtered by transaction date. +5. 
Cockpit currently lists and creates transactions end to end; edit and delete UI controls still need route-id wiring in the server function helpers. ### Voice -1. Browser VAD cuts a speech segment locally. -2. Browser posts the segment directly to assistant-service's streaming endpoint. +1. Browser VAD is currently disabled, so Cockpit does not capture speech segments in the default UI. +2. When the implemented turn client is invoked, Cockpit posts audio to assistant-service's streaming endpoint. 3. Assistant-service performs `STT -> streamed LLM -> chunked TTS`. 4. Browser plays synthesized chunks as they arrive. 5. Cockpit still exposes a non-streaming server function path for fallback flows. +This flow exists in the Cockpit and assistant-service code, but the assistant service and support runtimes are not active in the default orchestrator compose stack while their service blocks remain commented out. + ## Boundary Rules - Keep UI and presentation concerns in `apps/*`. diff --git a/docs/deployment.md b/docs/deployment.md new file mode 100644 index 0000000..f455873 --- /dev/null +++ b/docs/deployment.md @@ -0,0 +1,141 @@ +# Release and Production Deployment + +Central production deploys from tested container images, not from source code on the server. + +## Release flow + +CI in `.github/workflows/ci.yml` owns release creation: + +1. Pull requests run validation and build disposable GHCR images tagged `pr--`. +2. Pushes to `main` run validation, build GHCR images tagged `sha-`, and smoke-test that exact image set with the production Compose file. +3. Git tags matching `v*.*.*` also run validation, build `sha-` images, and smoke-test the image set. +4. After smoke tests pass on a tag push, CI publishes version tags from the tested SHA images. +5. Exact stable tags such as `v1.2.3` also move the `stable` tag. +6. Prerelease tags such as `v1.3.0-rc.1` publish only that version tag and do not move `stable`. +7. 
Tag releases package a deploy bundle artifact containing: + - `docker-compose.prod.yml` + - `central-update` + - `.env.prod.example` + +The release tag must point at a commit reachable from `main`; CI rejects release tags outside `main`. + +## Images + +Production Compose pulls these images from GHCR: + +- `ghcr.io/themattcode/central/app-cockpit:${CENTRAL_VERSION}` +- `ghcr.io/themattcode/central/service-backend:${CENTRAL_VERSION}` +- `ghcr.io/themattcode/central/i12e-postgres:${CENTRAL_VERSION}` +- `ghcr.io/themattcode/central/i12e-gateway:${CENTRAL_VERSION}` + +`CENTRAL_VERSION` defaults to `stable`. Use exact release tags for pinned deployments and rollbacks. + +## Create a release + +From a clean `main` commit: + +```bash +git tag v1.2.3 +git push origin v1.2.3 +``` + +Then wait for CI to pass. The tag push publishes the versioned image set, updates `stable` for exact stable SemVer tags, and uploads the deploy bundle artifact. + +For a prerelease: + +```bash +git tag v1.3.0-rc.1 +git push origin v1.3.0-rc.1 +``` + +Deploy prereleases by exact version. `stable` remains unchanged. + +## Server setup + +Install Docker and Tailscale on the production host. Tailscale is host-managed; the Compose stack binds the gateway to `127.0.0.1:4000` by default so Tailscale Serve can expose it over the tailnet. + +Unpack the deploy bundle on the server, then create the environment file once: + +```bash +cp .env.prod.example .env.prod +``` + +Set production values in `.env.prod`: + +- `POSTGRES_PASSWORD` +- `BACKEND_DATABASE_URL` +- `BACKEND_CORS_ALLOW_ORIGIN` +- `CENTRAL_ORIGIN` +- optional `CENTRAL_VERSION` + +If GHCR packages are private, log in on the host: + +```bash +docker login ghcr.io +``` + +## Deploy or update + +Run from the unpacked deploy bundle: + +```bash +./central-update +``` + +The script prompts for a version and defaults to `stable`. 
+ +Deploy an exact version: + +```bash +./central-update v1.2.3 +``` + +Deploy a prerelease: + +```bash +./central-update v1.3.0-rc.1 +``` + +Major version jumps require explicit approval: + +```bash +./central-update v2.0.0 --allow-major +``` + +The update script: + +1. writes `CENTRAL_VERSION` to `.env.prod`, +2. pulls the selected image set, +3. starts PostgreSQL and waits for health, +4. creates a PostgreSQL backup for major jumps or when `--backup` is passed, +5. runs migrations, +6. starts `service-backend`, `app-cockpit`, and `i12e-gateway`, +7. checks gateway and backend health, +8. prints Compose status. + +Backups are written to `backups/` next to the deploy bundle unless `BACKUP_DIR` overrides it. Use `--skip-backup` only when an external backup already exists. + +## Rollback + +Rollback by deploying an older exact release tag: + +```bash +./central-update v1.2.2 +``` + +Database rollback is not automatic. If migrations are not backward-compatible, restore from a backup before starting the older version. 
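The tag rules and the `--allow-major` guard described above come down to a small amount of SemVer parsing. The following is a minimal sketch under stated assumptions, not the logic used by CI or by `central-update`: `is_stable_tag`, `major_of`, and `needs_allow_major` are hypothetical helpers, and treating any difference in major version between the deployed and target tags as a "major jump" is an assumption.

```bash
#!/usr/bin/env bash
set -euo pipefail

# True for exact stable tags such as v1.2.3; false for prereleases like v1.3.0-rc.1.
is_stable_tag() {
  [[ "$1" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]
}

# Prints the major component of a vMAJOR.MINOR.PATCH[-prerelease] tag.
major_of() {
  local tag="${1#v}"
  echo "${tag%%.*}"
}

# True when the target's major version differs from the current one (assumed rule).
needs_allow_major() {
  local current="$1" target="$2"
  [ "$(major_of "$current")" != "$(major_of "$target")" ]
}

for tag in v1.2.3 v1.3.0-rc.1; do
  if is_stable_tag "$tag"; then
    echo "$tag: publish version tag and move stable"
  else
    echo "$tag: publish version tag only"
  fi
done

if needs_allow_major v1.2.3 v2.0.0; then
  echo "v1.2.3 -> v2.0.0: require --allow-major"
fi
```

Under this rule a rollback across a major boundary, for example from `v2.0.0` to `v1.2.3`, would also require explicit approval, which fits the note that database rollback is not automatic.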
+ +## Inspect production + +Use the same env and Compose file as the update script: + +```bash +docker compose --env-file .env.prod --file docker-compose.prod.yml ps +docker compose --env-file .env.prod --file docker-compose.prod.yml logs +``` + +Stop the stack: + +```bash +docker compose --env-file .env.prod --file docker-compose.prod.yml down +``` diff --git a/docs/service-catalog.md b/docs/service-catalog.md index b77ddf5..4661b74 100644 --- a/docs/service-catalog.md +++ b/docs/service-catalog.md @@ -5,32 +5,50 @@ Source of truth: - Local dev/release-style stack: `i12e/orchestrator/docker-compose.yml` - Production server deploy bundle: `i12e/orchestrator/deploy/docker-compose.prod.yml` -## Orchestrated services - -| Service | Purpose | Container port(s) | -| ----------------------- | -------------------------------------------------- | ---------------------- | -| `app-cockpit` | Cockpit web application | `3000/tcp` | -| `i12e-gateway` | Production Nginx entrypoint for Cockpit | `8080/tcp` | -| `i12e-postgres` | PostgreSQL database | `5432/tcp` | -| `i12e-postgres-migrate` | One-off migration runner | None (no exposed port) | -| `service-backend` | Integrated backend HTTP API | `8080/tcp` | -| `service-stt` | Faster-whisper STT adapter | `8081/tcp` | -| `service-tts` | Qwen3-TTS voice-clone adapter | `8082/tcp` | -| `service-llm-runtime` | Ollama runtime | `11434/tcp` | -| `service-llm-pull` | One-off Ollama model puller | None (no exposed port) | -| `service-llm` | OpenAI-compatible LLM adapter | `8083/tcp` | -| `service-assistant` | Assistant turn orchestration (`STT -> LLM -> TTS`) | `8080/tcp` | +## Active local orchestrator services + +| Service | Purpose | Container port(s) | +|-------------------------|-----------------------------|------------------------| +| `app-cockpit` | Cockpit web application | `3000/tcp` | +| `i12e-postgres` | PostgreSQL database | `5432/tcp` | +| `i12e-postgres-migrate` | One-off migration runner | None (no exposed port) | +| 
`service-backend` | Integrated backend HTTP API | `8080/tcp` | + +The local orchestrator currently starts only PostgreSQL, migrations, Backend, and Cockpit. + +## Active production deploy services + +| Service | Purpose | Container port(s) | +|-------------------------|-----------------------------------------|------------------------| +| `app-cockpit` | Cockpit web application | `3000/tcp` | +| `i12e-gateway` | Production Nginx entrypoint for Cockpit | `8080/tcp` | +| `i12e-postgres` | PostgreSQL database | `5432/tcp` | +| `i12e-postgres-migrate` | One-off migration runner | None (no exposed port) | +| `service-backend` | Integrated backend HTTP API | `8080/tcp` | + +## Implemented but not active in orchestrator + +These services have project code and standalone Nx targets, but their orchestrator compose blocks are commented out. + +| Service | Purpose | Container port(s) | +|-----------------------|----------------------------------------------------|------------------------| +| `service-stt` | Faster-whisper STT adapter | `8081/tcp` | +| `service-tts` | Qwen3-TTS voice-clone adapter | `8082/tcp` | +| `service-llm-runtime` | Ollama runtime | `11434/tcp` | +| `service-llm-pull` | One-off Ollama model puller | None (no exposed port) | +| `service-llm` | OpenAI-compatible LLM adapter | `8083/tcp` | +| `service-assistant` | Assistant turn orchestration (`STT -> LLM -> TTS`) | `8080/tcp` | ## Production server host port mappings Production server deployment uses the code-free deploy bundle. It exposes only the gateway by default. 
-| Service | Compose mapping | Default host -> container | -| -------------- | ---------------------------------------------- | ------------------------- | -| `i12e-gateway` | `${GATEWAY_BIND}:${GATEWAY_PORT}:8080` | `127.0.0.1:4000 -> 8080` | -| `app-cockpit` | None | None | -| `service-backend` | None | None | -| `i12e-postgres` | None | None | +| Service | Compose mapping | Default host -> container | +|-------------------|----------------------------------------|---------------------------| +| `i12e-gateway` | `${GATEWAY_BIND}:${GATEWAY_PORT}:8080` | `127.0.0.1:4000 -> 8080` | +| `app-cockpit` | None | None | +| `service-backend` | None | None | +| `i12e-postgres` | None | None | Tailscale is managed by the host and can forward HTTPS traffic to `127.0.0.1:4000`. @@ -43,61 +61,58 @@ Defaults come from: Runtime local production values come from ignored `i12e/orchestrator/.env.prod`. -| Service | Compose mapping | Dev / staging default (host -> container) | Prod default (host -> container) | -| ----------------------- | --------------------------- | ----------------------------------------- | -------------------------------- | -| `app-cockpit` | `${COCKPIT_PORT}:3000` | `3000 -> 3000` | `4000 -> 3000` | -| `i12e-postgres` | `${POSTGRES_PORT}:5432` | `3001 -> 5432` | `4001 -> 5432` | -| `i12e-postgres-migrate` | None | None | None | -| `service-backend` | `${BACKEND_PORT}:8080` | `3010 -> 8080` | `4010 -> 8080` | -| `service-stt` | `${STT_PORT}:8081` | `3030 -> 8081` | `4030 -> 8081` | -| `service-tts` | `${TTS_PORT}:8082` | `3040 -> 8082` | `4040 -> 8082` | -| `service-llm-runtime` | `${LLM_RUNTIME_PORT}:11434` | `3051 -> 11434` | `4051 -> 11434` | -| `service-llm` | `${LLM_PORT}:8083` | `3050 -> 8083` | `4050 -> 8083` | -| `service-assistant` | `${ASSISTANT_PORT}:8080` | `3020 -> 8080` | `4020 -> 8080` | +| Service | Compose mapping | Dev / staging default (host -> container) | Prod default (host -> container) | 
+|-------------------------|-------------------------|-------------------------------------------|----------------------------------| +| `app-cockpit` | `${COCKPIT_PORT}:3000` | `3000 -> 3000` | `4000 -> 3000` | +| `i12e-postgres` | `${POSTGRES_PORT}:5432` | `3001 -> 5432` | `4001 -> 5432` | +| `i12e-postgres-migrate` | None | None | None | +| `service-backend` | `${BACKEND_PORT}:8080` | `3010 -> 8080` | `4010 -> 8080` | + +The assistant, STT, TTS, and LLM port variables still exist in env templates but are inactive until the matching compose service blocks are re-enabled. + +If re-enabled in `i12e/orchestrator/docker-compose.yml`, their local compose host mappings would be: + +| Service | Compose mapping | Dev default (host -> container) | Local prod template default (host -> container) | +|-----------------------|-----------------------------|---------------------------------|-------------------------------------------------| +| `service-assistant` | `${ASSISTANT_PORT}:8080` | `3020 -> 8080` | `4020 -> 8080` | +| `service-stt` | `${STT_PORT}:8081` | `3030 -> 8081` | `4030 -> 8081` | +| `service-tts` | `${TTS_PORT}:8082` | `3040 -> 8082` | `4040 -> 8082` | +| `service-llm-runtime` | `${LLM_RUNTIME_PORT}:11434` | `3051 -> 11434` | `4051 -> 11434` | +| `service-llm` | `${LLM_PORT}:8083` | `3050 -> 8083` | `4050 -> 8083` | ## Related environment differences -| Variable | Dev | Prod | -| ----------------------------- | ------------------------------------ | ------------------------------------ | -| `COCKPIT_NODE_ENV` | `development` | `production` | -| `COMPOSE_PROJECT_NAME` | `central-i12e-dev` | `central-i12e-prod` | -| `BACKEND_BASE_URL` | `http://service-backend:8080` | `http://service-backend:8080` | -| `ASSISTANT_SERVICE_BASE_URL` | `http://service-assistant:8080` | `http://service-assistant:8080` | -| `VITE_BACKEND_API_BASE_URL` | `http://localhost:3010` | `http://localhost:4010` | -| `VITE_ASSISTANT_API_BASE_URL` | `http://localhost:3020` | 
`http://localhost:4020` | -| `ASSISTANT_BACKEND_MODE` | `proxy` | `proxy` | -| `STT_URL` | `http://service-stt:8081/transcribe` | `http://service-stt:8081/transcribe` | -| `TTS_URL` | `http://service-tts:8082/synthesize` | `http://service-tts:8082/synthesize` | -| `LLM_BASE_URL` | `http://service-llm:8083` | `http://service-llm:8083` | -| `LLM_MODEL` | `qwen3.5:4b` | `qwen3:8b` | +| Variable | Dev | Prod | +|-----------------------------|-------------------------------|-------------------------------| +| `COCKPIT_NODE_ENV` | `development` | `production` | +| `COMPOSE_PROJECT_NAME` | `central-i12e-dev` | `central-i12e-prod` | +| `BACKEND_BASE_URL` | `http://service-backend:8080` | `http://service-backend:8080` | +| `VITE_BACKEND_API_BASE_URL` | `http://localhost:3010` | `http://localhost:4010` | The code-free production deploy bundle adds: -| Variable | Production default | -| ----------------- | ------------------------------ | -| `CENTRAL_VERSION` | `stable` | -| `GATEWAY_BIND` | `127.0.0.1` | -| `GATEWAY_PORT` | `4000` | +| Variable | Production default | +|-------------------|----------------------------------| +| `CENTRAL_VERSION` | `stable` | +| `GATEWAY_BIND` | `127.0.0.1` | +| `GATEWAY_PORT` | `4000` | | `CENTRAL_ORIGIN` | `https://central.example.ts.net` | ## Internal service endpoints (compose network) -| Service | Endpoint | -| --------------------- | ---------------------------------- | -| `i12e-gateway` | `http://i12e-gateway:8080` | -| `app-cockpit` | `http://app-cockpit:3000` | -| `i12e-postgres` | `i12e-postgres:5432` | -| `service-backend` | `http://service-backend:8080` | -| `service-stt` | `http://service-stt:8081` | -| `service-tts` | `http://service-tts:8082` | -| `service-llm-runtime` | `http://service-llm-runtime:11434` | -| `service-llm` | `http://service-llm:8083` | -| `service-assistant` | `http://service-assistant:8080` | +| Service | Endpoint | +|-------------------|-------------------------------| +| `i12e-gateway` | 
`http://i12e-gateway:8080` | +| `app-cockpit` | `http://app-cockpit:3000` | +| `i12e-postgres` | `i12e-postgres:5432` | +| `service-backend` | `http://service-backend:8080` | + +Assistant, STT, TTS, and LLM endpoints exist only when their commented compose blocks are re-enabled or when services are run standalone. ## Non-orchestrated local dev ports | Service | Mode | Host port(s) | -| ------------------- | ------------------------------------------------------------------------ | ------------ | +|---------------------|--------------------------------------------------------------------------|--------------| | `cockpit` | Vite dev server (`pnpm nx run cockpit:start`) | `5000` | | `cockpit` | Container run (`pnpm nx run cockpit:container-run`) | `5000` | | `postgres` | Standalone container run (`pnpm nx run i12e-postgres:run`) | `5001` | diff --git a/docs/style.md b/docs/style.md index 9cc892e..e2aff28 100644 --- a/docs/style.md +++ b/docs/style.md @@ -4,6 +4,8 @@ ### Scripts +These rules are preferred for new or changed package scripts. Some root-level convenience scripts predate the convention; do not churn them unless the task already touches that area. + - **Names**: npm script names MUST contain only lower case letters, `:` to separate parts, `-` to separate words, and `+` to separate file extensions. Each part name SHOULD be either a full English word (e.g. `coverage` not `cov`) or a well-known initialism in all lowercase (e.g. `wasm`). Here is a summary of the proposal in ABNF: ``` name = life-cycle / main target? option* ":watch"? 
diff --git a/docs/toolchain.md b/docs/toolchain.md index 377c910..a872c4d 100644 --- a/docs/toolchain.md +++ b/docs/toolchain.md @@ -5,12 +5,11 @@ - Monorepo: Nx (integrated workspace) - Package manager: pnpm - Frontend framework: TanStack Start (React) + TypeScript -- Backend services: Rust (Axum) +- Backend services: Rust (Axum) for Backend and Assistant; Python HTTP adapters for STT, TTS, and LLM - Routing: TanStack Router - Build/dev server: Vite (via TanStack Start) - Styling: Tailwind CSS -- Unit tests: Vitest + Testing Library -- E2E tests: Playwright +- Tests: Vitest + Testing Library for TypeScript, Cargo tests for Rust, `unittest`/`py_compile` for Python adapters - CI: GitHub Actions, staging tested build images in GHCR and publishing release tags from the tested image set - Node requirement: `>=24` (`package.json`, `.nvmrc` uses `lts/*`) @@ -143,7 +142,7 @@ nx run assistant-service:container-run The assistant container run target publishes `5020:8080`. -The standalone assistant container still defaults to `ASSISTANT_BACKEND_MODE=mock` unless environment overrides are supplied. The orchestrated dev and prod paths default to `ASSISTANT_BACKEND_MODE=proxy` with STT, TTS, and LLM services wired in. +The standalone assistant container still defaults to `ASSISTANT_BACKEND_MODE=mock` unless environment overrides are supplied. The commented orchestrator assistant definition defaults to `ASSISTANT_BACKEND_MODE=proxy` with STT, TTS, and LLM services wired in. ### STT and TTS containers @@ -186,17 +185,18 @@ The standalone LLM wrapper publishes `5050:8083`. ### Orchestrator project -Start the complete hot-reload development environment: +Start the active hot-reload development environment: ```bash pnpm dev ``` -This delegates to `nx run i12e-orchestrator:dev` / `up-dev-hot`. 
It starts PostgreSQL, runs migrations, starts Ollama and pulls the configured model, then starts the service stack with the dev compose overlay: +This delegates to `nx run i12e-orchestrator:dev` / `up-dev-hot`. It starts PostgreSQL, runs migrations, then starts the active service stack with the dev compose overlay: - `app-cockpit` runs the Vite dev server for browser HMR. -- `service-backend` and `service-assistant` run through `cargo watch`. -- `service-stt`, `service-tts`, and `service-llm` use Docker Compose watch with restart-on-change. +- `service-backend` runs through `cargo watch`. + +Assistant, STT, TTS, Ollama runtime, and LLM wrapper compose blocks exist as commented prototypes in the orchestrator compose files. Their standalone service projects and container targets are still available, but they are not part of the active `pnpm dev` or `pnpm prod` stack. Stop the dev stack with: @@ -218,7 +218,7 @@ Start the production environment: pnpm prod ``` -The long-term production deployment path is the code-free deploy bundle under `i12e/orchestrator/deploy`. CI pushes PR and SHA build images to GHCR for integration testing, deletes PR-tagged images when pull requests close, publishes release tags from the tested SHA image set, and packages: +The production deployment path is the code-free deploy bundle under `i12e/orchestrator/deploy`. CI pushes PR and SHA build images to GHCR for integration testing, publishes release tags from the tested SHA image set, and packages: - `docker-compose.prod.yml` - `central-update` @@ -233,7 +233,7 @@ pnpm prod:down ``` The orchestrator `up-*` targets share startup sequencing through `i12e/orchestrator/scripts/up_stack.sh`. -Startup waits for PostgreSQL and Ollama health before running migrations or pulling the configured model. +Startup waits for PostgreSQL health before running migrations. Long-running services use `SERVICE_RESTART_POLICY`; dev defaults to `no`, prod example defaults to `unless-stopped`. 
Advanced targets: @@ -244,20 +244,19 @@ nx run i12e-orchestrator:up-dev-llm-proxy-ollama ``` `up-dev` starts release-style dev containers without watchers. -`up-dev-llm-proxy-ollama` keeps `service-assistant` in `llm-proxy` mode and points `LLM_BASE_URL` at the Ollama runtime's OpenAI-compatible `/v1` endpoint. +`up-dev-llm-proxy-ollama` is retained as an experimental target, but it currently references assistant and Ollama compose services that are commented out in `docker-compose.yml`. -The tracked `i12e/orchestrator/.env.dev` biases this path toward quality over speed with a larger STT model, less aggressive extra STT VAD, and slightly less choppy TTS streaming. -The main compose file is GPU-backed by default for assistant support services: `service-stt`, `service-tts`, and `service-llm-runtime` request `gpus: all`. STT defaults to CUDA/FP16, TTS defaults to CUDA with FlashAttention 2 installed from a prebuilt wheel, and the stack requires a Docker host with working GPU container support. +The tracked `i12e/orchestrator/.env.dev` still includes assistant tuning values. Those values are not consumed by the active compose stack unless the commented assistant support services are re-enabled. The commented service definitions request `gpus: all` for STT, TTS, and Ollama runtime. -Run a full voice smoke test, including stack startup, STT, LLM, and TTS: +Run the experimental voice smoke test target: ```bash nx run i12e-orchestrator:smoke-dev-voice ``` -Override `SMOKE_VOICE_TEXT` and `SMOKE_VOICE_LANGUAGE` when you want to exercise a different sample input. +Override `SMOKE_VOICE_TEXT` and `SMOKE_VOICE_LANGUAGE` when you want to exercise a different sample input. This target depends on the currently commented assistant support services and is not part of the active default stack. -The production target starts the same service classes by default, using port mappings from ignored `i12e/orchestrator/.env.prod`. 
+The local production target starts PostgreSQL, migrations, Backend, and Cockpit by default, using port mappings from ignored `i12e/orchestrator/.env.prod`. The migration step runs as a one-off `postgres-migrate` container and is removed after completion. diff --git a/docs/vision.md b/docs/vision.md index e023b59..604faa3 100644 --- a/docs/vision.md +++ b/docs/vision.md @@ -1,54 +1,63 @@ -# Features +# Vision -Provides a comprehensive suite of tools to manage various aspects of your personal life and finances. +Central aims to become a comprehensive suite of tools to manage personal life and finances. + +Current implemented slices: + +- Cockpit dashboard with weather widgets. +- Manual finance transactions with income/expense summaries. +- Backend finance and weather APIs backed by PostgreSQL. +- Assistant, STT, TTS, and LLM service code exists, but voice capture and orchestrated model services are currently disabled by default. + +The sections below describe product direction, not all currently shipped behavior. ## 💰 Finances ### ⮀ Income and Expense Management -* Income and Expense tracking with customizable categories. -* Budget management. -* Analytics: Spending tracking with short-, mid-, and long-term analysis. -* Receipt scanning and automatic expense entry. -* Manage recurring payments like subscriptions, insurance, and memberships. -* Renewal notifications, reminders, and calendar integration. +- Income and Expense tracking with customizable categories. +- Budget management. +- Analytics: Spending tracking with short-, mid-, and long-term analysis. +- Receipt scanning and automatic expense entry. +- Manage recurring payments like subscriptions, insurance, and memberships. +- Renewal notifications, reminders, and calendar integration. ### 📈 Investments Portfolio -* Real-time portfolio overview. -* Multi-asset tracking: Stocks, ETFs, etc. -* Tax reporting. -* Transaction history. -* Dividend tracking. -* Risk assessment. -* News tracking and alerts. 
+- Real-time portfolio overview. +- Multi-asset tracking: Stocks, ETFs, etc. +- Tax reporting. +- Transaction history. +- Dividend tracking. +- Risk assessment. +- News tracking and alerts. ## 📄 Contracts Management -* Repository with contract storage. -* Renewal and expiration tracking. -* Reminder for important actions. -* Payment schedule, notification, and calendar integration. +- Repository with contract storage. +- Renewal and expiration tracking. +- Reminder for important actions. +- Payment schedule, notification, and calendar integration. ## 🛡️ Product and Warranties Tracking -* Catalog of products with digital receipts. -* Warranty coverage tracking. -* Warranty calendar with expiration timeline. -* Warranty claim management and status tracking. -* Maintenance schedule and reminders. +- Catalog of products with digital receipts. +- Warranty coverage tracking. +- Warranty calendar with expiration timeline. +- Warranty claim management and status tracking. +- Maintenance schedule and reminders. ## Notes -* Personal notes and documentation. -* Tagging and categorization. -* Search and filtering. -* Import and export. +- Personal notes and documentation. +- Tagging and categorization. +- Search and filtering. +- Import and export. ## Library -* Digital bookshelf with book tracking. -* Reading progress tracking. -* Recommendations based on reading habits. -* Writing own books to dump knowledge, like cookbooks, IT, etc. -* Journaling. +- Digital bookshelf with book tracking. +- Reading progress tracking. +- Recommendations based on reading habits. +- Writing own books to dump knowledge, like cookbooks, IT, etc. +- Journaling. 
diff --git a/i12e/orchestrator/README.md b/i12e/orchestrator/README.md index 437197b..6eb682f 100644 --- a/i12e/orchestrator/README.md +++ b/i12e/orchestrator/README.md @@ -22,8 +22,8 @@ pnpm dev This delegates to `pnpm nx run i12e-orchestrator:dev`, which starts the stack with [`docker-compose.dev.yml`](./docker-compose.dev.yml): - Cockpit runs Vite dev server HMR. -- Backend and assistant run under `cargo watch`. -- STT, TTS, and LLM adapter containers restart when their Python source changes. +- Backend runs under `cargo watch`. +- Assistant, STT, TTS, and LLM adapter compose blocks are currently commented out, so they are not part of `pnpm dev`. Stop it with: @@ -42,7 +42,7 @@ Edit `i12e/orchestrator/.env.prod` and set real secrets, especially: - `POSTGRES_PASSWORD` - `BACKEND_DATABASE_URL` - `BACKEND_CORS_ALLOW_ORIGIN` -- `ASSISTANT_CORS_ALLOW_ORIGIN` +- `ASSISTANT_CORS_ALLOW_ORIGIN` if assistant services are re-enabled Prod startup refuses placeholder DB credentials or wildcard CORS. @@ -77,9 +77,7 @@ Startup order: 1. Start PostgreSQL with Compose health waiting. 2. Run migrations. -3. Start Ollama with Compose health waiting. -4. Pull configured model. -5. Start application services. +3. Start application services. Long-running services use `SERVICE_RESTART_POLICY`; dev sets `no`, prod example sets `unless-stopped`. @@ -91,53 +89,50 @@ Release-style detached development stack: pnpm nx run i12e-orchestrator:up-dev ``` -Dev with mock STT/TTS and direct Ollama: +Experimental dev with mock STT/TTS and direct Ollama: ```bash pnpm nx run i12e-orchestrator:up-dev-llm-proxy-ollama ``` -This keeps `service-assistant` in `llm-proxy` mode and points it at: +This target currently references assistant and Ollama compose services that are commented out in `docker-compose.yml`. 
If those services are re-enabled, it keeps `service-assistant` in `llm-proxy` mode and points it at: - `http://service-llm-runtime:11434/v1/chat/completions` -This is the thinnest LLM integration path in the repo because `service-assistant` talks to Ollama directly and keeps STT/TTS mocked. The Ollama runtime uses the standard GPU-backed compose configuration and requests `gpus: all`. +This is the thinnest LLM integration path in the repo because `service-assistant` talks to Ollama directly and keeps STT/TTS mocked. The commented Ollama runtime definition requests `gpus: all`. ## Default assistant stack -`up-dev`, `up-dev-hot`, and `pnpm dev` keep `service-assistant` in `proxy` mode and point it at: +The default assistant stack is implemented in service code but disabled in the active orchestrator compose files. The commented configuration keeps `service-assistant` in `proxy` mode and points it at: - `http://service-stt:8081/transcribe` - `http://service-tts:8082/synthesize` - `http://service-llm:8083/chat/completions` This path keeps the `llm-service` wrapper in front of Ollama, which is useful when you want lazy model pulls and a repo-owned adapter boundary. It reuses `LLM_MODEL` from `i12e/orchestrator/.env.dev`. -The STT, TTS, and Ollama runtime services request `gpus: all` in the main compose file. This requires Docker GPU support on the host, typically NVIDIA Container Toolkit on Linux. +The STT, TTS, and Ollama runtime service definitions request `gpus: all`. Re-enabling them requires Docker GPU support on the host, typically NVIDIA Container Toolkit on Linux. -## Smoke-test the complete voice stack +## Experimental voice stack smoke test ```bash pnpm nx run i12e-orchestrator:smoke-dev-voice ``` -This target starts the complete voice stack if needed, then runs one spoken roundtrip through STT, Qwen via Ollama, and TTS. +This target depends on the currently commented assistant support services. 
Once those are re-enabled, it starts the voice stack if needed, then runs one spoken roundtrip through STT, Qwen via Ollama, and TTS. Override these environment variables when needed: - `SMOKE_VOICE_TEXT` - `SMOKE_VOICE_LANGUAGE` -`pnpm dev`, `pnpm prod`, and advanced `up-*` targets bring up: +`pnpm dev`, `pnpm prod`, and `up-dev` bring up: - Cockpit app (`app-cockpit` service) - PostgreSQL (`i12e-postgres` service) - Migration runner (`i12e-postgres-migrate`) as a one-off container (`--rm`) -- Backend (`service-backend` service, currently serving the weather domain) -- Faster-whisper STT (`service-stt` service) -- Qwen3-TTS voice cloning (`service-tts` service) -- Ollama runtime (`service-llm-runtime` service) -- LLM wrapper (`service-llm` service) -- Assistant backend (`service-assistant` service) +- Backend (`service-backend` service, serving finance and weather domains) + +Assistant backend, Faster-whisper STT, Qwen3-TTS, Ollama runtime, and LLM wrapper code exists in `services/*`, but the orchestrator service blocks are commented out. Environment files: @@ -145,9 +140,9 @@ Environment files: - `i12e/orchestrator/.env.prod.example`: tracked production template. - `i12e/orchestrator/.env.prod`: ignored local production config. -When cockpit runs inside the compose network, its server-side runtime must reach backend through `http://service-backend:8080` and assistant-service through `http://service-assistant:8080`. +When cockpit runs inside the compose network, its server-side runtime reaches backend through `http://service-backend:8080`. -Cockpit's browser bundle is separate: in Compose it should use the published host ports (`http://localhost:3010` and `http://localhost:3020` in dev by default) for any direct browser fetches. +Cockpit's browser bundle is separate: in Compose it should use the published host port (`http://localhost:3010` in dev by default) for any direct browser fetches. 
Service and port mapping details (including dev/prod differences) are documented in [`docs/service-catalog.md`](../../docs/service-catalog.md). diff --git a/i12e/orchestrator/deploy/README.md b/i12e/orchestrator/deploy/README.md index bf35baa..cad4359 100644 --- a/i12e/orchestrator/deploy/README.md +++ b/i12e/orchestrator/deploy/README.md @@ -2,6 +2,8 @@ This directory is the source for the code-free production deploy bundle published by CI for release tags. +See `docs/deployment.md` in the repository for the full release and production deployment flow. + ## Server setup Install Docker and Tailscale on the production host. Central assumes Tailscale is managed at the host level; the Compose stack binds the gateway to `127.0.0.1:4000` by default so Tailscale Serve can expose it over the tailnet. @@ -31,4 +33,3 @@ Major version jumps require: ``` The script pulls the selected image set, starts PostgreSQL, runs migrations, restarts the core application services, checks health, and prints Compose status. - diff --git a/services/assistant/README.md b/services/assistant/README.md index ef7d5e9..17b8bcc 100644 --- a/services/assistant/README.md +++ b/services/assistant/README.md @@ -2,19 +2,21 @@ Rust backend service that owns the assistant-turn orchestration boundary for Central. -The intended request path is: +The full voice request path is: 1. Browser VAD cuts a speech segment locally. 2. Cockpit posts the segment to `service-assistant`. 3. `service-assistant` performs `STT -> streamed LLM -> chunked TTS`. 4. Cockpit receives transcript, response deltas, and audio chunks, then starts playback before the full turn finishes. +Cockpit currently has browser VAD disabled, so this path is available in the service and client code but is not started by the default UI. + ## Why this shape fits `central` - Browser clients never call model-serving infrastructure directly. - Cockpit stays responsible for app/session/auth boundaries. 
- Model-serving details stay behind a Rust/Axum service, separate from the integrated `services/backend` API. -- The standalone service can still run in `mock` mode for quick wiring tests, while the orchestrated stack defaults to STT, TTS, and LLM services. +- The standalone service can run in `mock` mode for quick wiring tests. The orchestrator compose blocks for assistant, STT, TTS, and LLM services are currently commented out. ## Architecture @@ -197,7 +199,7 @@ Or keep the `llm-service` wrapper when you want lazy model pulls and a repo-owne ## Configuration - `ASSISTANT_PORT` (default: `5020`) -- `ASSISTANT_BACKEND_MODE` (`mock`, `llm-proxy`, `openai`, or `proxy`, standalone default: `mock`; orchestrator default: `proxy`) +- `ASSISTANT_BACKEND_MODE` (`mock`, `llm-proxy`, `openai`, or `proxy`, standalone default: `mock`; commented orchestrator default: `proxy`) - `ASSISTANT_REQUEST_TIMEOUT_SECONDS` (default: `30`) - `TTS_STREAM_SOFT_LIMIT_CHARS` (default: `220`, larger values improve local TTS prosody but delay the first streamed audio chunk) - `ASSISTANT_CORS_ALLOW_ORIGIN` (default: `*`) diff --git a/services/llm/README.md b/services/llm/README.md index 02dbe21..ba64f30 100644 --- a/services/llm/README.md +++ b/services/llm/README.md @@ -4,7 +4,7 @@ OpenAI-compatible chat completion adapter for `service-assistant`. It exposes the `POST /chat/completions` contract expected by `services/assistant` and forwards requests to an Ollama runtime. -The default model for the orchestrated dev flow is `qwen2.5:3b`, because it keeps startup and response latency reasonable while still supporting German chat well. Override it with `LLM_MODEL` when you want a larger Qwen variant. +When the commented orchestrator LLM services are re-enabled, `LLM_MODEL` selects the Ollama model. The tracked dev env currently sets `qwen3.5:4b`; standalone service code still falls back through `LLM_DEFAULT_MODEL`. 
## Endpoints diff --git a/services/stt/README.md b/services/stt/README.md index edf9ba8..e8100d6 100644 --- a/services/stt/README.md +++ b/services/stt/README.md @@ -40,7 +40,7 @@ Response body: `STT_MODEL` can be a standard `faster-whisper` model name or a local converted model path. For browser-segmented audio in this repository, better quality usually comes from a larger model such as `medium` or better and disabling the extra `faster-whisper` VAD layer with `STT_VAD_FILTER=false`, because the cockpit VAD already trims speech locally. -In the orchestrator compose stack, this service is built with CUDA runtime dependencies, requests `gpus: all`, and defaults to `STT_DEVICE=cuda` and `STT_COMPUTE_TYPE=float16`. +The commented orchestrator service definition is built with CUDA runtime dependencies, requests `gpus: all`, and defaults to `STT_DEVICE=cuda` and `STT_COMPUTE_TYPE=float16`. The service is not active in the default orchestrator stack while that compose block remains commented out. ## Nx targets diff --git a/services/tts/README.md b/services/tts/README.md index a81b90a..1a87f07 100644 --- a/services/tts/README.md +++ b/services/tts/README.md @@ -48,13 +48,13 @@ Response body: - `HF_HOME` (compose sets this under `/models/huggingface`) Qwen's best clone quality uses both reference audio and an accurate transcript from `TTS_REFERENCE_TEXT` or `TTS_REFERENCE_TEXT_FILE`. -The service defaults to the bundled transcript in `res/morgan-freeman.txt`, so the orchestrator uses the full voice-clone prompt by default. Set `TTS_X_VECTOR_ONLY_MODE=true` when using a custom sample without a transcript. +The service defaults to the bundled transcript in `res/morgan-freeman.txt`, so the commented orchestrator definition uses the full voice-clone prompt by default. Set `TTS_X_VECTOR_ONLY_MODE=true` when using a custom sample without a transcript. 
The Docker image installs PyTorch and Torchaudio for CUDA 12.8, then installs the matching prebuilt FlashAttention 2 wheel for Python 3.12 / Torch 2.8 / Linux x86_64. The image does not compile `flash-attn` from source during normal builds. When `TTS_ATTENTION_IMPLEMENTATION=flash_attention_2`, startup fails if the `flash_attn` package is unavailable so the service does not silently run without the required attention backend. `voiceInstruction` is kept in the HTTP contract, but the Qwen Base voice-clone path does not provide the free-form style control exposed by Qwen CustomVoice or OpenAI TTS. The service logs and ignores it while preserving the cloned Morgan voice across requests. -In the orchestrator compose stack and standalone `tts-service:container-run` target, this service requests `gpus: all`, loads Qwen onto `cuda:0`, uses the bundled `res/morgan-freeman.mp3` sample, and caches downloaded Hugging Face model files in the `central_tts_models` Docker volume. +The standalone `tts-service:container-run` target requests `gpus: all`, loads Qwen onto `cuda:0`, uses the bundled `res/morgan-freeman.mp3` sample, and caches downloaded Hugging Face model files in the `central_tts_models` Docker volume. The commented orchestrator service definition does the same, but it is not active in the default stack while that compose block remains commented out. ## Nx targets