This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
AI Secretary System — virtual secretary with voice cloning (XTTS v2, OpenVoice), pre-trained voices (Piper), local LLM (vLLM + Qwen/Llama/DeepSeek), cloud LLM fallback (Gemini, Kimi, OpenAI, Claude, DeepSeek, OpenRouter), and Claude Code CLI bridge. Features GSM telephony (SIM7600E-H), amoCRM integration, Vue 3 PWA admin panel, i18n (ru/en/kk), multi-instance Telegram bots with sales/payments, multi-instance WhatsApp bots (Cloud API), website chat widgets, and LoRA fine-tuning.
```bash
# Docker (recommended)
cp .env.docker.example .env && docker compose up -d                     # GPU mode
docker compose -f docker-compose.yml -f docker-compose.cpu.yml up -d    # CPU mode
docker compose -f docker-compose.yml -f docker-compose.full.yml up -d   # Full containerized (includes vLLM)
docker compose --profile vector-search up -d                            # + Vector Search microservice (:8003)

# Local
./start_gpu.sh   # GPU: XTTS + Qwen2.5-7B + LoRA
./start_cpu.sh   # CPU: Piper + Gemini API
curl http://localhost:8002/health
```

```bash
cd admin && npm install     # First-time setup
cd admin && npm run build   # Production build (vue-tsc type-check + vite build)
cd admin && npm run dev     # Dev server (:5173), proxies /admin + /v1 + /health to :8002
DEV_MODE=1 ./start_gpu.sh   # Backend proxies to Vite dev server
```

Default login: admin / admin. Guest demo: demo / demo (read-only).
No frontend tests — npm test is not configured. Type checking happens during npm run build via vue-tsc -b.
Deploy gotcha: Vite deletes and recreates admin/dist/ (new inode), breaking Docker bind mounts. Always docker compose restart after npm run build.
```bash
cd mobile && npm install            # First-time setup
cd mobile && npm run build          # Production build (vue-tsc type-check + vite build)
cd mobile && npm run dev            # Dev server
cd mobile && npx cap sync android   # Sync web assets to Android project
cd mobile && npx cap open android   # Open in Android Studio → Build APK
```

```bash
python scripts/manage_users.py list                              # List all users
python scripts/manage_users.py create <user> <pass> --role user  # Roles: admin|user|web|guest
python scripts/manage_users.py set-password <user> <pass>        # Reset password
python scripts/manage_users.py set-role <user> <role>            # Change role
python scripts/manage_users.py disable <user>                    # Deactivate
```

```bash
sudo bash scripts/setup_mobile_internet.sh start    # Connect wwan0 via QMI
sudo bash scripts/setup_mobile_internet.sh stop     # Disconnect
sudo bash scripts/setup_mobile_internet.sh status   # Check connection
sudo bash scripts/mobile-internet-monitor.sh        # Daemon: auto-reconnect + VPN route failover
# systemd: scripts/mobile-internet.service          # Install as persistent service
```

Three migration systems:
- Alembic (preferred) — for schema changes (`ALTER TABLE`, new tables)
- `scripts/migrate_*.py` — for data migrations. New scripts must use `scripts/_migration_template.py` (transaction-safe)
- `Base.metadata.create_all` — auto-creates missing tables on startup (does not alter existing tables)
```bash
alembic upgrade head                          # Apply all pending migrations
alembic revision --autogenerate -m "desc"     # Generate migration from model changes
cp scripts/_migration_template.py scripts/migrate_<name>.py   # New data migration
```

```bash
# Python
ruff check .            # Lint (see pyproject.toml for rules)
ruff check . --fix      # Auto-fix
ruff format .           # Format
ruff format --check .   # Check formatting

# Frontend
cd admin && npm run lint          # Lint + auto-fix
cd admin && npm run lint:check    # Lint only (CI-style)
cd admin && npm run format        # Prettier format
cd admin && npm run format:check  # Check formatting

# All at once
pre-commit run --all-files
```

```bash
pytest tests/                                # All tests
pytest tests/unit/test_retry_on_busy.py -v   # Single file
pytest -k "test_chat" -v                     # By name pattern
pytest -m "not slow" -v                      # Exclude slow tests
pytest -m "not integration" -v               # Exclude integration (needs external services)
```

`asyncio_mode = "auto"` — async tests run without `@pytest.mark.asyncio`. Custom markers: `slow`, `integration`, `gpu`. Docker: `docker exec ai-secretary python -m pytest tests/ -v -o asyncio_mode=auto`.
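A minimal sketch of what `asyncio_mode = "auto"` enables — pytest-asyncio collects plain `async def test_*` functions with no `@pytest.mark.asyncio` decorator. `fake_chat` below is a hypothetical stand-in, not project code:

```python
# Sketch: under asyncio_mode = "auto", this file needs no asyncio marker.
# fake_chat is a hypothetical stand-in for an orchestrator chat call.
import asyncio


async def fake_chat(message: str) -> str:
    return f"echo: {message}"


async def test_chat_roundtrip() -> None:
    reply = await fake_chat("ping")
    assert reply == "echo: ping"


# Outside pytest the same coroutine can be driven manually:
if __name__ == "__main__":
    asyncio.run(test_chat_roundtrip())
```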
GitHub Actions (.github/workflows/ci.yml) on push to main/develop and PRs:
- `lint-backend` — ruff check + format check + mypy on `orchestrator.py` only (mypy soft, `|| true`)
- `lint-frontend` — npm ci + eslint + build (type check)
- `security` — Trivy vulnerability scanner
Always run lint locally before pushing. Protected branches require PR workflow — never push directly to main.
```
┌──────────────────────────────────────────────────────────────┐
│                   Orchestrator (port 8002)                   │
│     orchestrator.py + modules/*/router*.py (~28 routers)     │
│  ┌────────────────────────────────────────────────────────┐  │
│  │           Vue 3 Admin Panel (24 views, PWA)            │  │
│  │           admin/dist/                                  │  │
│  └────────────────────────────────────────────────────────┘  │
└────────────┬──────────────┬──────────────┬───────────────────┘
             │              │              │
     ┌───────┴──┐    ┌──────┴───┐    ┌─────┴─────┐
     │   LLM    │    │   TTS    │    │    STT    │
     │  vLLM /  │    │ XTTS v2 /│    │  Vosk /   │
     │  Cloud   │    │  Piper   │    │  Whisper  │
     └──────────┘    └──────────┘    └───────────┘
```
Request flow: User message → FAQ check (instant match) OR LLM → TTS → Audio response
Deployment modes (DEPLOYMENT_MODE env var): full (default, everything), cloud (no GPU/TTS/STT/GSM), local (same as full). Cloud mode skips hardware router registration, hides hardware admin tabs, filters out speech/gsm permissions.
Foundation layer for modular decomposition (issue #489). Phase 4 complete: all 28 routers migrated (Phase 3), all inline endpoints extracted (Phase 4.1–4.5), all background tasks via TaskRegistry (Phase 4.6), all startup helpers + service init extracted to domain startup.py modules, global service variables removed (Phase 4.7a/b). Phase 5.1–5.4 complete (EventBus infrastructure, first events, DatasetSynced, Widget→CRM events). Phase 6 complete (Protocol interfaces for AuthService, KnowledgeService, LLMService, ChatService in modules/*/protocols.py). Phase 7.1 complete (AuthService facade in modules/core/auth_service.py). Phase 7.2 complete (KnowledgeService facade in modules/knowledge/facade.py). Phase 7.3 complete (LLMService facade in modules/llm/facade.py). Phase 7.4 complete (ChatService facade in modules/chat/facade.py).
- `EventBus` (`modules/core/events.py`): In-process async pub/sub. Handlers run concurrently via `asyncio.gather`; exceptions are logged, never propagated to the publisher. `BaseEvent` dataclass with auto-timestamp. Singleton in `ServiceContainer.event_bus`. Domain events: `InternetStatusChanged`, `UserRoleChanged`, `SessionRevoked`, `DatasetSynced` (in `modules/core/events.py`), `KnowledgeUpdated` (in `modules/knowledge/events.py`), `WidgetSessionCreated`, `WidgetMessageSent`, `WidgetContactSubmitted` (in `modules/channels/widget/events.py`). Subscriptions are registered via `setup_event_subscriptions()` in `modules/core/startup.py`, which delegates to domain-specific setup functions (`setup_llm_event_subscriptions()` in `modules/llm/startup.py`, `setup_knowledge_event_subscriptions()` in `modules/knowledge/startup.py`, `setup_crm_event_subscriptions()` in `modules/crm/startup.py`). `DatasetSynced` decouples CRM/ecommerce/kanban from knowledge. Widget events decouple the widget router from amoCRM: the widget publishes events, and the CRM domain handles lead/contact/note creation reactively.
- `TaskRegistry` (`modules/core/tasks.py`): Named background tasks — periodic (interval-based) or one-shot. `start_all()` / `cancel_all(timeout)` lifecycle. `TaskInfo` dataclass tracks status, run count, last error. 7 tasks registered in `startup_event()`: `session-cleanup` (1h), `periodic-vacuum` (7d), `kanban-sync` (15min), `woocommerce-sync` (daily 23:00 UTC), `rss-sync` (1h), `wiki-embeddings` (one-shot), `wiki-collection-indexes` (one-shot). Task functions live in `modules/core/maintenance.py`, `modules/knowledge/tasks.py`, `modules/knowledge/rss_service.py`, `modules/kanban/tasks.py`, `modules/ecommerce/tasks.py`.
- `HealthRegistry` (`modules/core/health.py`): Modular health checks with per-check timeout (`asyncio.wait_for`). Status aggregation: all ok → ok, any degraded → degraded, any error → error.
- `InternetMonitor` (`modules/core/internet_monitor.py`): Periodic connectivity checker (ping DNS/Cloudflare). Auto-switches the LLM backend: online → cloud provider (claude_bridge priority), offline → local vLLM. Publishes `InternetStatusChanged` events via the EventBus (`container.event_bus`). Configurable thresholds, 30s default interval. Status endpoint: `GET /admin/gsm/internet-status`. The health check includes an `internet` section.
Import from `modules.core`: `EventBus`, `BaseEvent`, `TaskRegistry`, `TaskInfo`, `HealthRegistry`, `HealthStatus`, `UserRoleChanged`, `SessionRevoked`, `DatasetSynced`.
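The EventBus contract described above (concurrent handlers via `asyncio.gather`, exceptions logged rather than raised to the publisher) can be sketched roughly like this — names follow the description, the real `modules/core/events.py` may differ in detail, and the `collection_id` payload field is invented for illustration:

```python
# Rough sketch of the EventBus contract: handlers run concurrently, and a
# failing handler never crashes the publisher. Not the project's actual code.
import asyncio
import logging
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


@dataclass
class BaseEvent:
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[BaseEvent], Awaitable[None]]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[BaseEvent], Awaitable[None]]) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event: BaseEvent) -> None:
        results = await asyncio.gather(
            *(h(event) for h in self._handlers[type(event)]),
            return_exceptions=True,  # never let a handler crash the publisher
        )
        for r in results:
            if isinstance(r, Exception):
                logger.error("event handler failed: %r", r)


@dataclass
class DatasetSynced(BaseEvent):
    collection_id: int = 0  # hypothetical payload field


async def demo() -> list[int]:
    seen: list[int] = []

    async def on_synced(ev: BaseEvent) -> None:
        assert isinstance(ev, DatasetSynced)
        seen.append(ev.collection_id)

    async def broken(ev: BaseEvent) -> None:
        raise RuntimeError("boom")  # gets logged, publish() still succeeds

    bus = EventBus()
    bus.subscribe(DatasetSynced, on_synced)
    bus.subscribe(DatasetSynced, broken)
    await bus.publish(DatasetSynced(collection_id=7))
    return seen
```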
32 service classes extracted from the former monolithic db/integration.py into 16 domain files (Phase 2, issue #492):
| Module | File | Service Classes |
|---|---|---|
| `modules/core/` | `service.py` | DatabaseService, UserService, UserSessionService, RoleService, WorkspaceService, ConfigService, UserIdentityService |
| `modules/core/` | `auth_service.py` | AuthService (Phase 7.1 facade — wraps `auth_manager.py` functions, implements Protocol from `protocols.py`) |
| `modules/knowledge/` | `facade.py` | KnowledgeServiceImpl (Phase 7.2 facade — wraps wiki_rag_service + knowledge services, implements Protocol from `protocols.py`) |
| `modules/llm/` | `facade.py` | LLMServiceImpl (Phase 7.3 facade — wraps CloudLLMService/VLLMLLMService + CloudProviderService, implements Protocol from `protocols.py`) |
| `modules/chat/` | `facade.py` | ChatServiceImpl (Phase 7.4 facade — wraps ChatService CRUD + LLM generation + ChatShareService, implements Protocol from `protocols.py`) |
| `modules/chat/` | `service.py` | ChatService, ChatShareService |
| `modules/knowledge/` | `service.py` | FAQService, KnowledgeDocService, KnowledgeCollectionService, GitHubRepoProjectService |
| `modules/channels/telegram/` | `service.py` | BotInstanceService, TelegramSessionService |
| `modules/channels/whatsapp/` | `service.py` | WhatsAppInstanceService |
| `modules/channels/widget/` | `service.py` | WidgetInstanceService |
| `modules/channels/mobile/` | `service.py` | MobileAppInstanceService |
| `modules/kanban/` | `service.py` | KanbanService, KanbanProjectService |
| `modules/claude_code/` | `service.py` | ClaudeCodeService, ClaudeCodeProjectService |
| `modules/llm/` | `service.py` | CloudProviderService |
| `modules/monitoring/` | `service.py` | AuditService, PaymentService |
| `modules/admin/` | `service.py` | ResourceShareService |
| `modules/speech/` | `service.py`, `streaming.py` | PresetService, StreamingTTSManager |
| `modules/crm/` | `service.py` | AmoCRMService |
| `modules/ecommerce/` | `service.py` | WooCommerceService |
| `modules/telephony/` | `service.py` | GSMService |
| `modules/google/` | `service.py`, `models.py` | GoogleOAuthService |
Import pattern: `from modules.chat.service import chat_service` (direct, preferred) or `from db.integration import async_chat_manager` (backward-compatible alias). Domain `__init__.py` files do NOT re-export services (see Known Issues #9).
Phase 3 migration complete: all 28 routers moved from app/routers/ to domain modules. Original files are 1-3 line facade re-exports.
| Domain | Router file | Facade |
|---|---|---|
| `modules/ecommerce/` | `router.py` | `app/routers/woocommerce.py` |
| `modules/crm/` | `router.py` | `app/routers/amocrm.py` (exports router + webhook_router) |
| `modules/telephony/` | `router.py` | `app/routers/gsm.py` |
| `modules/speech/` | `router_tts.py`, `router_stt.py`, `router_services.py` | `app/routers/tts.py`, `stt.py`, `services.py` |
| `modules/knowledge/` | `router_faq.py`, `router_wiki_rag.py`, `router_github_repos.py` | `app/routers/faq.py`, `wiki_rag.py`, `github_repos.py` |
| `modules/kanban/` | `router.py` | `app/routers/kanban.py` |
| `modules/claude_code/` | `router.py` | `app/routers/claude_code.py` |
| `modules/channels/telegram/` | `router.py` | `app/routers/telegram.py` |
| `modules/channels/whatsapp/` | `router.py` | `app/routers/whatsapp.py` |
| `modules/channels/widget/` | `router.py`, `router_public.py` | `app/routers/widget.py` (admin); public endpoints direct |
| `modules/channels/mobile/` | `router.py` | `app/routers/mobile.py` |
| `modules/sales/` | `router_bot_sales.py`, `router_yoomoney.py` | `app/routers/bot_sales.py`, `yoomoney_webhook.py` |
| `modules/core/` | `router_auth.py`, `router_roles.py`, `router_workspace.py` | `app/routers/auth.py`, `roles.py`, `workspace.py` |
| `modules/admin/` | `router_backup.py`, `router_legal.py`, `router_github_webhook.py` | `app/routers/backup.py`, `legal.py`, `github_webhook.py` |
| `modules/monitoring/` | `router_audit.py`, `router_usage.py`, `router_monitor.py` | `app/routers/audit.py`, `usage.py`, `monitor.py` |
| `modules/chat/` | `router.py` | `app/routers/chat.py` |
| `modules/llm/` | `router.py` | `app/routers/llm.py` |
| `modules/google/` | `router.py` (+ callback_router) | `app/routers/google.py` |
| `modules/knowledge/` | `router_google_drive.py` | Google Drive RAG sync (`/admin/google-drive/*`) |
Phase 4 routers (extracted from orchestrator.py, not from app/routers/):
| Domain | Router file | Endpoints | Phase |
|---|---|---|---|
| `modules/compat/` | `router.py` | Legacy telephony (`/tts`, `/stt`, `/chat`, `/process_call`, `/reset_conversation`) + OpenAI-compat (`/v1/*`) | 4.3 |
| `modules/core/` | `router_health.py` | `/`, `/health`, `/admin/deployment-mode` | 4.3 |
| `modules/llm/` | `router_finetune.py` | LLM finetune: dataset, training, LoRA adapters (`/admin/finetune/*`) | 4.4 |
| `modules/speech/` | `router_finetune.py` | TTS finetune: samples, training, models (`/admin/tts-finetune/*`) | 4.4 |
| `modules/speech/` | `router_voices.py` | Voice selection + test (`/admin/voices`, `/admin/voice`, `/admin/voice/test`) | 4.5 |
| `modules/llm/` | `router_models.py` | HuggingFace model management (`/admin/models/*`) | 4.5 |
| `modules/monitoring/` | `router_logs.py` | Log viewing + streaming (`/admin/logs/*`) | 4.5 |
New routers import domain services directly (`from modules.monitoring.service import audit_service`) instead of going through the facade. GPU-only routers (`router_voices.py`, `router_models.py`, `router_finetune.py`) are conditionally registered when `DEPLOYMENT_MODE != "cloud"`.
orchestrator.py (~320 lines): FastAPI entry point — pure wiring, zero domain logic. No inline endpoints (Phase 4.1–4.5), no raw asyncio.create_task() (Phase 4.6), no helper functions (Phase 4.7a), no global service variables (Phase 4.7b). Contains only: imports, CORS/middleware, declarative router registration (~28 routers), startup_event() (calls domain init functions + registers tasks), shutdown_event() (delegates to graceful_shutdown()), static file serving, Vite dev proxy. All service initialization in domain startup.py modules: modules/speech/startup.py (TTS/STT), modules/llm/startup.py (LLM + fallback chain + InternetMonitor callback), modules/knowledge/startup.py (Wiki RAG + embeddings), modules/core/startup.py (seed, monitor, shutdown), modules/telephony/startup.py (GSM), modules/channels/{telegram,whatsapp}/startup.py (bot auto-start).
ServiceContainer (app/dependencies.py): Singleton holding references to all initialized services — the single source of truth for service state (no global variables). Includes event_bus: EventBus singleton for inter-module events. Routers get services via FastAPI Depends. Populated during app startup by domain init_*() functions. Runtime mutations (LLM backend switch) write directly to container.
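A minimal sketch of the container pattern described above — one module-level singleton holds service references, dependencies read from it, and runtime mutations write straight to it. Field names other than `event_bus` are illustrative, not the real `app/dependencies.py` API:

```python
# Sketch of the ServiceContainer pattern: a single module-level object replaces
# global service variables. Field names besides event_bus are hypothetical.
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ServiceContainer:
    event_bus: Optional[Any] = None
    llm_service: Optional[Any] = None          # hypothetical field
    extras: dict[str, Any] = field(default_factory=dict)


container = ServiceContainer()  # singleton, populated by domain init_*() at startup


def get_llm_service() -> Any:
    """FastAPI-style dependency: routers would use Depends(get_llm_service)."""
    if container.llm_service is None:
        raise RuntimeError("service not initialized; startup must run first")
    return container.llm_service


def switch_backend(new_service: Any) -> None:
    """Runtime mutation (e.g. online/offline LLM switch) writes to the container."""
    container.llm_service = new_service
```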
Two service layers: Core AI services at project root (cloud_llm_service.py, vllm_llm_service.py, voice_clone_service.py, stt_service.py, etc.). Domain services in app/services/ (amocrm_service.py, wiki_rag_service.py, backup_service.py, sales_funnel.py, etc.).
Database layer (db/): Async SQLAlchemy + aiosqlite. db/database.py creates engine. db/integration.py is a ~100-line facade that imports singletons and class aliases from domain services (from modules.chat.service import chat_service as async_chat_manager). Singletons are created in modules/*/service.py; the facade only re-exports them under old names. Repositories in db/repositories/ inherit from BaseRepository with generic CRUD and _apply_workspace_filter() for multi-tenant queries.
Unit of Work: Repositories only flush() — never commit(). Callers own transaction boundaries: service methods call session.commit(), get_async_session() auto-commits on success / rollbacks on exception.
SQLITE_BUSY retry: db/retry.py @retry_on_busy() — exponential backoff (3 retries, 0.1s base). Applied to write methods in domain service classes (16 methods across 5 services).
Telegram bots: Subprocesses managed by multi_bot_manager.py. Config pre-fetched from DB, written to /tmp/bot_config_{id}.json. Two frameworks: python-telegram-bot (legacy) + aiogram (new). LLMRouter in telegram_bot/services/llm_router.py routes through orchestrator chat API. File uploads: telegram_bot/services/file_extractor.py extracts text from documents (text files + PDF via pdfplumber), injected into chat as plain text.
WhatsApp bots: Same subprocess pattern via whatsapp_manager.py. Module: whatsapp_bot/ (runs as python -m whatsapp_bot).
Platform agent fallback (prompts/platform-agent.md): When a chat session has no system_prompt set, modules/chat/facade.py loads this file as the system prompt (lazy, cached per-process) before falling back to llm.get_system_prompt(). Persona helps end-users configure their own assistants; no admin/ops content. Override path via PLATFORM_AGENT_PROMPT_FILE.
Role-specific prompt templates (prompts/lawyer-ru.md, lawyer-kz.md, accountant-kz.md, README-roles.md): hand-written system prompts for widget/bot instances tied to the static legal/accountancy collections. Not loaded automatically — the admin pastes the content into the instance's system_prompt field. Each enforces: cite article + code, warn about statute revision (редакция) drift, refuse to aid crime, give no legal/tax opinions. The README maps slug → role → which collections to attach.
Bridge HOME isolation (services/bridge/src/providers/claude/provider.py): The Claude CLI subprocess spawned by the bridge inherits $HOME from the service, so it picks up /root/CLAUDE.md and user-memory files — which historically leaked admin context into end-user chats. Set BRIDGE_ISOLATE_HOME=1 in .env to spawn the CLI with HOME=/var/lib/ai-secretary-bridge and cwd=<home>/sandbox (outside /root), with credentials symlinked from the real ~/.claude/. Default off to avoid breaking dev setups without the isolated dir.
Cloud LLM: cloud_llm_service.py factory pattern. OpenAI-compatible providers auto-handled via OpenAICompatibleProvider. Custom SDKs get their own provider class inheriting BaseLLMProvider. Provider types in PROVIDER_TYPES dict in db/models.py. Supports model fallback via fallback_models list. supports_tools flag + generate_with_tools() on OpenAICompatibleProvider and VLLMLLMService for tool-calling (agentic RAG).
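The factory dispatch described above — a provider-type string keyed into `PROVIDER_CLASSES`, with OpenAI-compatible providers sharing one class — looks roughly like this. Class and dict names mirror the text; constructor arguments and the `"deepseek"`/`"gemini"` keys are illustrative:

```python
# Sketch of the cloud-LLM factory: provider_type -> class via PROVIDER_CLASSES.
# OpenAI-compatible providers need no new class. Constructor args are invented.
class BaseLLMProvider:
    def __init__(self, api_key: str, model: str) -> None:
        self.api_key, self.model = api_key, model

    async def generate(self, prompt: str) -> str:
        raise NotImplementedError


class OpenAICompatibleProvider(BaseLLMProvider):
    supports_tools = True  # enables generate_with_tools() for agentic RAG

    async def generate(self, prompt: str) -> str:
        return f"[{self.model}] stubbed reply"  # real code calls the HTTP API


class GeminiProvider(BaseLLMProvider):  # custom-SDK example; name illustrative
    supports_tools = False

    async def generate(self, prompt: str) -> str:
        return "[gemini] stubbed reply"


PROVIDER_CLASSES: dict[str, type[BaseLLMProvider]] = {
    "openai": OpenAICompatibleProvider,
    "deepseek": OpenAICompatibleProvider,   # OpenAI-compatible: reuse one class
    "gemini": GeminiProvider,
}


def make_provider(provider_type: str, api_key: str, model: str) -> BaseLLMProvider:
    return PROVIDER_CLASSES[provider_type](api_key=api_key, model=model)
```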
Wiki RAG: app/services/wiki_rag_service.py — tiered search: (1) semantic embeddings (Gemini/OpenAI/local), (2) BM25 with Russian/English stemming, (3) Vector Search microservice (if VECTOR_SEARCH_URL configured). Multi-collection support. Per-instance RAG config on bots/widgets. Agentic RAG (modules/chat/router.py): server-side loop where LLM calls knowledge_search tool to query the knowledge base on demand (max 5 iterations). Providers without supports_tools (Gemini SDK) fall back to one-shot RAG injection. Frontend shows inline search indicator via tool_start/tool_end SSE events. Vector Search (services/vector-search/): standalone FastAPI microservice using ChromaDB + paraphrase-multilingual-mpnet-base-v2 (768 dims). Client: app/services/vector_search_client.py (async httpx). Runs as Docker profile vector-search on port 8003. Async search methods (search_async, retrieve_async, retrieve_multi_async) run all engines in parallel via asyncio.gather and merge/deduplicate results. Background task vector-search-sync upserts all sections on startup. DatasetSynced event triggers incremental sync. Admin endpoints: GET /admin/wiki-rag/vector-search/status, POST /admin/wiki-rag/vector-search/sync.
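The parallel-then-merge shape of the async search methods (all engines via `asyncio.gather`, results merged and deduplicated) can be sketched as follows — the engine stubs and `(doc_id, score)` tuple format are illustrative, not the project's actual result type:

```python
# Sketch of the search_async shape: run engines concurrently with
# asyncio.gather, then dedupe hits by document id keeping the best score.
# Engine names and result format are illustrative.
import asyncio


async def semantic(q: str) -> list[tuple[str, float]]:
    return [("doc-a", 0.9), ("doc-b", 0.6)]  # stubbed embedding hits


async def bm25(q: str) -> list[tuple[str, float]]:
    return [("doc-b", 0.8), ("doc-c", 0.5)]  # stubbed keyword hits


async def search_async(q: str) -> list[tuple[str, float]]:
    results = await asyncio.gather(semantic(q), bm25(q))  # engines in parallel
    best: dict[str, float] = {}
    for hits in results:
        for doc_id, score in hits:
            best[doc_id] = max(best.get(doc_id, 0.0), score)  # dedupe, keep top
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```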
RSS knowledge layer (modules/knowledge/rss_service.py, models RSSFeed/RSSFeedItem in modules/knowledge/models.py): each RSSFeed row maps a URL to a KnowledgeCollection. Periodic task rss-sync (1h interval) calls feedparser with cached ETag/Last-Modified, dedupes new entries by GUID, optionally fetches full article HTML and converts it to markdown via lxml (chrome stripped: nav/footer/scripts/share/cookie banners), writes one md file per item to the collection's wiki-pages/ dir, creates KnowledgeDocument rows, then calls wiki_rag.reload_collection() + sync_collection_to_vector_search() directly (skips DatasetSynced event because its handler deletes-and-recreates docs which would orphan rss_feed_items.document_id FKs). Per-feed flags: fetch_full_text, verify_ssl (needed for adilet.zan.kz with non-standard CA). Caps: MAX_ITEMS_PER_FEED=50, MAX_FULL_TEXT_BYTES=1.5MB. Admin CRUD: GET/POST/PATCH/DELETE /admin/rss/feeds, POST /admin/rss/feeds/{id}/sync, POST /admin/rss/sync-all. Admin UI lives in fine-tune Collections section. Seed script scripts/seed_rss_feeds.py provisions 3 news collections (ru-bukh-news, ru-pravo-news, kz-news) with 13 verified RU/KZ accountancy & legal feeds.
Per-user Claude token tracking (shared $100 Anthropic plan): usage_log table has nullable user_id (migration 0024_usage_log_user_id) keyed to chat session owner. OpenAICompatibleProvider.last_usage populated from response usage in non-stream path, from final-chunk usage or tiktoken estimate in stream path; CloudLLMService.last_usage proxies to provider. modules/chat/facade.py:_log_llm_usage writes one row per Claude/claude_bridge response (best-effort, never raises) with service_type=llm, units_consumed=input+output, details={input_tokens, output_tokens, model, estimated}. Period bounds in modules/monitoring/period.py — anchor day-of-month (default 30, capped to last day for short months). Endpoints: GET /admin/usage/me (any auth, own total), GET /admin/usage/by-user (admin, all users sorted desc by tokens). Mobile shows a thin orange→red bar under the context indicator with the user's own period total (mobile/src/views/ChatView.vue); admin Users view has a "Токены" column (admin/src/views/UsersView.vue). Only Claude is tracked — Gemini/OpenAI/vLLM are skipped by _is_claude_provider. Streaming numbers are tiktoken-estimated (the bridge doesn't currently emit usage chunks); non-streaming responses use real Anthropic numbers.
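The period-bound rule above (anchor day-of-month, default 30, capped to the last day for short months) amounts to: find the most recent capped anchor date on or before today. A sketch — the function name and signature are illustrative; the real `modules/monitoring/period.py` may differ:

```python
# Sketch of the anchor-day period rule: the billing period starts on the
# anchor day (default 30), capped to each month's last day (e.g. Feb 28).
# Illustrative only; not the real modules/monitoring/period.py signature.
import calendar
from datetime import date


def period_start(today: date, anchor_day: int = 30) -> date:
    """Most recent anchor date on or before `today`, capping day per month."""
    y, m = today.year, today.month
    day = min(anchor_day, calendar.monthrange(y, m)[1])  # cap to month length
    if today.day >= day:
        return date(y, m, day)
    # Anchor not reached yet this month: use the previous month's capped anchor.
    y, m = (y - 1, 12) if m == 1 else (y, m - 1)
    return date(y, m, min(anchor_day, calendar.monthrange(y, m)[1]))
```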
Static legal & accountancy collections (scripts/scrape_digitax/): scraping pipeline for fixed corpora (federal codes, tax authority pages, professional bodies). Three steps: scrape.py --site <slug> (BFS crawler, BFS link-extractors per site), parse.py --site <slug> (HTML→Markdown via lxml + per-site CONTENT_SELECTORS), upload.py --site <slug> (writes DB rows + copies to wiki-pages/<slug>/). Site catalog in config.py:SITES — 7 Irish accountancy sites (digitax) + 3 RU bookkeeping (USN) + 10 RU federal codes already scraped (consultant.ru) + 38 additional RU consultant.ru sources generated from RU_FEDERAL_LAWS list (Constitution, 11 more codes — НК ч.1/2, БК, ГПК, АПК, КАС, ЗК, ЛК, ВК, ГсК; 24 ФЗ across corporate / administrative / social / finance / info domains; 11 ФКЗ; only ru-fz-273 actually scraped so far) + 7 KZ codes (adilet.zan.kz) + 2 KZ accountancy practical (kgd.gov.kz, mybuh.kz; configured but not scraped). consultant.ru codes: each cons_doc_LAW_<id> is a BFS root with article-level URLs branching out; sidebar pollution stripped via div.seo-links removal in parse.py:strip_boilerplate. adilet.zan.kz codes: each Kazakh code is a single monolithic page (~1.5–5 MB), needs verify_ssl: False (Kazakh root CA not in certifi); content selector div.container_gamma.text.text_upd. Critical: global POST /admin/wiki-rag/reload only re-indexes the legacy WIKI_DIR (root-level files), NOT collections. After bulk upload, loop POST /admin/wiki-rag/collections/{id}/reload per collection. Server-side runs of upload.py skip self-copy when source dir is wiki-pages/ (parsed/ absent on production).
Stack: Vue 3 + Composition API + TypeScript, Vite, Pinia (persisted), Vue Router (hash history), TanStack Vue Query, vue-i18n (ru/en/kk), TailwindCSS + radix-vue, lucide-vue-next. Path alias @ → admin/src/.
Routing (admin/src/router.ts): createWebHashHistory. Routes use meta fields: public (bypass auth), localOnly (hidden in cloud mode), module (RBAC module name), minLevel (view/edit/manage).
Stores (admin/src/stores/): Key store auth.ts holds JWT, user, deploymentMode, permissions. Exposes isAdmin, isCloudMode, hasModule(), canView(), canEdit(), canManage(). Toast API: toast.success/error/warning/info(title) — do NOT use toast.show() (different signature: show(type, title, message?)). Confirm API: confirm.confirm({ title, message, confirmText, type }) returns Promise<boolean> — no ask() method.
API layer (admin/src/api/): client.ts provides api.get/post/put/delete/upload + createSSE() (auto-injects JWT). Domain files build on it. All re-exported from api/index.ts.
Demo mode: VITE_DEMO_MODE=true monkey-patches window.fetch to intercept API calls with mock data from 23 domain files in admin/src/api/demo/.
Product variant (VITE_PRODUCT_VARIANT env var, defaults to full): Set to lite to ship a stripped admin panel. Lite variant whitelists /chat, /llm, /wiki, /finetune (collections CRUD), /widget, /telegram, /whatsapp, /mobile-app, /settings, /users, /about, /login, /invite/* — everything else is blocked by the router guard and hidden from nav. Scripts: npm run dev:lite and npm run build:lite. Central helper: admin/src/config/productVariant.ts (IS_LITE, isPathAllowed()).
Vite base path: Production /admin/ (served by FastAPI). Demo/standalone: / (via VITE_BASE_PATH or .env.production.local).
Stack: Vue 3 + TypeScript, Vite, Pinia, Vue Router, Capacitor (Android), TailwindCSS 4. Path alias @ → mobile/src/.
Purpose: Standalone Android chat app connecting to https://ai-sekretar24.ru (hardcoded). Role-based experience: admins get full chat controls, non-admins see only shared chats.
Screens: LoginView (auth only, no server URL), ChatListView (admin: session list + FAB + delete; non-admin: Claude-like welcome + shared chat cards), ChatView (streaming chat + TTS + role-based controls), SettingsView (account + logout).
Theme: Night-eyes (warm brown/amber/gold), hardcoded — no theme switching. Background #1a1308, text #d9c9a8, primary amber-600, cards stone-800.
Role-based access:
- Admin (`role=admin`): full chat controls — LLM provider selector, RAG collection multi-select, system prompt editing, export (copy/md/json), branching, context files, all message actions (edit, regenerate, summarize, delete branch), session creation/deletion. Admin-only controls live in the admin panel only (not mobile).
- Non-admin: shared chats (`is_shared_with_me` filter), Claude-like welcome, basic message actions (TTS + copy), branching, context files, web search toggle. No LLM/RAG selectors, no export, no session deletion.
Mobile Instances: Admin creates MobileAppInstance (LLM backend, persona, system prompt, TTS, RAG) in admin panel (/mobile-app view). Users are assigned to instances via ResourceShare. On login, mobile app fetches GET /admin/mobile/my-config to get assigned instance config. Chat sessions use source="mobile" + mobile_instance_id for per-instance LLM/prompt routing.
API layer (mobile/src/api/): client.ts (base fetch + upload for multipart FormData), chat.ts (sessions/streaming/branches/uploadImage), admin.ts (admin-only APIs, used only by admin panel).
File upload in chat: Same backend as admin (POST /admin/chat/sessions/{id}/upload-image). ChatInput.vue has paperclip button between input and send. Files uploaded → image_ids passed to streamMessage → backend injects extracted text (OCR/PDF/DOCX/XLSX) into LLM context. Accepts: JPEG, PNG, WebP, GIF, PDF, XLSX, DOCX, TXT, CSV, MD, JSON, XML, HTML, YAML. Max 10MB.
Per-session named system prompts (ChatSessionPrompt model, table chat_session_prompts): each chat session can hold multiple named prompts; exactly one is active. Endpoints GET/POST /admin/chat/sessions/{id}/prompts, PATCH /admin/chat/sessions/{id}/prompts/{pid}, POST /admin/chat/sessions/{id}/prompts/{pid}/activate, DELETE …/{pid}. The active prompt's content is mirrored into ChatSession.system_prompt, so the existing streaming pipeline picks it up unchanged — switching prompt swaps the assistant's role while preserving conversation history. Creating the first prompt while the session already has a system_prompt preserves it as the initial content. Deleting the active prompt promotes the most recent remaining one.
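The activation rule above — exactly one prompt active, its content mirrored into the session's `system_prompt` — can be sketched with plain dataclasses standing in for the ORM models (the helper name is hypothetical):

```python
# Sketch of named-prompt activation: exactly one prompt is active, and its
# content is mirrored into system_prompt so the streaming pipeline is
# unchanged. Dataclasses stand in for the real ChatSessionPrompt/ChatSession.
from dataclasses import dataclass, field


@dataclass
class SessionPrompt:  # stands in for ChatSessionPrompt
    id: int
    name: str
    content: str
    is_active: bool = False


@dataclass
class Session:  # stands in for ChatSession
    system_prompt: str = ""
    prompts: list[SessionPrompt] = field(default_factory=list)


def activate_prompt(session: Session, prompt_id: int) -> None:
    """Make one prompt active and mirror its content into system_prompt."""
    for p in session.prompts:
        p.is_active = p.id == prompt_id
        if p.is_active:
            session.system_prompt = p.content  # pipeline reads this unchanged
```

Switching the active prompt swaps the assistant's role while the message history stays intact, which is exactly why the mirror trick avoids touching the streaming code.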
Key differences from admin panel:
- Hardcoded server URL (`https://ai-sekretar24.ru`), no user configuration
- JWT stored in native Preferences (not localStorage)
- No demo mode, no full RBAC UI — role-based chat experience
- ~77KB gzipped (vs ~2MB admin)
Build: cd mobile && npm run build && npx cap sync android. APK via Android Studio: npx cap open android → Build → Build APK.
No lint/format/test — mobile app has only dev, build, preview scripts. Type checking happens during npm run build via vue-tsc -b.
Adding a new API endpoint:
- Create/edit the router in `modules/{domain}/router.py` (preferred) or `app/routers/` (legacy)
- Use domain service singletons (`from modules.chat.service import chat_service`) for DB access
- If using `app/routers/`, add to `__all__` in `app/routers/__init__.py`
- Register it in `orchestrator.py` with `app.include_router()`
Adding a new cloud LLM provider type:
- Add an entry to the `PROVIDER_TYPES` dict in `db/models.py`
- OpenAI-compatible → works automatically via `OpenAICompatibleProvider`
- Custom SDK → create a provider class inheriting `BaseLLMProvider` in `cloud_llm_service.py`, register it in `CloudLLMService.PROVIDER_CLASSES`
RBAC auth guards (in `auth_manager.py`):
- `Depends(require_permission(module, level))` — checks module permission
- `user_has_level(user, module, level)` — inline check within an endpoint
- `workspace_context(user, module)` → `(owner_id, workspace_id)` — for repository calls; `owner_id=None` means "shared within workspace"
- Data isolation: always pass both `owner_id` and `workspace_id` to the repository/manager
Gate-check pattern for mutations: UPDATE/DELETE endpoints call workspace-filtered get first (e.g., get_by_id_ws(id, workspace_id=ws_id)); if None → 404. Prevents cross-workspace access. ChatRepository is the reference implementation.
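The gate-check pattern reduces to: workspace-filtered get first, 404 on miss, so cross-workspace rows are indistinguishable from missing ones. A minimal sketch with an in-memory stand-in for the repository (the exception class and "table" are illustrative):

```python
# Sketch of the gate-check pattern: mutations first do a workspace-filtered
# get; a miss maps to 404. The in-memory "table" and NotFound stand-in are
# illustrative -- real code uses the repository + fastapi.HTTPException.
class NotFound(Exception):
    """Stands in for HTTPException(status_code=404)."""


FAKE_ROWS = {  # (id, workspace_id) -> row
    (1, 10): {"id": 1, "title": "chat in ws 10"},
}


def get_by_id_ws(row_id: int, workspace_id: int):
    """Workspace-filtered lookup, like ChatRepository.get_by_id_ws."""
    return FAKE_ROWS.get((row_id, workspace_id))


def delete_chat(row_id: int, workspace_id: int) -> dict:
    row = get_by_id_ws(row_id, workspace_id=workspace_id)
    if row is None:
        raise NotFound()  # another workspace's row looks exactly like a miss
    FAKE_ROWS.pop((row_id, workspace_id))
    return row
```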
Adding i18n translations: Edit admin/src/plugins/i18n.ts — add keys to all three message objects: ru, en, kk.
API URL patterns:
- `GET/POST /admin/{resource}` — List/create
- `GET/PUT/DELETE /admin/{resource}/{id}` — CRUD
- `POST /admin/{resource}/{id}/action` — Actions (start, stop, test)
- `POST /webhooks/{service}` — External webhooks
- `POST /v1/chat/completions`, `GET /v1/models` — OpenAI-compatible
- Python 3.11+, line length 100, double quotes (ruff format)
- Cyrillic is normal — RUF001/002/003 disabled; Russian appears in UI text, logging, persona prompts
- FastAPI Depends — B008 disabled for `Depends()` in default args
- Optional imports — services like vLLM and OpenVoice use try/except at module level with `*_AVAILABLE` flags
- SQLAlchemy 2.0 style — `Mapped[T]` with `mapped_column()` (declarative 2.0)
- Repository pattern — `BaseRepository(Generic[T])` provides CRUD + `_apply_workspace_filter()`. Repos only `flush()`, never `commit()`
- mypy strict only for `db/`, `auth_manager.py`, `service_manager.py`; other modules relaxed. mypy is soft in CI
- Pre-commit hooks — ruff, mypy (core only), eslint, hadolint, standard checks (see `.pre-commit-config.yaml`)
LLM_BACKEND=vllm # "vllm" or "cloud:{provider_id}" (legacy "gemini" auto-migrates)
VLLM_API_URL=http://localhost:11434 # Auto-normalized: trailing /v1 stripped
DEPLOYMENT_MODE=full # "full", "cloud", or "local"
ORCHESTRATOR_PORT=8002
ADMIN_JWT_SECRET=... # Auto-generated if empty
REDIS_URL=redis://localhost:6379/0 # Optional, graceful fallback
DEV_MODE=1 # Backend proxies to Vite dev server (:5173)
VECTOR_SEARCH_URL=http://localhost:8003 # Optional, Vector Search microservice
VECTOR_SEARCH_TOKEN= # Bearer token for Vector Search API
GOOGLE_CLIENT_ID= # Google OAuth 2.0 (Drive, Docs, Sheets, Gmail)
GOOGLE_CLIENT_SECRET= # Google OAuth 2.0 client secret
GOOGLE_REDIRECT_URI= # OAuth callback URL (default: {BASE_URL}/admin/oauth/google/callback)
PLATFORM_AGENT_PROMPT_FILE= # Override path to platform-agent fallback prompt (default: /opt/ai-secretary/prompts/platform-agent.md)
BRIDGE_ISOLATE_HOME= # "1" to spawn Claude CLI with isolated HOME so host's CLAUDE.md/memory files don't leak into user chats
BRIDGE_ISOLATED_HOME= # Override isolated HOME path (default: /var/lib/ai-secretary-bridge)

Server: root@155.212.231.7, systemd service (not Docker Compose).
ssh root@155.212.231.7
cd /opt/ai-secretary
git pull origin main
cd admin && npm ci && npm run build
rsync -av --delete admin/dist/ /var/www/admin-ai-sekretar24/ # REQUIRED: nginx serves from /var/www/
sed -i "s/ai-admin-v[0-9a-z]*/ai-admin-v$(date +%s)/" /var/www/admin-ai-sekretar24/sw.js # bust SW cache
systemctl restart ai-secretary # restart orchestrator
curl -s http://localhost:8002/health # health check

IMPORTANT: Nginx serves the frontend from /var/www/admin-ai-sekretar24/, NOT from /opt/ai-secretary/admin/dist/. Always rsync after build.
Webhook auto-deploy: ai-secretary-webhook.service triggers on GitHub push.
Local-only files (not in git): .env, docker-compose.override.yml, modified Dockerfile, services/bridge/src/models/
- Run lint locally — `ruff check . && cd admin && npm run lint:check`
- Check for pending DB migrations
- Kill stale processes — `lsof -i :8002`
- Clean build artifacts — `rm -rf admin/dist admin/node_modules/.vite`
- Build — `npm run build` (verify `VITE_DEMO_MODE` is NOT set)
- Restart — `docker compose restart ai-secretary`
- Verify — `curl http://localhost:8002/health` + test `/admin/auth/login`
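For the stale-process step, a cross-platform alternative to `lsof -i :8002` is a quick socket probe (a sketch; 8002 is the orchestrator default):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


# e.g. before restarting: if port_in_use(8002), kill the stale orchestrator first
```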
./deploy.sh # git pull, re-apply patches, build admin, restart orchestrator
./test_system.sh # Quick health checks and API smoke tests

Fully offline demo builds at demo.ai-sekretar24.ru:
- Full demo (`/full/`): `npm run build -- --mode demo` (admin role, all features)
- Cloud demo (`/cloud/`): `npm run build -- --mode demo-web` (web role, customer-facing)
- Deploy: `bash /root/deploy-demo.sh`
Check in this order — infrastructure first, application logic last:
1. Build artifacts — correct build deployed? Stale demo interceptors?
2. Deploy pipeline — stale Vite cache, wrong `.env`, `VITE_DEMO_MODE` leaking?
3. DB state — migrations applied? `sqlite3 data/secretary.db ".tables"` / `.schema`
4. Process state — port conflicts (`lsof -i :8002`), zombie processes?
5. Auth/JWT — `ADMIN_JWT_SECRET` is auto-generated on restart (invalidates tokens). Sessions validated against the `user_sessions` table via `SessionCache`
6. Application logic — only after ruling out 1–5
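For the DB-state step, the `.tables` check can also be done from Python with the stdlib (a sketch; the path assumes the default SQLite location):

```python
import sqlite3


def applied_tables(db_path: str) -> set[str]:
    """Table names in a SQLite file, mirroring `.tables` in the sqlite3 CLI."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return {name for (name,) in rows}


# e.g.: "user_sessions" in applied_tables("data/secretary.db")
```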
This project is developed from two machines:
- local — dev workstation with GPU (RTX 3060), full stack
- server — Beget VPS (`root@155.212.231.7`), systemd service, cloud LLM only
Each machine identifies itself via ~/.claude/projects/.../memory/MEMORY.md (## Machine Role section). Check your machine role before git operations.
- Never push directly to `main` — always feature branch + PR
- Branch prefixes: `local/*` (dev machine), `server/*` (server), or `feat/`/`fix/`/`docs/` with machine suffix
- Always `git pull` before starting work
- Do not amend or force-push commits made by the other instance
Local primary: Hardware services (voice_clone_service.py, stt_service.py, vllm_llm_service.py, piper_tts_service.py), GPU/hardware routers, fine-tuning, voice samples, start scripts.
Server primary: Cloud services (cloud_llm_service.py, xray_proxy_manager.py), Docker files, bot operations (runtime), production data.
Shared (coordinate via branches): orchestrator.py, app/routers/, db/, admin/, CLAUDE.md, migrations (create new files only).
- Vosk model required — download to `models/vosk/` for STT
- XTTS requires CUDA CC >= 7.0 — RTX 3060+; use OpenVoice for older GPUs
- GPU memory — vLLM ~6GB + XTTS ~5GB must fit in 12GB
- VLESS proxy vs localhost — `GeminiProvider` sets global `HTTP_PROXY`; `OpenAICompatibleProvider` sets `NO_PROXY=127.0.0.1,localhost` for `claude_bridge`; `bridge_manager.py` strips proxy env vars
- Claude bridge timeouts — 7–30s warmup; `read=300s` timeout for bridge (vs 60s default); `max_tokens=4096` (vs 512)
- `services/bridge/src/models/` gitignored — the `.gitignore` pattern `models/` catches it; copy manually after clone
- Docker CPU: whisper excluded — `openai-whisper` fails to build (missing `pkg_resources`); server Dockerfile patched to `grep -v whisper`
- Docker + Claude CLI — CPU image needs Node.js; server Dockerfile patched to install Node.js 20 + `@anthropic-ai/claude-code`
- Circular import in domain `__init__.py` — domain `__init__.py` files MUST stay empty (no service re-exports). Chain: `db/models.py` imports `from modules.X.models import ...` → Python executes `modules/X/__init__.py` → if it imports `service.py`, then `service.py` imports `db.repositories` → `db.repositories` imports `db.models` → circular. Workaround: import services directly (`from modules.chat.service import ChatService`). Future fix (Phase 3+): eliminate eager imports in `db/models.py` by making it a lazy facade, or remove it entirely once consumers import models from domain modules.