CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

AI Secretary System — virtual secretary with voice cloning (XTTS v2, OpenVoice), pre-trained voices (Piper), local LLM (vLLM + Qwen/Llama/DeepSeek), cloud LLM fallback (Gemini, Kimi, OpenAI, Claude, DeepSeek, OpenRouter), and Claude Code CLI bridge. Features GSM telephony (SIM7600E-H), amoCRM integration, Vue 3 PWA admin panel, i18n (ru/en/kk), multi-instance Telegram bots with sales/payments, multi-instance WhatsApp bots (Cloud API), website chat widgets, and LoRA fine-tuning.

Commands

Build & Run

# Docker (recommended)
cp .env.docker.example .env && docker compose up -d          # GPU mode
docker compose -f docker-compose.yml -f docker-compose.cpu.yml up -d  # CPU mode
docker compose -f docker-compose.yml -f docker-compose.full.yml up -d # Full containerized (includes vLLM)
docker compose --profile vector-search up -d                 # + Vector Search microservice (:8003)

# Local
./start_gpu.sh              # GPU: XTTS + Qwen2.5-7B + LoRA
./start_cpu.sh              # CPU: Piper + Gemini API
curl http://localhost:8002/health

Admin Panel

cd admin && npm install     # First-time setup
cd admin && npm run build   # Production build (vue-tsc type-check + vite build)
cd admin && npm run dev     # Dev server (:5173), proxies /admin + /v1 + /health to :8002
DEV_MODE=1 ./start_gpu.sh   # Backend proxies to Vite dev server

Default login: admin / admin. Guest demo: demo / demo (read-only).

No frontend tests — npm test is not configured. Type checking happens during npm run build via vue-tsc -b.

Deploy gotcha: Vite deletes and recreates admin/dist/ (new inode), breaking Docker bind mounts. Always docker compose restart after npm run build.

Mobile App

cd mobile && npm install     # First-time setup
cd mobile && npm run build   # Production build (vue-tsc type-check + vite build)
cd mobile && npm run dev     # Dev server
cd mobile && npx cap sync android  # Sync web assets to Android project
cd mobile && npx cap open android  # Open in Android Studio → Build APK

User Management

python scripts/manage_users.py list                          # List all users
python scripts/manage_users.py create <user> <pass> --role user  # Roles: admin|user|web|guest
python scripts/manage_users.py set-password <user> <pass>    # Reset password
python scripts/manage_users.py set-role <user> <role>        # Change role
python scripts/manage_users.py disable <user>                # Deactivate

Mobile Internet (SIM7600E-H QMI)

sudo bash scripts/setup_mobile_internet.sh start   # Connect wwan0 via QMI
sudo bash scripts/setup_mobile_internet.sh stop    # Disconnect
sudo bash scripts/setup_mobile_internet.sh status  # Check connection
sudo bash scripts/mobile-internet-monitor.sh       # Daemon: auto-reconnect + VPN route failover
# systemd: scripts/mobile-internet.service          # Install as persistent service

Database Migrations

Three migration systems:

  • Alembic (preferred) — for schema changes (ALTER TABLE, new tables)
  • scripts/migrate_*.py — for data migrations. New scripts must use scripts/_migration_template.py (transaction-safe)
  • Base.metadata.create_all — auto-creates missing tables on startup (does not alter existing)
alembic upgrade head                        # Apply all pending migrations
alembic revision --autogenerate -m "desc"   # Generate migration from model changes
cp scripts/_migration_template.py scripts/migrate_<name>.py  # New data migration

Lint & Format

# Python
ruff check .                # Lint (see pyproject.toml for rules)
ruff check . --fix          # Auto-fix
ruff format .               # Format
ruff format --check .       # Check formatting

# Frontend
cd admin && npm run lint         # Lint + auto-fix
cd admin && npm run lint:check   # Lint only (CI-style)
cd admin && npm run format       # Prettier format
cd admin && npm run format:check # Check formatting

# All at once
pre-commit run --all-files

Testing

pytest tests/                          # All tests
pytest tests/unit/test_retry_on_busy.py -v  # Single file
pytest -k "test_chat" -v               # By name pattern
pytest -m "not slow" -v                # Exclude slow tests
pytest -m "not integration" -v         # Exclude integration (needs external services)

asyncio_mode = "auto" — async tests run without @pytest.mark.asyncio. Custom markers: slow, integration, gpu. Docker: docker exec ai-secretary python -m pytest tests/ -v -o asyncio_mode=auto.

CI

GitHub Actions (.github/workflows/ci.yml) on push to main/develop and PRs:

  • lint-backend — ruff check + format check + mypy on orchestrator.py only (mypy soft, || true)
  • lint-frontend — npm ci + eslint + build (type check)
  • security — Trivy vulnerability scanner

Always run lint locally before pushing. Protected branches require PR workflow — never push directly to main.

Architecture

┌──────────────────────────────────────────────────────────────┐
│                  Orchestrator (port 8002)                     │
│  orchestrator.py + modules/*/router*.py (~28 routers)        │
│  ┌────────────────────────────────────────────────────────┐  │
│  │        Vue 3 Admin Panel (24 views, PWA)                │  │
│  │                admin/dist/                              │  │
│  └────────────────────────────────────────────────────────┘  │
└────────────┬──────────────┬──────────────┬───────────────────┘
             │              │              │
     ┌───────┴──┐    ┌──────┴───┐   ┌─────┴─────┐
     │ LLM      │    │ TTS      │   │ STT       │
     │ vLLM /   │    │ XTTS v2 /│   │ Vosk /    │
     │ Cloud    │    │ Piper    │   │ Whisper   │
     └──────────┘    └──────────┘   └───────────┘

Request flow: User message → FAQ check (instant match) OR LLM → TTS → Audio response

Deployment modes (DEPLOYMENT_MODE env var): full (default, everything), cloud (no GPU/TTS/STT/GSM), local (same as full). Cloud mode skips hardware router registration, hides hardware admin tabs, filters out speech/gsm permissions.

Modular Infrastructure (modules/)

Foundation layer for modular decomposition (issue #489). Status by phase:

  • Phase 3 — all 28 routers migrated to domain modules
  • Phase 4 — complete: all inline endpoints extracted (4.1–4.5), all background tasks via TaskRegistry (4.6), all startup helpers + service init extracted to domain startup.py modules, global service variables removed (4.7a/b)
  • Phase 5.1–5.4 — complete: EventBus infrastructure, first events, DatasetSynced, Widget→CRM events
  • Phase 6 — complete: Protocol interfaces for AuthService, KnowledgeService, LLMService, ChatService in modules/*/protocols.py
  • Phase 7.1–7.4 — complete: service facades — AuthService (modules/core/auth_service.py), KnowledgeService (modules/knowledge/facade.py), LLMService (modules/llm/facade.py), ChatService (modules/chat/facade.py)

  • EventBus (modules/core/events.py): In-process async pub/sub. Handlers run concurrently via asyncio.gather; exceptions are logged, never propagated to publisher. BaseEvent dataclass with auto-timestamp. Singleton in ServiceContainer.event_bus. Domain events: InternetStatusChanged, UserRoleChanged, SessionRevoked, DatasetSynced (in modules/core/events.py), KnowledgeUpdated (in modules/knowledge/events.py), WidgetSessionCreated, WidgetMessageSent, WidgetContactSubmitted (in modules/channels/widget/events.py). Subscriptions registered via setup_event_subscriptions() in modules/core/startup.py, which delegates to domain-specific setup functions (setup_llm_event_subscriptions() in modules/llm/startup.py, setup_knowledge_event_subscriptions() in modules/knowledge/startup.py, setup_crm_event_subscriptions() in modules/crm/startup.py). DatasetSynced decouples CRM/ecommerce/kanban from knowledge. Widget events decouple widget router from amoCRM: widget publishes events, CRM domain handles lead/contact/note creation reactively.

  • TaskRegistry (modules/core/tasks.py): Named background tasks — periodic (interval-based) or one-shot. start_all() / cancel_all(timeout) lifecycle. TaskInfo dataclass tracks status, run count, last error. 7 tasks registered in startup_event(): session-cleanup (1h), periodic-vacuum (7d), kanban-sync (15min), woocommerce-sync (daily 23:00 UTC), rss-sync (1h), wiki-embeddings (one-shot), wiki-collection-indexes (one-shot). Task functions in modules/core/maintenance.py, modules/knowledge/tasks.py, modules/knowledge/rss_service.py, modules/kanban/tasks.py, modules/ecommerce/tasks.py.

  • HealthRegistry (modules/core/health.py): Modular health checks with per-check timeout (asyncio.wait_for). Status aggregation: all ok → ok, any degraded → degraded, any error → error.

  • InternetMonitor (modules/core/internet_monitor.py): Periodic connectivity checker (ping DNS/Cloudflare). Auto-switches LLM backend: online → cloud provider (claude_bridge priority), offline → local vLLM. Publishes InternetStatusChanged events via EventBus (container.event_bus). Configurable thresholds, 30s default interval. Status endpoint: GET /admin/gsm/internet-status. Health check includes internet section.

Import from modules.core: EventBus, BaseEvent, TaskRegistry, TaskInfo, HealthRegistry, HealthStatus, UserRoleChanged, SessionRevoked, DatasetSynced.
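A minimal sketch of the EventBus contract described above (simplified; the real modules/core/events.py carries more event types and wiring, but the documented behavior is: handlers run concurrently via asyncio.gather, exceptions are logged and never reach the publisher):

```python
import asyncio
import logging
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)

Handler = Callable[["BaseEvent"], Awaitable[None]]


@dataclass
class BaseEvent:
    """Base class for domain events; the timestamp is filled automatically."""

    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class InternetStatusChanged(BaseEvent):
    online: bool = True


class EventBus:
    """In-process async pub/sub. Handlers run concurrently; their
    exceptions are logged, never propagated back to the publisher."""

    def __init__(self) -> None:
        self._handlers: dict[type, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event: BaseEvent) -> None:
        async def safe(handler: Handler) -> None:
            try:
                await handler(event)
            except Exception:
                logger.exception("Event handler failed")  # swallowed by design

        await asyncio.gather(*(safe(h) for h in self._handlers[type(event)]))
```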

Domain Services (modules/*/service.py)

32 service classes extracted from the former monolithic db/integration.py into 16 domain files (Phase 2, issue #492):

| Module | File | Service Classes |
| --- | --- | --- |
| modules/core/ | service.py | DatabaseService, UserService, UserSessionService, RoleService, WorkspaceService, ConfigService, UserIdentityService |
| modules/core/ | auth_service.py | AuthService (Phase 7.1 facade — wraps auth_manager.py functions, implements Protocol from protocols.py) |
| modules/knowledge/ | facade.py | KnowledgeServiceImpl (Phase 7.2 facade — wraps wiki_rag_service + knowledge services, implements Protocol from protocols.py) |
| modules/llm/ | facade.py | LLMServiceImpl (Phase 7.3 facade — wraps CloudLLMService/VLLMLLMService + CloudProviderService, implements Protocol from protocols.py) |
| modules/chat/ | facade.py | ChatServiceImpl (Phase 7.4 facade — wraps ChatService CRUD + LLM generation + ChatShareService, implements Protocol from protocols.py) |
| modules/chat/ | service.py | ChatService, ChatShareService |
| modules/knowledge/ | service.py | FAQService, KnowledgeDocService, KnowledgeCollectionService, GitHubRepoProjectService |
| modules/channels/telegram/ | service.py | BotInstanceService, TelegramSessionService |
| modules/channels/whatsapp/ | service.py | WhatsAppInstanceService |
| modules/channels/widget/ | service.py | WidgetInstanceService |
| modules/channels/mobile/ | service.py | MobileAppInstanceService |
| modules/kanban/ | service.py | KanbanService, KanbanProjectService |
| modules/claude_code/ | service.py | ClaudeCodeService, ClaudeCodeProjectService |
| modules/llm/ | service.py | CloudProviderService |
| modules/monitoring/ | service.py | AuditService, PaymentService |
| modules/admin/ | service.py | ResourceShareService |
| modules/speech/ | service.py, streaming.py | PresetService, StreamingTTSManager |
| modules/crm/ | service.py | AmoCRMService |
| modules/ecommerce/ | service.py | WooCommerceService |
| modules/telephony/ | service.py | GSMService |
| modules/google/ | service.py, models.py | GoogleOAuthService |

Import pattern: from modules.chat.service import chat_service (direct, preferred) or from db.integration import async_chat_manager (backward-compatible alias). Domain __init__.py files do NOT re-export services (see Known Issues #9).

Domain Routers (modules/*/router.py)

Phase 3 migration complete: all 28 routers moved from app/routers/ to domain modules. Original files are 1-3 line facade re-exports.

| Domain | Router file | Facade |
| --- | --- | --- |
| modules/ecommerce/ | router.py | app/routers/woocommerce.py |
| modules/crm/ | router.py | app/routers/amocrm.py (exports router + webhook_router) |
| modules/telephony/ | router.py | app/routers/gsm.py |
| modules/speech/ | router_tts.py, router_stt.py, router_services.py | app/routers/tts.py, stt.py, services.py |
| modules/knowledge/ | router_faq.py, router_wiki_rag.py, router_github_repos.py | app/routers/faq.py, wiki_rag.py, github_repos.py |
| modules/kanban/ | router.py | app/routers/kanban.py |
| modules/claude_code/ | router.py | app/routers/claude_code.py |
| modules/channels/telegram/ | router.py | app/routers/telegram.py |
| modules/channels/whatsapp/ | router.py | app/routers/whatsapp.py |
| modules/channels/widget/ | router.py, router_public.py | app/routers/widget.py (admin); public endpoints direct |
| modules/channels/mobile/ | router.py | app/routers/mobile.py |
| modules/sales/ | router_bot_sales.py, router_yoomoney.py | app/routers/bot_sales.py, yoomoney_webhook.py |
| modules/core/ | router_auth.py, router_roles.py, router_workspace.py | app/routers/auth.py, roles.py, workspace.py |
| modules/admin/ | router_backup.py, router_legal.py, router_github_webhook.py | app/routers/backup.py, legal.py, github_webhook.py |
| modules/monitoring/ | router_audit.py, router_usage.py, router_monitor.py | app/routers/audit.py, usage.py, monitor.py |
| modules/chat/ | router.py | app/routers/chat.py |
| modules/llm/ | router.py | app/routers/llm.py |
| modules/google/ | router.py (+ callback_router) | app/routers/google.py |
| modules/knowledge/ | router_google_drive.py | Google Drive RAG sync (/admin/google-drive/*) |

Phase 4 routers (extracted from orchestrator.py, not from app/routers/):

| Domain | Router file | Endpoints | Phase |
| --- | --- | --- | --- |
| modules/compat/ | router.py | Legacy telephony (/tts, /stt, /chat, /process_call, /reset_conversation) + OpenAI-compat (/v1/*) | 4.3 |
| modules/core/ | router_health.py | /, /health, /admin/deployment-mode | 4.3 |
| modules/llm/ | router_finetune.py | LLM finetune: dataset, training, LoRA adapters (/admin/finetune/*) | 4.4 |
| modules/speech/ | router_finetune.py | TTS finetune: samples, training, models (/admin/tts-finetune/*) | 4.4 |
| modules/speech/ | router_voices.py | Voice selection + test (/admin/voices, /admin/voice, /admin/voice/test) | 4.5 |
| modules/llm/ | router_models.py | HuggingFace model management (/admin/models/*) | 4.5 |
| modules/monitoring/ | router_logs.py | Log viewing + streaming (/admin/logs/*) | 4.5 |

New routers import domain services directly (from modules.monitoring.service import audit_service) instead of through the facade. GPU-only routers (router_voices.py, router_models.py, router_finetune.py) are conditionally registered when DEPLOYMENT_MODE != "cloud".

Key Components

orchestrator.py (~320 lines): FastAPI entry point — pure wiring, zero domain logic. No inline endpoints (Phase 4.1–4.5), no raw asyncio.create_task() (Phase 4.6), no helper functions (Phase 4.7a), no global service variables (Phase 4.7b). Contains only: imports, CORS/middleware, declarative router registration (~28 routers), startup_event() (calls domain init functions + registers tasks), shutdown_event() (delegates to graceful_shutdown()), static file serving, Vite dev proxy. All service initialization in domain startup.py modules: modules/speech/startup.py (TTS/STT), modules/llm/startup.py (LLM + fallback chain + InternetMonitor callback), modules/knowledge/startup.py (Wiki RAG + embeddings), modules/core/startup.py (seed, monitor, shutdown), modules/telephony/startup.py (GSM), modules/channels/{telegram,whatsapp}/startup.py (bot auto-start).

ServiceContainer (app/dependencies.py): Singleton holding references to all initialized services — the single source of truth for service state (no global variables). Includes event_bus: EventBus singleton for inter-module events. Routers get services via FastAPI Depends. Populated during app startup by domain init_*() functions. Runtime mutations (LLM backend switch) write directly to container.

Two service layers: Core AI services at project root (cloud_llm_service.py, vllm_llm_service.py, voice_clone_service.py, stt_service.py, etc.). Domain services in app/services/ (amocrm_service.py, wiki_rag_service.py, backup_service.py, sales_funnel.py, etc.).

Database layer (db/): Async SQLAlchemy + aiosqlite. db/database.py creates engine. db/integration.py is a ~100-line facade that imports singletons and class aliases from domain services (from modules.chat.service import chat_service as async_chat_manager). Singletons are created in modules/*/service.py; the facade only re-exports them under old names. Repositories in db/repositories/ inherit from BaseRepository with generic CRUD and _apply_workspace_filter() for multi-tenant queries.

Unit of Work: Repositories only flush() — never commit(). Callers own transaction boundaries: service methods call session.commit(), get_async_session() auto-commits on success / rollbacks on exception.
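The ownership split can be illustrated with stand-in classes (a sketch, not the real SQLAlchemy session or repositories; only the flush/commit division of labor is from the codebase):

```python
import asyncio


class FakeSession:
    """Stand-in for SQLAlchemy's AsyncSession, just to show who calls what."""

    def __init__(self) -> None:
        self.flushed = False
        self.committed = False

    async def flush(self) -> None:
        self.flushed = True  # pending changes get IDs, still uncommitted

    async def commit(self) -> None:
        self.committed = True


class ChatRepository:
    """Repositories only flush(); they never own the transaction."""

    def __init__(self, session: FakeSession) -> None:
        self.session = session

    async def create(self, data: dict) -> dict:
        # session.add(obj) would happen here in the real repository
        await self.session.flush()
        return data


class ChatService:
    """Service methods own the transaction boundary and commit exactly once."""

    def __init__(self, session: FakeSession) -> None:
        self.session = session
        self.repo = ChatRepository(session)

    async def create_chat(self, data: dict) -> dict:
        obj = await self.repo.create(data)
        await self.session.commit()
        return obj
```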

SQLITE_BUSY retry: db/retry.py @retry_on_busy() — exponential backoff (3 retries, 0.1s base). Applied to write methods in domain service classes (16 methods across 5 services).
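The backoff schedule can be sketched as a decorator (a stand-in BusyError replaces the real SQLITE_BUSY detection in db/retry.py; the defaults match the documented 3 retries with 0.1s base):

```python
import asyncio
import functools


class BusyError(Exception):
    """Stand-in for SQLITE_BUSY ('database is locked')."""


def retry_on_busy(retries: int = 3, base_delay: float = 0.1):
    """Retry an async write on BusyError with exponential backoff:
    waits 0.1s, 0.2s, 0.4s before the 2nd, 3rd, and 4th attempts."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return await fn(*args, **kwargs)
                except BusyError:
                    if attempt == retries:
                        raise  # retries exhausted, surface the error
                    await asyncio.sleep(base_delay * (2**attempt))

        return wrapper

    return decorator
```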

Telegram bots: Subprocesses managed by multi_bot_manager.py. Config pre-fetched from DB, written to /tmp/bot_config_{id}.json. Two frameworks: python-telegram-bot (legacy) + aiogram (new). LLMRouter in telegram_bot/services/llm_router.py routes through orchestrator chat API. File uploads: telegram_bot/services/file_extractor.py extracts text from documents (text files + PDF via pdfplumber), injected into chat as plain text.

WhatsApp bots: Same subprocess pattern via whatsapp_manager.py. Module: whatsapp_bot/ (runs as python -m whatsapp_bot).

Platform agent fallback (prompts/platform-agent.md): When a chat session has no system_prompt set, modules/chat/facade.py loads this file as the system prompt (lazy, cached per-process) before falling back to llm.get_system_prompt(). Persona helps end-users configure their own assistants; no admin/ops content. Override path via PLATFORM_AGENT_PROMPT_FILE.

Role-specific prompt templates (prompts/lawyer-ru.md, lawyer-kz.md, accountant-kz.md, README-roles.md): hand-written system prompts for widget/bot instances tied to the static legal/accountancy collections. Not loaded automatically — admin pastes content into the instance's system_prompt field. Each enforces: cite article + code, warn that the cited statute revision (редакция) may be outdated, refuse to aid crime, give no legal/tax opinions. README maps slug → role → which collections to attach.


Bridge HOME isolation (services/bridge/src/providers/claude/provider.py): The Claude CLI subprocess spawned by the bridge inherits $HOME from the service, so it picks up /root/CLAUDE.md and user-memory files — which historically leaked admin context into end-user chats. Set BRIDGE_ISOLATE_HOME=1 in .env to spawn the CLI with HOME=/var/lib/ai-secretary-bridge and cwd=<home>/sandbox (outside /root), with credentials symlinked from the real ~/.claude/. Default off to avoid breaking dev setups without the isolated dir.

Cloud LLM: cloud_llm_service.py factory pattern. OpenAI-compatible providers auto-handled via OpenAICompatibleProvider. Custom SDKs get their own provider class inheriting BaseLLMProvider. Provider types in PROVIDER_TYPES dict in db/models.py. Supports model fallback via fallback_models list. supports_tools flag + generate_with_tools() on OpenAICompatibleProvider and VLLMLLMService for tool-calling (agentic RAG).

Wiki RAG: app/services/wiki_rag_service.py — tiered search: (1) semantic embeddings (Gemini/OpenAI/local), (2) BM25 with Russian/English stemming, (3) Vector Search microservice (if VECTOR_SEARCH_URL configured). Multi-collection support. Per-instance RAG config on bots/widgets. Agentic RAG (modules/chat/router.py): server-side loop where LLM calls knowledge_search tool to query the knowledge base on demand (max 5 iterations). Providers without supports_tools (Gemini SDK) fall back to one-shot RAG injection. Frontend shows inline search indicator via tool_start/tool_end SSE events. Vector Search (services/vector-search/): standalone FastAPI microservice using ChromaDB + paraphrase-multilingual-mpnet-base-v2 (768 dims). Client: app/services/vector_search_client.py (async httpx). Runs as Docker profile vector-search on port 8003. Async search methods (search_async, retrieve_async, retrieve_multi_async) run all engines in parallel via asyncio.gather and merge/deduplicate results. Background task vector-search-sync upserts all sections on startup. DatasetSynced event triggers incremental sync. Admin endpoints: GET /admin/wiki-rag/vector-search/status, POST /admin/wiki-rag/vector-search/sync.
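The parallel merge step can be sketched as follows (the engine stubs are placeholders for the semantic/BM25/vector backends; only the gather-then-merge-and-deduplicate shape is from the document):

```python
import asyncio

# Placeholder engines standing in for semantic / BM25 / vector-search backends.
async def semantic_search(query: str) -> list[tuple[str, float]]:
    return [("doc-a", 0.90), ("doc-b", 0.70)]

async def bm25_search(query: str) -> list[tuple[str, float]]:
    return [("doc-b", 0.80), ("doc-c", 0.60)]

async def vector_search(query: str) -> list[tuple[str, float]]:
    return [("doc-a", 0.85)]

async def search_async(query: str) -> list[tuple[str, float]]:
    """Run every engine concurrently, then merge and deduplicate by
    document id, keeping each document's best score."""
    per_engine = await asyncio.gather(
        semantic_search(query), bm25_search(query), vector_search(query)
    )
    best: dict[str, float] = {}
    for hits in per_engine:
        for doc_id, score in hits:
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```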

RSS knowledge layer (modules/knowledge/rss_service.py, models RSSFeed/RSSFeedItem in modules/knowledge/models.py): each RSSFeed row maps a URL to a KnowledgeCollection. Periodic task rss-sync (1h interval) calls feedparser with cached ETag/Last-Modified, dedupes new entries by GUID, optionally fetches full article HTML and converts it to markdown via lxml (chrome stripped: nav/footer/scripts/share/cookie banners), writes one md file per item to the collection's wiki-pages/ dir, creates KnowledgeDocument rows, then calls wiki_rag.reload_collection() + sync_collection_to_vector_search() directly (skips DatasetSynced event because its handler deletes-and-recreates docs which would orphan rss_feed_items.document_id FKs). Per-feed flags: fetch_full_text, verify_ssl (needed for adilet.zan.kz with non-standard CA). Caps: MAX_ITEMS_PER_FEED=50, MAX_FULL_TEXT_BYTES=1.5MB. Admin CRUD: GET/POST/PATCH/DELETE /admin/rss/feeds, POST /admin/rss/feeds/{id}/sync, POST /admin/rss/sync-all. Admin UI lives in fine-tune Collections section. Seed script scripts/seed_rss_feeds.py provisions 3 news collections (ru-bukh-news, ru-pravo-news, kz-news) with 13 verified RU/KZ accountancy & legal feeds.

Per-user Claude token tracking (shared $100 Anthropic plan): usage_log table has nullable user_id (migration 0024_usage_log_user_id) keyed to chat session owner. OpenAICompatibleProvider.last_usage populated from response usage in non-stream path, from final-chunk usage or tiktoken estimate in stream path; CloudLLMService.last_usage proxies to provider. modules/chat/facade.py:_log_llm_usage writes one row per Claude/claude_bridge response (best-effort, never raises) with service_type=llm, units_consumed=input+output, details={input_tokens, output_tokens, model, estimated}. Period bounds in modules/monitoring/period.py — anchor day-of-month (default 30, capped to last day for short months). Endpoints: GET /admin/usage/me (any auth, own total), GET /admin/usage/by-user (admin, all users sorted desc by tokens). Mobile shows a thin orange→red bar under the context indicator with the user's own period total (mobile/src/views/ChatView.vue); admin Users view has a "Токены" (Tokens) column (admin/src/views/UsersView.vue). Only Claude is tracked — Gemini/OpenAI/vLLM are skipped by _is_claude_provider. Streaming numbers are tiktoken-estimated (the bridge doesn't currently emit usage chunks); non-streaming responses use real Anthropic numbers.
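The anchor-day period logic can be sketched like this (function name and signature are assumptions; modules/monitoring/period.py is the authoritative implementation):

```python
import calendar
from datetime import date


def period_start(today: date, anchor_day: int = 30) -> date:
    """Start of the current usage period: the anchor day-of-month,
    capped to the last day of short months (so February uses the 28th/29th)."""

    def anchored(year: int, month: int) -> date:
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, min(anchor_day, last_day))

    this_month = anchored(today.year, today.month)
    if today >= this_month:
        return this_month
    # Before this month's anchor: the period started at the previous anchor.
    if today.month == 1:
        return anchored(today.year - 1, 12)
    return anchored(today.year, today.month - 1)
```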

Static legal & accountancy collections (scripts/scrape_digitax/): scraping pipeline for fixed corpora (federal codes, tax authority pages, professional bodies). Three steps: scrape.py --site <slug> (BFS crawler, BFS link-extractors per site), parse.py --site <slug> (HTML→Markdown via lxml + per-site CONTENT_SELECTORS), upload.py --site <slug> (writes DB rows + copies to wiki-pages/<slug>/). Site catalog in config.py:SITES — 7 Irish accountancy sites (digitax) + 3 RU bookkeeping (USN) + 10 RU federal codes already scraped (consultant.ru) + 38 additional RU consultant.ru sources generated from RU_FEDERAL_LAWS list (Constitution, 11 more codes — НК ч.1/2, БК, ГПК, АПК, КАС, ЗК, ЛК, ВК, ГсК; 24 ФЗ across corporate / administrative / social / finance / info domains; 11 ФКЗ; only ru-fz-273 actually scraped so far) + 7 KZ codes (adilet.zan.kz) + 2 KZ accountancy practical (kgd.gov.kz, mybuh.kz; configured but not scraped). consultant.ru codes: each cons_doc_LAW_<id> is a BFS root with article-level URLs branching out; sidebar pollution stripped via div.seo-links removal in parse.py:strip_boilerplate. adilet.zan.kz codes: each Kazakh code is a single monolithic page (~1.5–5 MB), needs verify_ssl: False (Kazakh root CA not in certifi); content selector div.container_gamma.text.text_upd. Critical: global POST /admin/wiki-rag/reload only re-indexes the legacy WIKI_DIR (root-level files), NOT collections. After bulk upload, loop POST /admin/wiki-rag/collections/{id}/reload per collection. Server-side runs of upload.py skip self-copy when source dir is wiki-pages/ (parsed/ absent on production).

Frontend Architecture

Stack: Vue 3 + Composition API + TypeScript, Vite, Pinia (persisted), Vue Router (hash history), TanStack Vue Query, vue-i18n (ru/en/kk), TailwindCSS + radix-vue, lucide-vue-next. Path alias @admin/src/.

Routing (admin/src/router.ts): createWebHashHistory. Routes use meta fields: public (bypass auth), localOnly (hidden in cloud mode), module (RBAC module name), minLevel (view/edit/manage).

Stores (admin/src/stores/): Key store auth.ts holds JWT, user, deploymentMode, permissions. Exposes isAdmin, isCloudMode, hasModule(), canView(), canEdit(), canManage(). Toast API: toast.success/error/warning/info(title) — do NOT use toast.show() (different signature: show(type, title, message?)). Confirm API: confirm.confirm({ title, message, confirmText, type }) returns Promise<boolean> — no ask() method.

API layer (admin/src/api/): client.ts provides api.get/post/put/delete/upload + createSSE() (auto-injects JWT). Domain files build on it. All re-exported from api/index.ts.

Demo mode: VITE_DEMO_MODE=true monkey-patches window.fetch to intercept API calls with mock data from 23 domain files in admin/src/api/demo/.

Product variant (VITE_PRODUCT_VARIANT env var, defaults to full): Set to lite to ship a stripped admin panel. Lite variant whitelists /chat, /llm, /wiki, /finetune (collections CRUD), /widget, /telegram, /whatsapp, /mobile-app, /settings, /users, /about, /login, /invite/* — everything else is blocked by the router guard and hidden from nav. Scripts: npm run dev:lite and npm run build:lite. Central helper: admin/src/config/productVariant.ts (IS_LITE, isPathAllowed()).

Vite base path: Production /admin/ (served by FastAPI). Demo/standalone: / (via VITE_BASE_PATH or .env.production.local).

Mobile App (mobile/)

Stack: Vue 3 + TypeScript, Vite, Pinia, Vue Router, Capacitor (Android), TailwindCSS 4. Path alias @mobile/src/.

Purpose: Standalone Android chat app connecting to https://ai-sekretar24.ru (hardcoded). Role-based experience: admins get full chat controls, non-admins see only shared chats.

Screens: LoginView (auth only, no server URL), ChatListView (admin: session list + FAB + delete; non-admin: Claude-like welcome + shared chat cards), ChatView (streaming chat + TTS + role-based controls), SettingsView (account + logout).

Theme: Night-eyes (warm brown/amber/gold), hardcoded — no theme switching. Background #1a1308, text #d9c9a8, primary amber-600, cards stone-800.

Role-based access:

  • Admin (role=admin): full chat controls — LLM provider selector, RAG collection multi-select, system prompt editing, export (copy/md/json), branching, context files, all message actions (edit, regenerate, summarize, delete branch), session creation/deletion. Admin-only controls live in admin panel only (not mobile).
  • Non-admin: shared chats (is_shared_with_me filter), Claude-like welcome, basic message actions (TTS + copy), branching, context files, web search toggle. No LLM/RAG selectors, no export, no session deletion.

Mobile Instances: Admin creates MobileAppInstance (LLM backend, persona, system prompt, TTS, RAG) in admin panel (/mobile-app view). Users are assigned to instances via ResourceShare. On login, mobile app fetches GET /admin/mobile/my-config to get assigned instance config. Chat sessions use source="mobile" + mobile_instance_id for per-instance LLM/prompt routing.

API layer (mobile/src/api/): client.ts (base fetch + upload for multipart FormData), chat.ts (sessions/streaming/branches/uploadImage), admin.ts (admin-only APIs, used only by admin panel).

File upload in chat: Same backend as admin (POST /admin/chat/sessions/{id}/upload-image). ChatInput.vue has paperclip button between input and send. Files uploaded → image_ids passed to streamMessage → backend injects extracted text (OCR/PDF/DOCX/XLSX) into LLM context. Accepts: JPEG, PNG, WebP, GIF, PDF, XLSX, DOCX, TXT, CSV, MD, JSON, XML, HTML, YAML. Max 10MB.

Per-session named system prompts (ChatSessionPrompt model, table chat_session_prompts): each chat session can hold multiple named prompts; exactly one is active. Endpoints GET/POST /admin/chat/sessions/{id}/prompts, PATCH /admin/chat/sessions/{id}/prompts/{pid}, POST /admin/chat/sessions/{id}/prompts/{pid}/activate, DELETE …/{pid}. The active prompt's content is mirrored into ChatSession.system_prompt, so the existing streaming pipeline picks it up unchanged — switching prompt swaps the assistant's role while preserving conversation history. Creating the first prompt while the session already has a system_prompt preserves it as the initial content. Deleting the active prompt promotes the most recent remaining one.

Key differences from admin panel:

  • Hardcoded server URL (https://ai-sekretar24.ru), no user configuration
  • JWT stored in native Preferences (not localStorage)
  • No demo mode, no full RBAC UI — role-based chat experience
  • ~77KB gzipped (vs ~2MB admin)

Build: cd mobile && npm run build && npx cap sync android. APK via Android Studio: npx cap open android → Build → Build APK.

No lint/format/test — mobile app has only dev, build, preview scripts. Type checking happens during npm run build via vue-tsc -b.

Code Patterns

Adding a new API endpoint:

  1. Create/edit router in modules/{domain}/router.py (preferred) or app/routers/ (legacy)
  2. Use domain service singletons (from modules.chat.service import chat_service) for DB access
  3. If using app/routers/, add to __all__ in app/routers/__init__.py
  4. Register in orchestrator.py with app.include_router()

Adding a new cloud LLM provider type:

  1. Add entry to PROVIDER_TYPES dict in db/models.py
  2. OpenAI-compatible → works automatically via OpenAICompatibleProvider
  3. Custom SDK → create provider class inheriting BaseLLMProvider in cloud_llm_service.py, register in CloudLLMService.PROVIDER_CLASSES
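Steps 2 and 3 can be sketched as a factory (class names follow the document, but the bodies are illustrative stubs, not the real cloud_llm_service.py):

```python
class BaseLLMProvider:
    """Minimal base class; real providers also handle streaming, tools, etc."""

    def __init__(self, api_key: str, model: str) -> None:
        self.api_key = api_key
        self.model = model

    async def generate(self, prompt: str) -> str:
        raise NotImplementedError


class OpenAICompatibleProvider(BaseLLMProvider):
    """Step 2: handles any provider speaking the OpenAI chat-completions dialect."""

    async def generate(self, prompt: str) -> str:
        return f"[{self.model}] (openai-compatible call)"


class MyCustomProvider(BaseLLMProvider):
    """Step 3: a provider with its own SDK gets a dedicated subclass."""

    async def generate(self, prompt: str) -> str:
        return f"[{self.model}] (custom SDK call)"


class CloudLLMService:
    # Step 3: register custom classes here; anything not listed falls
    # back to the OpenAI-compatible provider (step 2).
    PROVIDER_CLASSES: dict[str, type[BaseLLMProvider]] = {"mycustom": MyCustomProvider}

    @classmethod
    def create_provider(cls, provider_type: str, api_key: str, model: str) -> BaseLLMProvider:
        provider_cls = cls.PROVIDER_CLASSES.get(provider_type, OpenAICompatibleProvider)
        return provider_cls(api_key, model)
```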

RBAC auth guards (in auth_manager.py):

  • Depends(require_permission(module, level)) — checks module permission
  • user_has_level(user, module, level) — inline check within endpoint
  • workspace_context(user, module)(owner_id, workspace_id) — for repository calls; owner_id=None means "shared within workspace"
  • Data isolation: always pass both owner_id and workspace_id to repository/manager

Gate-check pattern for mutations: UPDATE/DELETE endpoints call workspace-filtered get first (e.g., get_by_id_ws(id, workspace_id=ws_id)); if None → 404. Prevents cross-workspace access. ChatRepository is the reference implementation.

Adding i18n translations: Edit admin/src/plugins/i18n.ts — add keys to all three message objects: ru, en, kk.

API URL patterns:

  • GET/POST /admin/{resource} — List/create
  • GET/PUT/DELETE /admin/{resource}/{id} — CRUD
  • POST /admin/{resource}/{id}/action — Actions (start, stop, test)
  • POST /webhooks/{service} — External webhooks
  • POST /v1/chat/completions, GET /v1/models — OpenAI-compatible

Codebase Conventions

  • Python 3.11+, line length 100, double quotes (ruff format)
  • Cyrillic is normal — RUF001/002/003 disabled; Russian in UI text, logging, persona prompts
  • FastAPI Depends — B008 disabled for Depends() in default args
  • Optional imports — Services like vLLM and OpenVoice use try/except at module level with *_AVAILABLE flags
  • SQLAlchemy 2.0 style — Mapped[T] with mapped_column() (declarative 2.0)
  • Repository pattern — BaseRepository(Generic[T]) provides CRUD + _apply_workspace_filter(). Repos only flush(), never commit()
  • mypy strict only for db/, auth_manager.py, service_manager.py; other modules relaxed. mypy is soft in CI
  • Pre-commit hooks — ruff, mypy (core only), eslint, hadolint, standard checks (see .pre-commit-config.yaml)

Key Environment Variables

LLM_BACKEND=vllm                    # "vllm" or "cloud:{provider_id}" (legacy "gemini" auto-migrates)
VLLM_API_URL=http://localhost:11434 # Auto-normalized: trailing /v1 stripped
DEPLOYMENT_MODE=full                # "full", "cloud", or "local"
ORCHESTRATOR_PORT=8002
ADMIN_JWT_SECRET=...                # Auto-generated if empty
REDIS_URL=redis://localhost:6379/0  # Optional, graceful fallback
DEV_MODE=1                          # Backend proxies to Vite dev server (:5173)
VECTOR_SEARCH_URL=http://localhost:8003  # Optional, Vector Search microservice
VECTOR_SEARCH_TOKEN=                # Bearer token for Vector Search API
GOOGLE_CLIENT_ID=                   # Google OAuth 2.0 (Drive, Docs, Sheets, Gmail)
GOOGLE_CLIENT_SECRET=               # Google OAuth 2.0 client secret
GOOGLE_REDIRECT_URI=                # OAuth callback URL (default: {BASE_URL}/admin/oauth/google/callback)
PLATFORM_AGENT_PROMPT_FILE=         # Override path to platform-agent fallback prompt (default: /opt/ai-secretary/prompts/platform-agent.md)
BRIDGE_ISOLATE_HOME=                # "1" to spawn Claude CLI with isolated HOME so host's CLAUDE.md/memory files don't leak into user chats
BRIDGE_ISOLATED_HOME=               # Override isolated HOME path (default: /var/lib/ai-secretary-bridge)

Deployment

Server Deployment (Production)

Server: root@155.212.231.7, systemd service (not Docker Compose).

ssh root@155.212.231.7
cd /opt/ai-secretary
git pull origin main
cd admin && npm ci && npm run build
rsync -av --delete admin/dist/ /var/www/admin-ai-sekretar24/  # REQUIRED: nginx serves from /var/www/
sed -i "s/ai-admin-v[0-9a-z]*/ai-admin-v$(date +%s)/" /var/www/admin-ai-sekretar24/sw.js  # bust SW cache
systemctl restart ai-secretary               # restart orchestrator
curl -s http://localhost:8002/health         # health check

IMPORTANT: Nginx serves frontend from /var/www/admin-ai-sekretar24/, NOT from /opt/ai-secretary/admin/dist/. Always rsync after build.

Webhook auto-deploy: ai-secretary-webhook.service triggers on GitHub push.

Local-only files (not in git): .env, docker-compose.override.yml, modified Dockerfile, services/bridge/src/models/

Deployment Checklist

  1. Run lint locally — ruff check . && cd admin && npm run lint:check
  2. Check for pending DB migrations
  3. Kill stale processes — lsof -i :8002
  4. Clean build artifacts — rm -rf admin/dist admin/node_modules/.vite
  5. Build — npm run build (verify VITE_DEMO_MODE is NOT set)
  6. Restart — docker compose restart ai-secretary
  7. Verify — curl http://localhost:8002/health + test /admin/auth/login

Automated Deployment

./deploy.sh                # git pull, re-apply patches, build admin, restart orchestrator
./test_system.sh           # Quick health checks and API smoke tests

Demo Sites

Fully offline demo builds at demo.ai-sekretar24.ru:

  • Full demo (/full/): npm run build -- --mode demo (admin role, all features)
  • Cloud demo (/cloud/): npm run build -- --mode demo-web (web role, customer-facing)
  • Deploy: bash /root/deploy-demo.sh

Debugging Principles

Check in this order — infrastructure first, application logic last:

  1. Build artifacts — correct build deployed? Stale demo interceptors?
  2. Deploy pipeline — stale Vite cache, wrong .env, VITE_DEMO_MODE leaking?
  3. DB state — migrations applied? sqlite3 data/secretary.db ".tables" / .schema
  4. Process state — port conflicts (lsof -i :8002), zombie processes?
  5. Auth/JWT — ADMIN_JWT_SECRET auto-generated on restart (invalidates tokens). Sessions validated against user_sessions table via SessionCache
  6. Application logic — only after ruling out 1–5

Parallel Development (Two Claude Code Instances)

This project is developed from two machines:

  • local — dev workstation with GPU (RTX 3060), full stack
  • server — Beget VPS (root@155.212.231.7), systemd service, cloud LLM only

Each machine identifies itself via ~/.claude/projects/.../memory/MEMORY.md (## Machine Role section). Check your machine role before git operations.

Git Workflow

  1. Never push directly to main — always feature branch + PR
  2. Branch prefixes: local/* (dev machine), server/* (server), or feat/* / fix/* / docs/* with a machine suffix
  3. Always git pull before starting work
  4. Do not amend or force-push commits made by the other instance

File Ownership

Local primary: Hardware services (voice_clone_service.py, stt_service.py, vllm_llm_service.py, piper_tts_service.py), GPU/hardware routers, fine-tuning, voice samples, start scripts.

Server primary: Cloud services (cloud_llm_service.py, xray_proxy_manager.py), Docker files, bot operations (runtime), production data.

Shared (coordinate via branches): orchestrator.py, app/routers/, db/, admin/, CLAUDE.md, migrations (create new files only).

Known Issues

  1. Vosk model required — Download to models/vosk/ for STT
  2. XTTS requires CUDA CC >= 7.0 — RTX 3060+; use OpenVoice for older GPUs
  3. GPU memory — vLLM ~6GB + XTTS ~5GB must fit in 12GB
  4. VLESS proxy vs localhost — GeminiProvider sets global HTTP_PROXY; OpenAICompatibleProvider sets NO_PROXY=127.0.0.1,localhost for claude_bridge; bridge_manager.py strips proxy env vars
  5. Claude bridge timeouts — 7-30s warmup. read=300s timeout for bridge (vs 60s default). max_tokens=4096 (vs 512)
  6. services/bridge/src/models/ gitignored — the .gitignore pattern models/ catches it. Copy manually after clone
  7. Docker CPU: whisper excluded — openai-whisper fails to build (missing pkg_resources). Server Dockerfile patched to grep -v whisper
  8. Docker + Claude CLI — CPU image needs Node.js. Server Dockerfile patched to install Node.js 20 + @anthropic-ai/claude-code
  9. Circular import in domain __init__.py — Domain __init__.py files MUST stay empty (no service re-exports). Chain: db/models.py imports from modules.X.models import ... → Python executes modules/X/__init__.py → if it imports service.py → service.py imports db.repositories → db.repositories imports db.models → circular. Workaround: import services directly (from modules.chat.service import ChatService). Future fix (Phase 3+): eliminate eager imports in db/models.py by making it a lazy facade or removing it entirely once consumers import models from domain modules.
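The cycle and the workaround can be reproduced in miniature. This self-contained demo synthesizes the package layout in a temp dir; the names demo_db/demo_mod are invented for illustration and mirror db/ and a domain module.

```python
# Reproduce the circular import, then show the empty-__init__ fix.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
(root / "demo_db").mkdir()
(root / "demo_mod").mkdir()

(root / "demo_db" / "__init__.py").write_text("")
# db/models.py analogue: imports a model from the domain module.
(root / "demo_db" / "models.py").write_text("from demo_mod.models import Chat\n")
# db/repositories.py analogue: imports back from db.models.
(root / "demo_db" / "repositories.py").write_text("from demo_db.models import Chat\n")

# The bad pattern: domain __init__ eagerly re-exports the service.
(root / "demo_mod" / "__init__.py").write_text("from demo_mod.service import ChatService\n")
(root / "demo_mod" / "models.py").write_text("class Chat: ...\n")
(root / "demo_mod" / "service.py").write_text(
    "import demo_db.repositories\nclass ChatService: ...\n"
)

sys.path.insert(0, str(root))
try:
    import demo_db.models  # noqa: F401
    cycle = False
except ImportError:  # "partially initialized module" — the circular import
    cycle = True

# The fix: empty domain __init__, then import the service module directly.
(root / "demo_mod" / "__init__.py").write_text("")
for name in [m for m in list(sys.modules) if m.startswith(("demo_db", "demo_mod"))]:
    del sys.modules[name]

import demo_db.models  # noqa: F401  (now succeeds)
from demo_mod.service import ChatService

print("cycle reproduced:", cycle, "| direct import ok:", ChatService.__name__)
```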