This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
AI Secretary System — virtual secretary with voice cloning (XTTS v2, OpenVoice), pre-trained voices (Piper), local LLM (vLLM + Qwen/Llama/DeepSeek), cloud LLM fallback (Gemini, Kimi, OpenAI, Claude, DeepSeek, OpenRouter), and Claude Code CLI bridge. Features GSM telephony (SIM7600E-H), amoCRM integration, Vue 3 PWA admin panel, i18n (ru/en/kk), multi-instance Telegram bots with sales/payments, multi-instance WhatsApp bots (Cloud API), website chat widgets, and LoRA fine-tuning.
```bash
# Docker (recommended)
cp .env.docker.example .env && docker compose up -d                     # GPU mode
docker compose -f docker-compose.yml -f docker-compose.cpu.yml up -d    # CPU mode
docker compose -f docker-compose.yml -f docker-compose.full.yml up -d   # Full containerized (includes vLLM)
docker compose --profile vector-search up -d                            # + Vector Search microservice (:8003)

# Local
./start_gpu.sh   # GPU: XTTS + Qwen2.5-7B + LoRA
./start_cpu.sh   # CPU: Piper + Gemini API
curl http://localhost:8002/health
```

```bash
cd admin && npm install     # First-time setup
cd admin && npm run build   # Production build (vue-tsc type-check + vite build)
cd admin && npm run dev     # Dev server (:5173), proxies /admin + /v1 + /health to :8002
DEV_MODE=1 ./start_gpu.sh   # Backend proxies to Vite dev server
```

Default login: admin / admin. Guest demo: demo / demo (read-only).
No frontend tests — npm test is not configured. Type checking happens during npm run build via vue-tsc -b.
Deploy gotcha: Vite deletes and recreates admin/dist/ (new inode), breaking Docker bind mounts. Always docker compose restart after npm run build.
```bash
cd mobile && npm install            # First-time setup
cd mobile && npm run build          # Production build (vue-tsc type-check + vite build)
cd mobile && npm run dev            # Dev server
cd mobile && npx cap sync android   # Sync web assets to Android project
cd mobile && npx cap open android   # Open in Android Studio → Build APK
```

```bash
python scripts/manage_users.py list                              # List all users
python scripts/manage_users.py create <user> <pass> --role user  # Roles: admin|user|web|guest
python scripts/manage_users.py set-password <user> <pass>        # Reset password
python scripts/manage_users.py set-role <user> <role>            # Change role
python scripts/manage_users.py disable <user>                    # Deactivate
```

```bash
sudo bash scripts/setup_mobile_internet.sh start    # Connect wwan0 via QMI
sudo bash scripts/setup_mobile_internet.sh stop     # Disconnect
sudo bash scripts/setup_mobile_internet.sh status   # Check connection
sudo bash scripts/mobile-internet-monitor.sh        # Daemon: auto-reconnect + VPN route failover
# systemd: scripts/mobile-internet.service          # Install as persistent service
```

Three migration systems:
- Alembic (preferred) — for schema changes (`ALTER TABLE`, new tables)
- `scripts/migrate_*.py` — for data migrations. New scripts must use `scripts/_migration_template.py` (transaction-safe)
- `Base.metadata.create_all` — auto-creates missing tables on startup (does not alter existing tables)
```bash
alembic upgrade head                          # Apply all pending migrations
alembic revision --autogenerate -m "desc"     # Generate migration from model changes
cp scripts/_migration_template.py scripts/migrate_<name>.py   # New data migration
```

```bash
# Python
ruff check .            # Lint (see pyproject.toml for rules)
ruff check . --fix      # Auto-fix
ruff format .           # Format
ruff format --check .   # Check formatting

# Frontend
cd admin && npm run lint          # Lint + auto-fix
cd admin && npm run lint:check    # Lint only (CI-style)
cd admin && npm run format        # Prettier format
cd admin && npm run format:check  # Check formatting

# All at once
pre-commit run --all-files
```

```bash
pytest tests/                                # All tests
pytest tests/unit/test_retry_on_busy.py -v   # Single file
pytest -k "test_chat" -v                     # By name pattern
pytest -m "not slow" -v                      # Exclude slow tests
pytest -m "not integration" -v               # Exclude integration (needs external services)
```

`asyncio_mode = "auto"` — async tests run without `@pytest.mark.asyncio`. Custom markers: `slow`, `integration`, `gpu`. Docker: `docker exec ai-secretary python -m pytest tests/ -v -o asyncio_mode=auto`.
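A minimal sketch of what `asyncio_mode = "auto"` enables — pytest-asyncio collects plain `async def test_*` functions with no `@pytest.mark.asyncio` decorator. `fake_chat` below is a hypothetical stand-in, not project code:

```python
# Sketch: under asyncio_mode = "auto", this file needs no asyncio marker.
# fake_chat is a hypothetical stand-in for an orchestrator chat call.
import asyncio


async def fake_chat(message: str) -> str:
    return f"echo: {message}"


async def test_chat_roundtrip() -> None:
    reply = await fake_chat("ping")
    assert reply == "echo: ping"


# Outside pytest the same coroutine can be driven manually:
if __name__ == "__main__":
    asyncio.run(test_chat_roundtrip())
```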
GitHub Actions (.github/workflows/ci.yml) on push to main/develop and PRs:
- `lint-backend` — ruff check + format check + mypy on `orchestrator.py` only (mypy soft, `|| true`)
- `lint-frontend` — npm ci + eslint + build (type check)
- `security` — Trivy vulnerability scanner
Always run lint locally before pushing. Protected branches require PR workflow — never push directly to main.
```
┌──────────────────────────────────────────────────────────────┐
│                   Orchestrator (port 8002)                   │
│     orchestrator.py + modules/*/router*.py (~28 routers)     │
│  ┌────────────────────────────────────────────────────────┐  │
│  │           Vue 3 Admin Panel (24 views, PWA)            │  │
│  │           admin/dist/                                  │  │
│  └────────────────────────────────────────────────────────┘  │
└────────────┬──────────────┬──────────────┬───────────────────┘
             │              │              │
     ┌───────┴──┐    ┌──────┴───┐    ┌─────┴─────┐
     │   LLM    │    │   TTS    │    │    STT    │
     │  vLLM /  │    │ XTTS v2 /│    │  Vosk /   │
     │  Cloud   │    │  Piper   │    │  Whisper  │
     └──────────┘    └──────────┘    └───────────┘
```
Request flow: User message → FAQ check (instant match) OR LLM → TTS → Audio response
Deployment modes (DEPLOYMENT_MODE env var): full (default, everything), cloud (no GPU/TTS/STT/GSM), local (same as full). Cloud mode skips hardware router registration, hides hardware admin tabs, filters out speech/gsm permissions.
Foundation layer for modular decomposition (issue #489). Phase 4 complete: all 28 routers migrated (Phase 3), all inline endpoints extracted (Phase 4.1–4.5), all background tasks via TaskRegistry (Phase 4.6), all startup helpers + service init extracted to domain startup.py modules, global service variables removed (Phase 4.7a/b). Phase 5.1–5.4 complete (EventBus infrastructure, first events, DatasetSynced, Widget→CRM events). Phase 6 complete (Protocol interfaces for AuthService, KnowledgeService, LLMService, ChatService in modules/*/protocols.py). Phase 7.1 complete (AuthService facade in modules/core/auth_service.py). Phase 7.2 complete (KnowledgeService facade in modules/knowledge/facade.py). Phase 7.3 complete (LLMService facade in modules/llm/facade.py). Phase 7.4 complete (ChatService facade in modules/chat/facade.py).
- `EventBus` (`modules/core/events.py`): In-process async pub/sub. Handlers run concurrently via `asyncio.gather`; exceptions are logged, never propagated to the publisher. `BaseEvent` dataclass with auto-timestamp. Singleton in `ServiceContainer.event_bus`. Domain events: `InternetStatusChanged`, `UserRoleChanged`, `SessionRevoked`, `DatasetSynced` (in `modules/core/events.py`), `KnowledgeUpdated` (in `modules/knowledge/events.py`), `WidgetSessionCreated`, `WidgetMessageSent`, `WidgetContactSubmitted` (in `modules/channels/widget/events.py`). Subscriptions are registered via `setup_event_subscriptions()` in `modules/core/startup.py`, which delegates to domain-specific setup functions (`setup_llm_event_subscriptions()` in `modules/llm/startup.py`, `setup_knowledge_event_subscriptions()` in `modules/knowledge/startup.py`, `setup_crm_event_subscriptions()` in `modules/crm/startup.py`). `DatasetSynced` decouples CRM/ecommerce/kanban from knowledge. Widget events decouple the widget router from amoCRM: the widget publishes events, and the CRM domain handles lead/contact/note creation reactively.
- `TaskRegistry` (`modules/core/tasks.py`): Named background tasks — periodic (interval-based) or one-shot. `start_all()` / `cancel_all(timeout)` lifecycle. `TaskInfo` dataclass tracks status, run count, last error. 7 tasks registered in `startup_event()`: `session-cleanup` (1h), `periodic-vacuum` (7d), `kanban-sync` (15min), `woocommerce-sync` (daily 23:00 UTC), `rss-sync` (1h), `wiki-embeddings` (one-shot), `wiki-collection-indexes` (one-shot). Task functions live in `modules/core/maintenance.py`, `modules/knowledge/tasks.py`, `modules/knowledge/rss_service.py`, `modules/kanban/tasks.py`, `modules/ecommerce/tasks.py`.
- `HealthRegistry` (`modules/core/health.py`): Modular health checks with per-check timeout (`asyncio.wait_for`). Status aggregation: all ok → ok, any degraded → degraded, any error → error.
- `InternetMonitor` (`modules/core/internet_monitor.py`): Periodic connectivity checker (ping DNS/Cloudflare). Auto-switches the LLM backend: online → cloud provider (claude_bridge priority), offline → local vLLM. Publishes `InternetStatusChanged` events via the EventBus (`container.event_bus`). Configurable thresholds, 30s default interval. Status endpoint: `GET /admin/gsm/internet-status`. The health check includes an `internet` section.
Import from `modules.core`: `EventBus`, `BaseEvent`, `TaskRegistry`, `TaskInfo`, `HealthRegistry`, `HealthStatus`, `UserRoleChanged`, `SessionRevoked`, `DatasetSynced`.
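The EventBus contract described above (concurrent handlers via `asyncio.gather`, exceptions logged rather than raised to the publisher) can be sketched roughly like this — names follow the description, the real `modules/core/events.py` may differ in detail, and the `collection_id` payload field is invented for illustration:

```python
# Rough sketch of the EventBus contract: handlers run concurrently, and a
# failing handler never crashes the publisher. Not the project's actual code.
import asyncio
import logging
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Awaitable, Callable

logger = logging.getLogger(__name__)


@dataclass
class BaseEvent:
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class EventBus:
    def __init__(self) -> None:
        self._handlers: dict[type, list[Callable[[BaseEvent], Awaitable[None]]]] = defaultdict(list)

    def subscribe(self, event_type: type, handler: Callable[[BaseEvent], Awaitable[None]]) -> None:
        self._handlers[event_type].append(handler)

    async def publish(self, event: BaseEvent) -> None:
        results = await asyncio.gather(
            *(h(event) for h in self._handlers[type(event)]),
            return_exceptions=True,  # never let a handler crash the publisher
        )
        for r in results:
            if isinstance(r, Exception):
                logger.error("event handler failed: %r", r)


@dataclass
class DatasetSynced(BaseEvent):
    collection_id: int = 0  # hypothetical payload field


async def demo() -> list[int]:
    seen: list[int] = []

    async def on_synced(ev: BaseEvent) -> None:
        assert isinstance(ev, DatasetSynced)
        seen.append(ev.collection_id)

    async def broken(ev: BaseEvent) -> None:
        raise RuntimeError("boom")  # gets logged, publish() still succeeds

    bus = EventBus()
    bus.subscribe(DatasetSynced, on_synced)
    bus.subscribe(DatasetSynced, broken)
    await bus.publish(DatasetSynced(collection_id=7))
    return seen
```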
32 service classes extracted from the former monolithic db/integration.py into 16 domain files (Phase 2, issue #492):
| Module | File | Service Classes |
|---|---|---|
| `modules/core/` | `service.py` | DatabaseService, UserService, UserSessionService, RoleService, WorkspaceService, ConfigService, UserIdentityService |
| `modules/core/` | `auth_service.py` | AuthService (Phase 7.1 facade — wraps `auth_manager.py` functions, implements Protocol from `protocols.py`) |
| `modules/knowledge/` | `facade.py` | KnowledgeServiceImpl (Phase 7.2 facade — wraps wiki_rag_service + knowledge services, implements Protocol from `protocols.py`) |
| `modules/llm/` | `facade.py` | LLMServiceImpl (Phase 7.3 facade — wraps CloudLLMService/VLLMLLMService + CloudProviderService, implements Protocol from `protocols.py`) |
| `modules/chat/` | `facade.py` | ChatServiceImpl (Phase 7.4 facade — wraps ChatService CRUD + LLM generation + ChatShareService, implements Protocol from `protocols.py`) |
| `modules/chat/` | `service.py` | ChatService, ChatShareService |
| `modules/knowledge/` | `service.py` | FAQService, KnowledgeDocService, KnowledgeCollectionService, GitHubRepoProjectService |
| `modules/channels/telegram/` | `service.py` | BotInstanceService, TelegramSessionService |
| `modules/channels/whatsapp/` | `service.py` | WhatsAppInstanceService |
| `modules/channels/widget/` | `service.py` | WidgetInstanceService |
| `modules/channels/mobile/` | `service.py` | MobileAppInstanceService |
| `modules/kanban/` | `service.py` | KanbanService, KanbanProjectService |
| `modules/claude_code/` | `service.py` | ClaudeCodeService, ClaudeCodeProjectService |
| `modules/llm/` | `service.py` | CloudProviderService |
| `modules/monitoring/` | `service.py` | AuditService, PaymentService |
| `modules/admin/` | `service.py` | ResourceShareService |
| `modules/speech/` | `service.py`, `streaming.py` | PresetService, StreamingTTSManager |
| `modules/crm/` | `service.py` | AmoCRMService |
| `modules/ecommerce/` | `service.py` | WooCommerceService |
| `modules/telephony/` | `service.py` | GSMService |
| `modules/google/` | `service.py`, `models.py` | GoogleOAuthService |
Import pattern: `from modules.chat.service import chat_service` (direct, preferred) or `from db.integration import async_chat_manager` (backward-compatible alias). Domain `__init__.py` files do NOT re-export services (see Known Issues #9).
Phase 3 migration complete: all 28 routers moved from app/routers/ to domain modules. Original files are 1-3 line facade re-exports.
| Domain | Router file | Facade |
|---|---|---|
| `modules/ecommerce/` | `router.py` | `app/routers/woocommerce.py` |
| `modules/crm/` | `router.py` | `app/routers/amocrm.py` (exports router + webhook_router) |
| `modules/telephony/` | `router.py` | `app/routers/gsm.py` |
| `modules/speech/` | `router_tts.py`, `router_stt.py`, `router_services.py` | `app/routers/tts.py`, `stt.py`, `services.py` |
| `modules/knowledge/` | `router_faq.py`, `router_wiki_rag.py`, `router_github_repos.py` | `app/routers/faq.py`, `wiki_rag.py`, `github_repos.py` |
| `modules/kanban/` | `router.py` | `app/routers/kanban.py` |
| `modules/claude_code/` | `router.py` | `app/routers/claude_code.py` |
| `modules/channels/telegram/` | `router.py` | `app/routers/telegram.py` |
| `modules/channels/whatsapp/` | `router.py` | `app/routers/whatsapp.py` |
| `modules/channels/widget/` | `router.py`, `router_public.py` | `app/routers/widget.py` (admin); public endpoints direct |
| `modules/channels/mobile/` | `router.py` | `app/routers/mobile.py` |
| `modules/sales/` | `router_bot_sales.py`, `router_yoomoney.py` | `app/routers/bot_sales.py`, `yoomoney_webhook.py` |
| `modules/core/` | `router_auth.py`, `router_roles.py`, `router_workspace.py` | `app/routers/auth.py`, `roles.py`, `workspace.py` |
| `modules/admin/` | `router_backup.py`, `router_legal.py`, `router_github_webhook.py` | `app/routers/backup.py`, `legal.py`, `github_webhook.py` |
| `modules/monitoring/` | `router_audit.py`, `router_usage.py`, `router_monitor.py` | `app/routers/audit.py`, `usage.py`, `monitor.py` |
| `modules/chat/` | `router.py` | `app/routers/chat.py` |
| `modules/llm/` | `router.py` | `app/routers/llm.py` |
| `modules/google/` | `router.py` (+ callback_router) | `app/routers/google.py` |
| `modules/knowledge/` | `router_google_drive.py` | Google Drive RAG sync (`/admin/google-drive/*`) |
Phase 4 routers (extracted from orchestrator.py, not from app/routers/):
| Domain | Router file | Endpoints | Phase |
|---|---|---|---|
| `modules/compat/` | `router.py` | Legacy telephony (`/tts`, `/stt`, `/chat`, `/process_call`, `/reset_conversation`) + OpenAI-compat (`/v1/*`) | 4.3 |
| `modules/core/` | `router_health.py` | `/`, `/health`, `/admin/deployment-mode` | 4.3 |
| `modules/llm/` | `router_finetune.py` | LLM finetune: dataset, training, LoRA adapters (`/admin/finetune/*`) | 4.4 |
| `modules/speech/` | `router_finetune.py` | TTS finetune: samples, training, models (`/admin/tts-finetune/*`) | 4.4 |
| `modules/speech/` | `router_voices.py` | Voice selection + test (`/admin/voices`, `/admin/voice`, `/admin/voice/test`) | 4.5 |
| `modules/llm/` | `router_models.py` | HuggingFace model management (`/admin/models/*`) | 4.5 |
| `modules/monitoring/` | `router_logs.py` | Log viewing + streaming (`/admin/logs/*`) | 4.5 |
New routers import domain services directly (`from modules.monitoring.service import audit_service`) instead of going through the facade. GPU-only routers (`router_voices.py`, `router_models.py`, `router_finetune.py`) are conditionally registered when `DEPLOYMENT_MODE != "cloud"`.
orchestrator.py (~320 lines): FastAPI entry point — pure wiring, zero domain logic. No inline endpoints (Phase 4.1–4.5), no raw asyncio.create_task() (Phase 4.6), no helper functions (Phase 4.7a), no global service variables (Phase 4.7b). Contains only: imports, CORS/middleware, declarative router registration (~28 routers), startup_event() (calls domain init functions + registers tasks), shutdown_event() (delegates to graceful_shutdown()), static file serving, Vite dev proxy. All service initialization in domain startup.py modules: modules/speech/startup.py (TTS/STT), modules/llm/startup.py (LLM + fallback chain + InternetMonitor callback), modules/knowledge/startup.py (Wiki RAG + embeddings), modules/core/startup.py (seed, monitor, shutdown), modules/telephony/startup.py (GSM), modules/channels/{telegram,whatsapp}/startup.py (bot auto-start).
ServiceContainer (app/dependencies.py): Singleton holding references to all initialized services — the single source of truth for service state (no global variables). Includes event_bus: EventBus singleton for inter-module events. Routers get services via FastAPI Depends. Populated during app startup by domain init_*() functions. Runtime mutations (LLM backend switch) write directly to container.
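A minimal sketch of the container pattern described above — one module-level singleton holds service references, dependencies read from it, and runtime mutations write straight to it. Field names other than `event_bus` are illustrative, not the real `app/dependencies.py` API:

```python
# Sketch of the ServiceContainer pattern: a single module-level object replaces
# global service variables. Field names besides event_bus are hypothetical.
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ServiceContainer:
    event_bus: Optional[Any] = None
    llm_service: Optional[Any] = None          # hypothetical field
    extras: dict[str, Any] = field(default_factory=dict)


container = ServiceContainer()  # singleton, populated by domain init_*() at startup


def get_llm_service() -> Any:
    """FastAPI-style dependency: routers would use Depends(get_llm_service)."""
    if container.llm_service is None:
        raise RuntimeError("service not initialized; startup must run first")
    return container.llm_service


def switch_backend(new_service: Any) -> None:
    """Runtime mutation (e.g. online/offline LLM switch) writes to the container."""
    container.llm_service = new_service
```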
Two service layers: Core AI services at project root (cloud_llm_service.py, vllm_llm_service.py, voice_clone_service.py, stt_service.py, etc.). Domain services in app/services/ (amocrm_service.py, wiki_rag_service.py, backup_service.py, sales_funnel.py, etc.).
Database layer (db/): Async SQLAlchemy + aiosqlite. db/database.py creates engine. db/integration.py is a ~100-line facade that imports singletons and class aliases from domain services (from modules.chat.service import chat_service as async_chat_manager). Singletons are created in modules/*/service.py; the facade only re-exports them under old names. Repositories in db/repositories/ inherit from BaseRepository with generic CRUD and _apply_workspace_filter() for multi-tenant queries.
Unit of Work: Repositories only flush() — never commit(). Callers own transaction boundaries: service methods call session.commit(), get_async_session() auto-commits on success / rollbacks on exception.
SQLITE_BUSY retry: db/retry.py @retry_on_busy() — exponential backoff (3 retries, 0.1s base). Applied to write methods in domain service classes (16 methods across 5 services).
Telegram bots: Subprocesses managed by multi_bot_manager.py. Config pre-fetched from DB, written to /tmp/bot_config_{id}.json. Two frameworks: python-telegram-bot (legacy) + aiogram (new). LLMRouter in telegram_bot/services/llm_router.py routes through orchestrator chat API. File uploads: telegram_bot/services/file_extractor.py extracts text from documents (text files + PDF via pdfplumber), injected into chat as plain text.
WhatsApp bots: Same subprocess pattern via whatsapp_manager.py. Module: whatsapp_bot/ (runs as python -m whatsapp_bot).
Platform agent fallback (prompts/platform-agent.md): When a chat session has no system_prompt set, modules/chat/facade.py loads this file as the system prompt (lazy, cached per-process) before falling back to llm.get_system_prompt(). Persona helps end-users configure their own assistants; no admin/ops content. Override path via PLATFORM_AGENT_PROMPT_FILE.
Role-specific prompt templates (prompts/lawyer-ru.md, lawyer-kz.md, accountant-kz.md, README-roles.md): hand-written system prompts for widget/bot instances tied to the static legal/accountancy collections. Not loaded automatically — the admin pastes the content into the instance's system_prompt field. Each enforces: cite article + code, warn about statute revision (редакция) drift, refuse to aid crime, give no legal/tax opinions. The README maps slug → role → which collections to attach.
Bridge HOME isolation (services/bridge/src/providers/claude/provider.py): The Claude CLI subprocess spawned by the bridge inherits $HOME from the service, so it picks up /root/CLAUDE.md and user-memory files — which historically leaked admin context into end-user chats. Set BRIDGE_ISOLATE_HOME=1 in .env to spawn the CLI with HOME=/var/lib/ai-secretary-bridge and cwd=<home>/sandbox (outside /root), with credentials symlinked from the real ~/.claude/. Default off to avoid breaking dev setups without the isolated dir.
Cloud LLM: cloud_llm_service.py factory pattern. OpenAI-compatible providers auto-handled via OpenAICompatibleProvider. Custom SDKs get their own provider class inheriting BaseLLMProvider. Provider types in PROVIDER_TYPES dict in db/models.py. Supports model fallback via fallback_models list. supports_tools flag + generate_with_tools() on OpenAICompatibleProvider and VLLMLLMService for tool-calling (agentic RAG).
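The factory dispatch described above — a provider-type string keyed into `PROVIDER_CLASSES`, with OpenAI-compatible providers sharing one class — looks roughly like this. Class and dict names mirror the text; constructor arguments and the `"deepseek"`/`"gemini"` keys are illustrative:

```python
# Sketch of the cloud-LLM factory: provider_type -> class via PROVIDER_CLASSES.
# OpenAI-compatible providers need no new class. Constructor args are invented.
class BaseLLMProvider:
    def __init__(self, api_key: str, model: str) -> None:
        self.api_key, self.model = api_key, model

    async def generate(self, prompt: str) -> str:
        raise NotImplementedError


class OpenAICompatibleProvider(BaseLLMProvider):
    supports_tools = True  # enables generate_with_tools() for agentic RAG

    async def generate(self, prompt: str) -> str:
        return f"[{self.model}] stubbed reply"  # real code calls the HTTP API


class GeminiProvider(BaseLLMProvider):  # custom-SDK example; name illustrative
    supports_tools = False

    async def generate(self, prompt: str) -> str:
        return "[gemini] stubbed reply"


PROVIDER_CLASSES: dict[str, type[BaseLLMProvider]] = {
    "openai": OpenAICompatibleProvider,
    "deepseek": OpenAICompatibleProvider,   # OpenAI-compatible: reuse one class
    "gemini": GeminiProvider,
}


def make_provider(provider_type: str, api_key: str, model: str) -> BaseLLMProvider:
    return PROVIDER_CLASSES[provider_type](api_key=api_key, model=model)
```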
Wiki RAG: app/services/wiki_rag_service.py — tiered search: (1) semantic embeddings (Gemini/OpenAI/local), (2) BM25 with Russian/English stemming, (3) Vector Search microservice (if VECTOR_SEARCH_URL configured). Multi-collection support. Per-instance RAG config on bots/widgets. Agentic RAG (modules/chat/router.py): server-side loop where LLM calls knowledge_search tool to query the knowledge base on demand (max 5 iterations). Providers without supports_tools (Gemini SDK) fall back to one-shot RAG injection. Frontend shows inline search indicator via tool_start/tool_end SSE events. Vector Search (services/vector-search/): standalone FastAPI microservice using ChromaDB + paraphrase-multilingual-mpnet-base-v2 (768 dims). Client: app/services/vector_search_client.py (async httpx). Runs as Docker profile vector-search on port 8003. Async search methods (search_async, retrieve_async, retrieve_multi_async) run all engines in parallel via asyncio.gather and merge/deduplicate results. Background task vector-search-sync upserts all sections on startup. DatasetSynced event triggers incremental sync. Admin endpoints: GET /admin/wiki-rag/vector-search/status, POST /admin/wiki-rag/vector-search/sync.
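The parallel-then-merge shape of the async search methods (all engines via `asyncio.gather`, results merged and deduplicated) can be sketched as follows — the engine stubs and `(doc_id, score)` tuple format are illustrative, not the project's actual result type:

```python
# Sketch of the search_async shape: run engines concurrently with
# asyncio.gather, then dedupe hits by document id keeping the best score.
# Engine names and result format are illustrative.
import asyncio


async def semantic(q: str) -> list[tuple[str, float]]:
    return [("doc-a", 0.9), ("doc-b", 0.6)]  # stubbed embedding hits


async def bm25(q: str) -> list[tuple[str, float]]:
    return [("doc-b", 0.8), ("doc-c", 0.5)]  # stubbed keyword hits


async def search_async(q: str) -> list[tuple[str, float]]:
    results = await asyncio.gather(semantic(q), bm25(q))  # engines in parallel
    best: dict[str, float] = {}
    for hits in results:
        for doc_id, score in hits:
            best[doc_id] = max(best.get(doc_id, 0.0), score)  # dedupe, keep top
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```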
RSS knowledge layer (modules/knowledge/rss_service.py, models RSSFeed/RSSFeedItem in modules/knowledge/models.py): each RSSFeed row maps a URL to a KnowledgeCollection. Periodic task rss-sync (1h interval) calls feedparser with cached ETag/Last-Modified, dedupes new entries by GUID, optionally fetches full article HTML and converts it to markdown via lxml (chrome stripped: nav/footer/scripts/share/cookie banners), writes one md file per item to the collection's wiki-pages/ dir, creates KnowledgeDocument rows, then calls wiki_rag.reload_collection() + sync_collection_to_vector_search() directly (skips DatasetSynced event because its handler deletes-and-recreates docs which would orphan rss_feed_items.document_id FKs). Per-feed flags: fetch_full_text, verify_ssl (needed for adilet.zan.kz with non-standard CA). Caps: MAX_ITEMS_PER_FEED=50, MAX_FULL_TEXT_BYTES=1.5MB. Admin CRUD: GET/POST/PATCH/DELETE /admin/rss/feeds, POST /admin/rss/feeds/{id}/sync, POST /admin/rss/sync-all. Admin UI lives in fine-tune Collections section. Seed script scripts/seed_rss_feeds.py provisions 3 news collections (ru-bukh-news, ru-pravo-news, kz-news) with 13 verified RU/KZ accountancy & legal feeds.
Per-user Claude token tracking (shared $100 Anthropic plan): usage_log table has nullable user_id (migration 0024_usage_log_user_id) keyed to chat session owner. OpenAICompatibleProvider.last_usage populated from response usage in non-stream path, from final-chunk usage or tiktoken estimate in stream path; CloudLLMService.last_usage proxies to provider. modules/chat/facade.py:_log_llm_usage writes one row per Claude/claude_bridge response (best-effort, never raises) with service_type=llm, units_consumed=input+output, details={input_tokens, output_tokens, model, estimated}. Period bounds in modules/monitoring/period.py — anchor day-of-month (default 30, capped to last day for short months). Endpoints: GET /admin/usage/me (any auth, own total), GET /admin/usage/by-user (admin, all users sorted desc by tokens). Mobile shows a thin orange→red bar under the context indicator with the user's own period total (mobile/src/views/ChatView.vue); admin Users view has a "Токены" column (admin/src/views/UsersView.vue). Only Claude is tracked — Gemini/OpenAI/vLLM are skipped by _is_claude_provider. Streaming numbers are tiktoken-estimated (the bridge doesn't currently emit usage chunks); non-streaming responses use real Anthropic numbers.
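The period-bound rule above (anchor day-of-month, default 30, capped to the last day for short months) amounts to: find the most recent capped anchor date on or before today. A sketch — the function name and signature are illustrative; the real `modules/monitoring/period.py` may differ:

```python
# Sketch of the anchor-day period rule: the billing period starts on the
# anchor day (default 30), capped to each month's last day (e.g. Feb 28).
# Illustrative only; not the real modules/monitoring/period.py signature.
import calendar
from datetime import date


def period_start(today: date, anchor_day: int = 30) -> date:
    """Most recent anchor date on or before `today`, capping day per month."""
    y, m = today.year, today.month
    day = min(anchor_day, calendar.monthrange(y, m)[1])  # cap to month length
    if today.day >= day:
        return date(y, m, day)
    # Anchor not reached yet this month: use the previous month's capped anchor.
    y, m = (y - 1, 12) if m == 1 else (y, m - 1)
    return date(y, m, min(anchor_day, calendar.monthrange(y, m)[1]))
```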
Static legal & accountancy collections (scripts/scrape_digitax/): scraping pipeline for fixed corpora (federal codes, tax authority pages, professional bodies). Three steps: scrape.py --site <slug> (BFS crawler, BFS link-extractors per site), parse.py --site <slug> (HTML→Markdown via lxml + per-site CONTENT_SELECTORS), upload.py --site <slug> (writes DB rows + copies to wiki-pages/<slug>/). Site catalog in config.py:SITES — 7 Irish accountancy sites (digitax) + 3 RU bookkeeping (USN) + 10 RU federal codes already scraped (consultant.ru) + 38 additional RU consultant.ru sources generated from RU_FEDERAL_LAWS list (Constitution, 11 more codes — НК ч.1/2, БК, ГПК, АПК, КАС, ЗК, ЛК, ВК, ГсК; 24 ФЗ across corporate / administrative / social / finance / info domains; 11 ФКЗ; only ru-fz-273 actually scraped so far) + 7 KZ codes (adilet.zan.kz) + 2 KZ accountancy practical (kgd.gov.kz, mybuh.kz; configured but not scraped). consultant.ru codes: each cons_doc_LAW_<id> is a BFS root with article-level URLs branching out; sidebar pollution stripped via div.seo-links removal in parse.py:strip_boilerplate. adilet.zan.kz codes: each Kazakh code is a single monolithic page (~1.5–5 MB), needs verify_ssl: False (Kazakh root CA not in certifi); content selector div.container_gamma.text.text_upd. Critical: global POST /admin/wiki-rag/reload only re-indexes the legacy WIKI_DIR (root-level files), NOT collections. After bulk upload, loop POST /admin/wiki-rag/collections/{id}/reload per collection. Server-side runs of upload.py skip self-copy when source dir is wiki-pages/ (parsed/ absent on production).
Stack: Vue 3 + Composition API + TypeScript, Vite, Pinia (persisted), Vue Router (hash history), TanStack Vue Query, vue-i18n (ru/en/kk), TailwindCSS + radix-vue, lucide-vue-next. Path alias @ → admin/src/.
Routing (admin/src/router.ts): createWebHashHistory. Routes use meta fields: public (bypass auth), localOnly (hidden in cloud mode), module (RBAC module name), minLevel (view/edit/manage).
Stores (admin/src/stores/): Key store auth.ts holds JWT, user, deploymentMode, permissions. Exposes isAdmin, isCloudMode, hasModule(), canView(), canEdit(), canManage(). Toast API: toast.success/error/warning/info(title) — do NOT use toast.show() (different signature: show(type, title, message?)). Confirm API: confirm.confirm({ title, message, confirmText, type }) returns Promise<boolean> — no ask() method.
API layer (admin/src/api/): client.ts provides api.get/post/put/delete/upload + createSSE() (auto-injects JWT). Domain files build on it. All re-exported from api/index.ts.
Demo mode: VITE_DEMO_MODE=true monkey-patches window.fetch to intercept API calls with mock data from 23 domain files in admin/src/api/demo/.
Product variant (VITE_PRODUCT_VARIANT env var, defaults to full): Set to lite to ship a stripped admin panel. Lite variant whitelists /chat, /llm, /wiki, /finetune (collections CRUD), /widget, /telegram, /whatsapp, /mobile-app, /settings, /users, /about, /login, /invite/* — everything else is blocked by the router guard and hidden from nav. Scripts: npm run dev:lite and npm run build:lite. Central helper: admin/src/config/productVariant.ts (IS_LITE, isPathAllowed()).
Vite base path: Production /admin/ (served by FastAPI). Demo/standalone: / (via VITE_BASE_PATH or .env.production.local).
Stack: Vue 3 + TypeScript, Vite, Pinia, Vue Router, Capacitor (Android), TailwindCSS 4. Path alias @ → mobile/src/.
Purpose: Standalone Android chat app connecting to https://ai-sekretar24.ru (hardcoded). Role-based experience: admins get full chat controls, non-admins see only shared chats.
Screens: LoginView (auth only, no server URL), ChatListView (admin: session list + FAB + delete; non-admin: Claude-like welcome + shared chat cards), ChatView (streaming chat + TTS + role-based controls), SettingsView (account + logout).
Theme: Night-eyes (warm brown/amber/gold), hardcoded — no theme switching. Background #1a1308, text #d9c9a8, primary amber-600, cards stone-800.
Role-based access:
- Admin (`role=admin`): full chat controls — LLM provider selector, RAG collection multi-select, system prompt editing, export (copy/md/json), branching, context files, all message actions (edit, regenerate, summarize, delete branch), session creation/deletion. Admin-only controls live in the admin panel only (not mobile).
- Non-admin: shared chats (`is_shared_with_me` filter), Claude-like welcome, basic message actions (TTS + copy), branching, context files, web search toggle. No LLM/RAG selectors, no export, no session deletion.
Mobile Instances: Admin creates MobileAppInstance (LLM backend, persona, system prompt, TTS, RAG) in admin panel (/mobile-app view). Users are assigned to instances via ResourceShare. On login, mobile app fetches GET /admin/mobile/my-config to get assigned instance config. Chat sessions use source="mobile" + mobile_instance_id for per-instance LLM/prompt routing.
API layer (mobile/src/api/): client.ts (base fetch + upload for multipart FormData), chat.ts (sessions/streaming/branches/uploadImage), admin.ts (admin-only APIs, used only by admin panel).
File upload in chat: Same backend as admin (POST /admin/chat/sessions/{id}/upload-image). ChatInput.vue has paperclip button between input and send. Files uploaded → image_ids passed to streamMessage → backend injects extracted text (OCR/PDF/DOCX/XLSX) into LLM context. Accepts: JPEG, PNG, WebP, GIF, PDF, XLSX, DOCX, TXT, CSV, MD, JSON, XML, HTML, YAML. Max 10MB.
Per-session named system prompts (ChatSessionPrompt model, table chat_session_prompts): each chat session can hold multiple named prompts; exactly one is active. Endpoints GET/POST /admin/chat/sessions/{id}/prompts, PATCH /admin/chat/sessions/{id}/prompts/{pid}, POST /admin/chat/sessions/{id}/prompts/{pid}/activate, DELETE …/{pid}. The active prompt's content is mirrored into ChatSession.system_prompt, so the existing streaming pipeline picks it up unchanged — switching prompt swaps the assistant's role while preserving conversation history. Creating the first prompt while the session already has a system_prompt preserves it as the initial content. Deleting the active prompt promotes the most recent remaining one.
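The activation rule above — exactly one prompt active, its content mirrored into the session's `system_prompt` — can be sketched with plain dataclasses standing in for the ORM models (the helper name is hypothetical):

```python
# Sketch of named-prompt activation: exactly one prompt is active, and its
# content is mirrored into system_prompt so the streaming pipeline is
# unchanged. Dataclasses stand in for the real ChatSessionPrompt/ChatSession.
from dataclasses import dataclass, field


@dataclass
class SessionPrompt:  # stands in for ChatSessionPrompt
    id: int
    name: str
    content: str
    is_active: bool = False


@dataclass
class Session:  # stands in for ChatSession
    system_prompt: str = ""
    prompts: list[SessionPrompt] = field(default_factory=list)


def activate_prompt(session: Session, prompt_id: int) -> None:
    """Make one prompt active and mirror its content into system_prompt."""
    for p in session.prompts:
        p.is_active = p.id == prompt_id
        if p.is_active:
            session.system_prompt = p.content  # pipeline reads this unchanged
```

Switching the active prompt swaps the assistant's role while the message history stays intact, which is exactly why the mirror trick avoids touching the streaming code.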
Key differences from admin panel:
- Hardcoded server URL (`https://ai-sekretar24.ru`), no user configuration
- JWT stored in native Preferences (not localStorage)
- No demo mode, no full RBAC UI — role-based chat experience
- ~77KB gzipped (vs ~2MB admin)
Build: cd mobile && npm run build && npx cap sync android. APK via Android Studio: npx cap open android → Build → Build APK.
No lint/format/test — mobile app has only dev, build, preview scripts. Type checking happens during npm run build via vue-tsc -b.
Adding a new API endpoint:
- Create/edit the router in `modules/{domain}/router.py` (preferred) or `app/routers/` (legacy)
- Use domain service singletons (`from modules.chat.service import chat_service`) for DB access
- If using `app/routers/`, add to `__all__` in `app/routers/__init__.py`
- Register it in `orchestrator.py` with `app.include_router()`
Adding a new cloud LLM provider type:
- Add an entry to the `PROVIDER_TYPES` dict in `db/models.py`
- OpenAI-compatible → works automatically via `OpenAICompatibleProvider`
- Custom SDK → create a provider class inheriting `BaseLLMProvider` in `cloud_llm_service.py`, register it in `CloudLLMService.PROVIDER_CLASSES`
RBAC auth guards (in `auth_manager.py`):
- `Depends(require_permission(module, level))` — checks module permission
- `user_has_level(user, module, level)` — inline check within an endpoint
- `workspace_context(user, module)` → `(owner_id, workspace_id)` — for repository calls; `owner_id=None` means "shared within workspace"
- Data isolation: always pass both `owner_id` and `workspace_id` to the repository/manager
Gate-check pattern for mutations: UPDATE/DELETE endpoints call workspace-filtered get first (e.g., get_by_id_ws(id, workspace_id=ws_id)); if None → 404. Prevents cross-workspace access. ChatRepository is the reference implementation.
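The gate-check pattern reduces to: workspace-filtered get first, 404 on miss, so cross-workspace rows are indistinguishable from missing ones. A minimal sketch with an in-memory stand-in for the repository (the exception class and "table" are illustrative):

```python
# Sketch of the gate-check pattern: mutations first do a workspace-filtered
# get; a miss maps to 404. The in-memory "table" and NotFound stand-in are
# illustrative -- real code uses the repository + fastapi.HTTPException.
class NotFound(Exception):
    """Stands in for HTTPException(status_code=404)."""


FAKE_ROWS = {  # (id, workspace_id) -> row
    (1, 10): {"id": 1, "title": "chat in ws 10"},
}


def get_by_id_ws(row_id: int, workspace_id: int):
    """Workspace-filtered lookup, like ChatRepository.get_by_id_ws."""
    return FAKE_ROWS.get((row_id, workspace_id))


def delete_chat(row_id: int, workspace_id: int) -> dict:
    row = get_by_id_ws(row_id, workspace_id=workspace_id)
    if row is None:
        raise NotFound()  # another workspace's row looks exactly like a miss
    FAKE_ROWS.pop((row_id, workspace_id))
    return row
```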
Adding i18n translations: Edit admin/src/plugins/i18n.ts — add keys to all three message objects: ru, en, kk.
API URL patterns:
- `GET/POST /admin/{resource}` — List/create
- `GET/PUT/DELETE /admin/{resource}/{id}` — CRUD
- `POST /admin/{resource}/{id}/action` — Actions (start, stop, test)
- `POST /webhooks/{service}` — External webhooks
- `POST /v1/chat/completions`, `GET /v1/models` — OpenAI-compatible
- Python 3.11+, line length 100, double quotes (ruff format)
- Cyrillic is normal — RUF001/002/003 disabled; Russian appears in UI text, logging, persona prompts
- FastAPI Depends — B008 disabled for `Depends()` in default args
- Optional imports — services like vLLM and OpenVoice use try/except at module level with `*_AVAILABLE` flags
- SQLAlchemy 2.0 style — `Mapped[T]` with `mapped_column()` (declarative 2.0)
- Repository pattern — `BaseRepository(Generic[T])` provides CRUD + `_apply_workspace_filter()`. Repos only `flush()`, never `commit()`
- mypy strict only for `db/`, `auth_manager.py`, `service_manager.py`; other modules relaxed. mypy is soft in CI
- Pre-commit hooks — ruff, mypy (core only), eslint, hadolint, standard checks (see `.pre-commit-config.yaml`)
LLM_BACKEND=vllm # "vllm" or "cloud:{provider_id}" (legacy "gemini" auto-migrates)
VLLM_API_URL=http://localhost:11434 # Auto-normalized: trailing /v1 stripped
DEPLOYMENT_MODE=full # "full", "cloud", or "local"
ORCHESTRATOR_PORT=8002
ADMIN_JWT_SECRET=... # Auto-generated if empty
REDIS_URL=redis://localhost:6379/0 # Optional, graceful fallback
DEV_MODE=1 # Backend proxies to Vite dev server (:5173)
VECTOR_SEARCH_URL=http://localhost:8003 # Optional, Vector Search microservice
VECTOR_SEARCH_TOKEN= # Bearer token for Vector Search API
GOOGLE_CLIENT_ID= # Google OAuth 2.0 (Drive, Docs, Sheets, Gmail)
GOOGLE_CLIENT_SECRET= # Google OAuth 2.0 client secret
GOOGLE_REDIRECT_URI= # OAuth callback URL (default: {BASE_URL}/admin/oauth/google/callback)
PLATFORM_AGENT_PROMPT_FILE= # Override path to platform-agent fallback prompt (default: /opt/ai-secretary/prompts/platform-agent.md)
BRIDGE_ISOLATE_HOME= # "1" to spawn Claude CLI with isolated HOME so host's CLAUDE.md/memory files don't leak into user chats
BRIDGE_ISOLATED_HOME= # Override isolated HOME path (default: /var/lib/ai-secretary-bridge)

Server: root@155.212.231.7, systemd service (not Docker Compose).
ssh root@155.212.231.7
cd /opt/ai-secretary
git pull origin main
cd admin && npm ci && npm run build
rsync -av --delete admin/dist/ /var/www/admin-ai-sekretar24/ # REQUIRED: nginx serves from /var/www/
sed -i "s/ai-admin-v[0-9a-z]*/ai-admin-v$(date +%s)/" /var/www/admin-ai-sekretar24/sw.js # bust SW cache
systemctl restart ai-secretary # restart orchestrator
curl -s http://localhost:8002/health # health check

IMPORTANT: Nginx serves the frontend from /var/www/admin-ai-sekretar24/, NOT from /opt/ai-secretary/admin/dist/. Always rsync after build.
Webhook auto-deploy: ai-secretary-webhook.service triggers on GitHub push.
Local-only files (not in git): .env, docker-compose.override.yml, modified Dockerfile, services/bridge/src/models/
- Run lint locally — `ruff check . && cd admin && npm run lint:check`
- Check for pending DB migrations
- Kill stale processes — `lsof -i :8002`
- Clean build artifacts — `rm -rf admin/dist admin/node_modules/.vite`
- Build — `npm run build` (verify `VITE_DEMO_MODE` is NOT set)
- Restart — `docker compose restart ai-secretary`
- Verify — `curl http://localhost:8002/health` + test `/admin/auth/login`
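For the stale-process step, a cross-platform alternative to `lsof -i :8002` is a quick socket probe (a sketch; 8002 is the orchestrator default):

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """True if something already accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        return sock.connect_ex((host, port)) == 0


# e.g. before restarting: if port_in_use(8002), kill the stale orchestrator first
```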
./deploy.sh # git pull, re-apply patches, build admin, restart orchestrator
./test_system.sh # Quick health checks and API smoke tests

Fully offline demo builds at demo.ai-sekretar24.ru:
- Full demo (`/full/`): `npm run build -- --mode demo` (admin role, all features)
- Cloud demo (`/cloud/`): `npm run build -- --mode demo-web` (web role, customer-facing)
- Deploy: `bash /root/deploy-demo.sh`
Check in this order — infrastructure first, application logic last:
1. Build artifacts — correct build deployed? Stale demo interceptors?
2. Deploy pipeline — stale Vite cache, wrong `.env`, `VITE_DEMO_MODE` leaking?
3. DB state — migrations applied? `sqlite3 data/secretary.db ".tables"` / `.schema`
4. Process state — port conflicts (`lsof -i :8002`), zombie processes?
5. Auth/JWT — `ADMIN_JWT_SECRET` is auto-generated on restart (invalidates tokens). Sessions validated against the `user_sessions` table via `SessionCache`
6. Application logic — only after ruling out 1–5
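For the DB-state step, the `.tables` check can also be done from Python with the stdlib (a sketch; the path assumes the default SQLite location):

```python
import sqlite3


def applied_tables(db_path: str) -> set[str]:
    """Table names in a SQLite file, mirroring `.tables` in the sqlite3 CLI."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return {name for (name,) in rows}


# e.g.: "user_sessions" in applied_tables("data/secretary.db")
```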
This project is developed from two machines:
- local — dev workstation with GPU (RTX 3060), full stack
- server — Beget VPS (`root@155.212.231.7`), systemd service, cloud LLM only
Each machine identifies itself via ~/.claude/projects/.../memory/MEMORY.md (## Machine Role section). Check your machine role before git operations.
- Never push directly to `main` — always feature branch + PR
- Branch prefixes: `local/*` (dev machine), `server/*` (server), or `feat/`/`fix/`/`docs/` with machine suffix
- Always `git pull` before starting work
- Do not amend or force-push commits made by the other instance
Local primary: Hardware services (voice_clone_service.py, stt_service.py, vllm_llm_service.py, piper_tts_service.py), GPU/hardware routers, fine-tuning, voice samples, start scripts.
Server primary: Cloud services (cloud_llm_service.py, xray_proxy_manager.py), Docker files, bot operations (runtime), production data.
Shared (coordinate via branches): orchestrator.py, app/routers/, db/, admin/, CLAUDE.md, migrations (create new files only).
- Vosk model required — download to `models/vosk/` for STT
- XTTS requires CUDA CC >= 7.0 — RTX 3060+; use OpenVoice for older GPUs
- GPU memory — vLLM ~6GB + XTTS ~5GB must fit in 12GB
- VLESS proxy vs localhost — `GeminiProvider` sets global `HTTP_PROXY`; `OpenAICompatibleProvider` sets `NO_PROXY=127.0.0.1,localhost` for `claude_bridge`; `bridge_manager.py` strips proxy env vars
- Claude bridge timeouts — 7–30s warmup; `read=300s` timeout for bridge (vs 60s default); `max_tokens=4096` (vs 512)
- `services/bridge/src/models/` gitignored — the `.gitignore` pattern `models/` catches it; copy manually after clone
- Docker CPU: whisper excluded — `openai-whisper` fails to build (missing `pkg_resources`); server Dockerfile patched to `grep -v whisper`
- Docker + Claude CLI — CPU image needs Node.js; server Dockerfile patched to install Node.js 20 + `@anthropic-ai/claude-code`
- Circular import in domain `__init__.py` — domain `__init__.py` files MUST stay empty (no service re-exports). Chain: `db/models.py` imports `from modules.X.models import ...` → Python executes `modules/X/__init__.py` → if it imports `service.py`, then `service.py` imports `db.repositories` → `db.repositories` imports `db.models` → circular. Workaround: import services directly (`from modules.chat.service import ChatService`). Future fix (Phase 3+): eliminate eager imports in `db/models.py` by making it a lazy facade, or remove it entirely once consumers import models from domain modules.