Study Agent

A local AI learning assistant with long-term memory, role-based group chat, web search, model routing and context-tier management.

One-minute Overview

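Study Agent is a local-first AI learning assistant. The focus is not on simply calling a large model, but on wiring the LLM into a complete application flow:

Multi-provider LLM access: OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models
Long-term memory: Markdown memory + safe writer
Context tiers: fast / light / deep / archive
Web search: RSS / News fetch → article extraction → LLM digest → source tracing
RAG MVP: local Markdown / TXT / DOCX / PDF indexing, keyword / local vector prototype / hybrid / backend-vector retrieval, configurable embedding provider, optional Chroma persistence, a controlled local-knowledge retrieval tool, citation context, source blocks, a Streamlit retrieval/debug panel, chat injection, and foundational FastAPI RAG / chat / memory endpoints
Engineering security: SSRF protection, detect-secrets, config templates
Engineering quality: pytest test suite, Ruff, GitHub Actions CI, packaging checks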
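
The SSRF protection listed above can be pictured as a check that runs before any article is fetched. The sketch below is a hypothetical illustration, not the project's implementation: the name `is_safe_url` and the exact block rules are assumptions. It rejects non-HTTP schemes and literal IP hosts in private, loopback, link-local, or reserved ranges.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}
BLOCKED_HOSTNAMES = {"localhost"}  # assumption: minimal blocklist for illustration


def is_safe_url(url: str) -> bool:
    """Return True if the URL looks safe to fetch (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        return False
    host = parsed.hostname
    if not host or host.lower() in BLOCKED_HOSTNAMES:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address. A real check would also resolve DNS
        # and validate every resolved address before connecting.
        return True
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )
```

A hostname that passes this check can still resolve to an internal address, so a production fetcher would typically repeat the IP validation after DNS resolution and on every redirect.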

Highlights

Multi-provider LLM client: OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models
Model routing with fast / light / deep / archive context tiers
Long-term memory based on Markdown files and safe-writer persistence
Web search pipeline: feed registry → URL safety checks → article extraction → LLM digest → auditable source trace
RAG MVP: local Markdown / TXT / DOCX / PDF indexing, lexical / local vector prototype / hybrid / backend-vector retrieval, configurable embedding providers, optional Chroma persistence, a controlled local-knowledge retrieval tool, citation-first context formatting, source blocks, a Streamlit retrieval/debug panel, optional chat injection, FastAPI RAG / chat / memory / tools / workflows foundation endpoints, and a first React / Vite / TypeScript console
SSRF protection for article fetching, detect-secrets in CI
Batched session logging and multi-layer caching for performance
Performance budget: mode-based max_tokens bounds on the main chat, WeChat, and news LLM paths
314 pytest tests, Ruff clean, mypy clean, GitHub Actions CI workflow
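
To make the context-tier and performance-budget items above more concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the README does not say how tiers are chosen, so this version picks a tier by context length, and the names `Tier`, `TIERS`, `pick_tier`, `clamp_max_tokens` and all numeric budgets are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_context_chars: int  # largest context this tier is meant to handle
    max_tokens: int         # upper bound on completion length for this tier


# Assumption: illustrative budgets only; the project's real numbers may differ.
TIERS = [
    Tier("fast", 2_000, 256),
    Tier("light", 8_000, 512),
    Tier("deep", 32_000, 1_024),
    Tier("archive", 128_000, 2_048),
]


def pick_tier(context: str) -> Tier:
    """Pick the smallest tier whose context budget fits the given text."""
    for tier in TIERS:
        if len(context) <= tier.max_context_chars:
            return tier
    return TIERS[-1]  # oversized contexts fall back to the largest tier


def clamp_max_tokens(requested: int, tier: Tier) -> int:
    """Bound a requested max_tokens value to the tier's budget (at least 1)."""
    return max(1, min(requested, tier.max_tokens))
```

With these made-up budgets, a short question is routed to `fast` and a request for 4000 completion tokens is clamped to 256.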

For a detailed breakdown of the stack and engineering highlights, see Technical Stack & Engineering Highlights.

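A local AI study-buddy system for personal learning review, with role-based group chat, web search, long-term memory, and after-session summaries.

Not yet another AI Q&A tool, but an AI study partner that remembers what you are learning.

Why Build This

General-purpose AI chat tools are good at answering questions, but not at keeping you company while you learn:

They do not remember what you studied yesterday or where you got stuck last week
They do not proactively summarize your learning progress
They have no sense of role: serious or relaxed, encouraging or challenging is left to chance

Study Agent has a clear position: a local AI study buddy with long-term memory and distinct roles. It remembers your learning trajectory, discusses topics with you through different roles in a group chat, summarizes your progress automatically after each session, and writes new knowledge into long-term memory.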

Why It Is Not Just a Prompt Demo

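A typical AI demo simply forwards user input to the model. Study Agent focuses on these problems:

| Problem | Engineering solution |
| --- | --- |
| Switching model vendors is hard | Provider profile + OpenAI-compatible client |
| Context keeps growing | context-tier routing |
| Study records never accumulate | Markdown long-term memory |
| Writing to memory is unsafe | safe writer + preview/confirm |
| Web content cannot be traced | source-traced news pipeline |
| Runtime is unstable | caching, batched logging, tests, CI |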
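
The "safe writer + preview/confirm" row can be illustrated with a small sketch. It is an assumption about how such a writer might look, not the project's code: `preview_update` and `commit_update` are hypothetical names. The idea is to show the user a diff first, and to write through a temporary file so that a failed write never leaves a half-written memory file.

```python
import difflib
import os
import tempfile
from pathlib import Path


def preview_update(path: Path, new_text: str) -> str:
    """Return a unified diff between the file on disk and the proposed text."""
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{path.name} (current)",
        tofile=f"{path.name} (proposed)",
    )
    return "".join(diff)


def commit_update(path: Path, new_text: str, confirmed: bool) -> bool:
    """Write atomically, and only after explicit confirmation."""
    if not confirmed:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(new_text)
        os.replace(tmp_name, path)  # atomic rename within one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # clean up the partial temp file
        raise
    return True
```

A caller would render the output of `preview_update` in the UI and pass `confirmed=True` to `commit_update` only after the user accepts the change.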

Demo

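| Screen | What it shows |
| --- | --- |
| Home | Status dashboard, current focus, version info |
| WeChat group chat | Three roles discussing inside the group |
| Web search | Multi-source news aggregation with source tracing |
| Memory candidates | Preview of after-session updates, confirmed before writing |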

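Launch app  →  Choose study mode (atmosphere / focus level)
    │
    ├── Solo chat ──→ Ask / discuss ──→ After-session summary ──→ Memory update
    │
    └── WeChat group chat ──→ Generate opener / discuss news / look things up
                         │
                    ┌────┴────┐
                    │         │
             Web search    Role discussion
                    │         │
      Source-traced log    Clash of views
                    │         │
                    └────┬────┘
                         │
               After-session summary → Confirm → Write to long-term memory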

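Core Features

| Feature | Description |
| --- | --- |
| Solo chat | One-on-one discussion of study material with the AI; supports switching between flash and pro models |
| Role-based group chat | Four roles (March 7th (三月七), Keqing (刻晴), Nahida (纳西妲), Firefly (流萤)) discuss in a group chat, each with its own persona |
| Web search | Multi-source aggregation across Google News, Bing News, and RSSHub, with three-layer extraction of article body text |
| Source tracing | Search results are written into the group-chat log so the supporting evidence can be traced back |
| RAG MVP | Indexes local Markdown / TXT / DOCX / PDF documents; the frontend panel returns cited snippets with file path, line number, score, matched terms, and score breakdown, and can inject them into solo chat and WeChat group interactive replies; FastAPI exposes `/health`, `/rag`, `/rag/index`, `/rag/query`, `/rag/status`, `/rag/upload`, `/rag/local-knowledge` |
| After-session summary | Automatically summarizes progress after a study session and writes it to memory once the user confirms |
| Long-term memory | Learner profile, progress tracking, project context, and current focus, kept as multi-level memory files |
| Multi-provider | Supports OpenAI / DeepSeek / OpenRouter / SiliconFlow / local models |
| Atmosphere selection | Switch among interaction atmospheres such as warm / close / standard |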
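
The RAG row mentions hybrid retrieval that returns a score, matched terms, and a score breakdown per cited snippet. A minimal sketch of that idea follows. It is illustrative only: the `Chunk` type, the function names, the term-overlap lexical score, and the `alpha` blending weight are assumptions, not the project's actual scoring.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    path: str
    start_line: int
    text: str
    vector: list[float]


def lexical_score(query: str, text: str) -> tuple[float, list[str]]:
    """Fraction of query terms found in the chunk, plus the matched terms."""
    terms = query.lower().split()
    words = set(text.lower().split())
    hits = [term for term in terms if term in words]
    return (len(hits) / len(terms) if terms else 0.0), hits


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(query, query_vector, chunks, alpha=0.5, top_k=3):
    """Blend lexical and vector scores; keep the parts for debugging."""
    results = []
    for chunk in chunks:
        lex, hits = lexical_score(query, chunk.text)
        vec = cosine(query_vector, chunk.vector)
        results.append({
            "source": f"{chunk.path}:{chunk.start_line}",  # citation anchor
            "score": alpha * lex + (1 - alpha) * vec,
            "breakdown": {"lexical": lex, "vector": vec},
            "matched_terms": hits,
        })
    return sorted(results, key=lambda r: r["score"], reverse=True)[:top_k]
```

Keeping the per-component breakdown next to the final score is what makes a retrieval/debug panel useful: it shows whether a snippet ranked highly because of keyword overlap or vector similarity.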

Architecture

streamlit run app.py
       │
┌──────┴──────┐
│   app.py    │  Streamlit entry point; routes to the UI panels
└──────┬──────┘
       │
┌──────┴──────────────────────────────────────────────┐
│  src/ui/                                            │
│  ├── main_panel.py     Home page                    │
│  ├── chat_panel.py     Chat panel                   │
│  ├── wechat_panel.py   WeChat group panel           │
│  ├── after_session_panel.py  After-session summary  │
│  └── sidebar.py        Sidebar                      │
└──────┬──────────────────────────────────────────────┘
       │
┌──────┴──────┬──────────────┬──────────────┬──────────────┐
│  LLM Layer  │  News Layer  │  Memory      │  WeChat      │
│             │              │  Layer       │  Layer       │
│ llm_client  │ news/        │ memory.py    │ wechat_*.py  │
│ llm_router  │ ├─rss_fetc   │ memory_tools │ (format,     │
│ context_bu  │ ├─article_e  │ memory_writer│ state,       │
│ -ilder      │ ├─link_reso  │              │ generator,   │
│             │ ├─digest     │ session_log  │ prompt)      │
│ config.py   │ └─article_f  │ -ger         │              │
│ router.py   │   etcher     │              │ wechat_serv  │
│             │              │              │ -ice.py      │
└──────┬──────┴──────┬───────┴──────┬───────┴──────┬───────┘
       │             │              │              │
  .env.example     chat/         memory/         roles/
  (5 providers)  (chat logs)   (memory files)  (role personas)

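Quick Start

git clone <repo-url> study-agent
cd study-agent
cp .env.example .env
# Edit .env and fill in your API key

# Initialize the memory files (the app creates them automatically on first run; you can also copy the templates by hand)
cp -r memory.example/* memory/ 2>/dev/null || :

# Stable install (recommended; pinned versions)
pip install -r requirements.txt
pip install -r requirements-dev.txt

streamlit run app.py

Then open http://localhost:8501 in your browser.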

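Dependency Management

This project manages dependencies with pip-tools:

requirements.in / requirements-dev.in: maintained by hand, with version ranges
requirements.txt / requirements-dev.txt: generated automatically, with exact versions (lock files)

After changing dependencies, regenerate the lock files:

pip install pip-tools
pip-compile requirements.in        # re-lock the main dependencies
pip-compile requirements-dev.in    # re-lock the development dependencies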

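Environment Configuration

Switch LLM providers with LLM_PROVIDER_PROFILE (openai / deepseek / openrouter / siliconflow / local). Each provider uses its own set of environment variables:

| Provider | Env var prefix | Default base URL |
| --- | --- | --- |
| `deepseek` | `DEEPSEEK_*` | `https://api.deepseek.com/v1` |
| `openrouter` | `OPENROUTER_*` | `https://openrouter.ai/api/v1` |
| `siliconflow` | `SILICONFLOW_*` | `https://api.siliconflow.cn/v1` |
| `local` | `LOCAL_*` | `http://127.0.0.1:8000/v1` |
| `openai` | `OPENAI_*` | - |

Parameter precedence, highest first: explicit code arguments → task-level environment variables → task defaults → global environment variables → provider-level environment variables. See .env.example and the user guide for the full configuration.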
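
The precedence chain above can be sketched as a first-match lookup. This is a hedged illustration, not the project's resolver: `resolve_setting` is a hypothetical helper, and the variable naming it assumes (`<TASK>_<NAME>` for task-level and `LLM_<NAME>` for global variables) is made up for the example. Only the provider prefixes such as `DEEPSEEK_*` come from the table.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    provider: str,
    task: Optional[str] = None,
    explicit: Optional[str] = None,
    task_defaults: Optional[Mapping[str, str]] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first value found, walking the documented precedence order."""
    candidates = [
        explicit,                                             # 1. explicit code argument
        env.get(f"{task.upper()}_{name}") if task else None,  # 2. task-level env var (assumed naming)
        (task_defaults or {}).get(name),                      # 3. task default
        env.get(f"LLM_{name}"),                               # 4. global env var (assumed naming)
        env.get(f"{provider.upper()}_{name}"),                # 5. provider-level env var
    ]
    return next((value for value in candidates if value), None)
```

For example, with `LLM_MODEL` and `DEEPSEEK_MODEL` both set and no task or explicit argument, the global value wins; the provider-level value is used only when nothing earlier in the chain is set.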

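The RAG vector backend defaults to local and needs no extra services; the optional chroma adapter requires you to install chromadb yourself. The embedding provider defaults to local_hash; for production retrieval you can explicitly switch to OpenAI-compatible embeddings: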

RAG_VECTOR_BACKEND=local
# RAG_VECTOR_BACKEND=chroma
# RAG_CHROMA_PATH=logs/chroma
# RAG_CHROMA_COLLECTION=study_agent

RAG_EMBEDDING_PROVIDER=local_hash
# RAG_EMBEDDING_PROVIDER=openai
# RAG_EMBEDDING_MODEL=text-embedding-3-small
# RAG_EMBEDDING_DIMENSIONS=1536
# RAG_EMBEDDING_API_KEY=...
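
For intuition about what a dependency-free embedding such as `local_hash` could look like, here is a sketch based on the feature-hashing trick. This is an assumption: the README does not describe how `local_hash` works, and `hash_embed` is a hypothetical function. Each token is hashed to a bucket and a sign, and the resulting vector is L2-normalized.

```python
import hashlib
import math


def hash_embed(text: str, dimensions: int = 64) -> list[float]:
    """Map text to a fixed-size vector with the hashing trick (illustrative)."""
    vector = [0.0] * dimensions
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dimensions  # bucket
        sign = 1.0 if digest[4] % 2 == 0 else -1.0              # reduces collision bias
        vector[index] += sign
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector
```

An embedding like this is deterministic and needs no network access, which suits a local default, but it captures token overlap rather than meaning. That is consistent with the text above recommending an OpenAI-compatible embedding provider for production retrieval.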

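Project Structure

├── app.py                  # Streamlit entry point
├── src/
│   ├── llm_client.py       # LLM calls (chat / stream)
│   ├── llm_router.py       # Model routing and dispatch
│   ├── context_builder.py  # Context construction
│   ├── mode_manager.py     # Mode management (version / performance / atmosphere)
│   ├── api.py              # FastAPI health / chat / memory / sessions / RAG / tools / workflows endpoints
│   ├── role_manager.py     # Role loading and management
│   ├── performance_budget.py # Performance budget (tiered max_tokens)
│   ├── memory.py           # Memory system
│   ├── memory_tools.py     # Memory tools
│   ├── memory_writer.py    # Memory writes
│   ├── wechat_format.py    # Group-chat text formatting
│   ├── wechat_state.py     # Group-chat I/O and state management
│   ├── wechat_generator.py # LLM generation logic
│   ├── wechat_prompt.py    # Prompt template loading
│   ├── wechat_memory.py    # Group-chat memory extraction
│   ├── after_session.py    # After-session summary
│   ├── session_logger.py   # Session logging
│   ├── config.py           # Global configuration
│   ├── router.py           # Routing configuration
│   ├── news/               # News aggregation pipeline
│   ├── rag/                # Local RAG MVP: loading, chunking, indexing, keyword / vector prototype / embedding / optional backend retrieval
│   ├── tools/              # Controlled tool boundary: local-knowledge retrieval, etc.
│   └── ui/                 # Streamlit UI components
├── tests/                  # pytest test suite
├── frontend/               # React + Vite + TypeScript console
├── docs/                   # Design docs and engineering notes
│   ├── TECH_STACK.md       # Tech stack and project highlights
│   ├── RAG.md              # RAG MVP status and boundaries
│   └── STATE_MODEL.md      # State model
├── chat/                   # Group-chat logs
├── memory/                 # AI long-term memory
├── roles/                  # Role personas
├── templates/              # Prompt templates
├── config/                 # YAML configuration
├── requirements.in         # Dependency declarations (version ranges)
└── assets/                 # Visual assets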

Testing

pytest tests/ -v            # current local baseline: 314 passed
pytest tests/ --cov=src     # coverage
ruff check src/ tests/      # linting
mypy --explicit-package-bases src/  # type check

CI runs on GitHub Actions for every push and pull request and covers pytest, ruff, packaging checks, a detect-secrets scan, and a mypy soft check. See docs/TESTING.md for the current verification status.

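Version History

v0.8.0: Docs sync + Chinese UI labels + engineering wrap-up

Documentation version sync (5 documents upgraded together); Chinese UI labels (model / performance / status bar fully in Chinese); merged the performance budget system, dependency locking, state model documentation, CI gate upgrade, and the entry-page news flow fix. See docs/TESTING.md for the current verification status.

v0.7.8: Performance budget + state model + engineering wrap-up

v0.7.7: Module split and service-layer decoupling

News pipeline split into 4 focused modules plus a compatibility facade; the service layer calls the submodules directly; stage-by-stage news flow in the UI; SSRF hardening; session logger auto-flush protection. 112 tests, Ruff clean.

v0.7.6: Engineering security and news-pipeline wrap-up

See CHANGELOG.md for the full history.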

Roadmap

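| Version | Direction |
| --- | --- |
| v0.8.1 | Stability and UI polish |
| v0.9 | Knowledge base / RAG capabilities |
| v0.10 | Multilingual support, improved export |
| v1.0 | Plugin architecture + custom roles |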

Engineering Roadmap

A job-search-oriented technical roadmap:

FastAPI service layer foundation: /health, /chat, /memory/preview, /memory/commit, /sessions, /rag, /rag/index, /rag/query, /rag/status, /rag/upload, /rag/local-knowledge, /tools and /workflows/runs implemented; streaming, auth and frontend-specific contracts remain planned
RAG MVP: Markdown / TXT / DOCX / PDF loading, chunking, local keyword retrieval, local vector prototype, hybrid retrieval, backend-vector retrieval, configurable embedding provider, optional Chroma adapter, controlled local-knowledge retrieval, citation context, source blocks, Streamlit retrieval panel, optional single-chat and WeChat interactive injection
RAG document QA (partial): PDF parsing has file-size, page-count, extracted-text and encrypted-file guards; production embedding requires explicit API/env configuration and Chroma remains optional
Vector store: Chroma optional adapter implemented; FAISS local prototype and pgvector engineering version remain planned
P8.4 evaluation sets foundation: retrieval, answer grounding, tool routing, workflow events and safety regression cases before expanding agentic behavior
P8.5 execution foundation: workflow run / step / event JSONL timeline plus controlled local-knowledge tool use behind typed schemas, permissions and audit logs
P9 web UI: first React + Vite + TypeScript console implemented with chat, source, workflow, tool and memory panels; streaming chat, upload flow, richer timeline details and memory/tool confirmation UI remain planned
P10 hardening and integration: auth, CORS, Docker, OpenAPI examples, optional read-only MCP server, trace_id, token usage, latency and provider fallback logs
P11 optional RPA: browser automation as a future read-first adapter for no-API learning systems, gated by domain allowlists and human confirmation
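
The P8.5 item above describes a workflow run / step / event timeline stored as JSONL. A minimal sketch of such an append-only log is shown below. The record fields and the names `append_event`, `read_timeline`, and `demo` are assumptions for illustration; the project's actual schema is not given in this README.

```python
import json
import tempfile
import time
import uuid
from pathlib import Path


def append_event(log_path: Path, run_id: str, step: str, event: str, **data) -> dict:
    """Append one event as a single JSON line (hypothetical record shape)."""
    record = {"run_id": run_id, "step": step, "event": event,
              "ts": time.time(), "data": data}
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_timeline(log_path: Path, run_id: str) -> list[dict]:
    """Return the events of one run, in the order they were written."""
    with log_path.open(encoding="utf-8") as handle:
        records = [json.loads(line) for line in handle if line.strip()]
    return [r for r in records if r["run_id"] == run_id]


def demo() -> list[str]:
    """Write a few events to a temporary log and read one run back."""
    with tempfile.TemporaryDirectory() as tmp:
        log = Path(tmp) / "runs" / "timeline.jsonl"
        run_id = uuid.uuid4().hex
        append_event(log, run_id, "retrieve", "started", query="what is RAG")
        append_event(log, run_id, "retrieve", "finished", hits=3)
        append_event(log, "other-run", "chat", "started")
        return [f'{r["step"]}:{r["event"]}' for r in read_timeline(log, run_id)]
```

One JSON object per line keeps writes append-only and makes the log easy to tail, filter by `run_id`, or replay in a timeline panel.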

License

For personal learning use only.
