A production-ready template for building AI agent backends with FastAPI and LangGraph. Handles the hard parts — stateful conversations, long-term memory, tool calling, observability, rate limiting, auth — so you can focus on your agent logic.
Built for AI engineers who want a solid foundation, not a tutorial project.
- LangGraph stateful agent with checkpointing, tool calling, and human-in-the-loop support
- Long-term memory via mem0 + pgvector — semantic search per user, cache-backed
- LLM service with circular model fallback, exponential backoff retries, and total timeout budget
- Langfuse tracing on all LLM calls; Prometheus metrics + Grafana dashboards
- JWT auth with session management; rate limiting via slowapi
- Alembic migrations; optional Valkey/Redis cache layer
- Structured logging with request/session/user context on every line
git clone <repo-url> my-agent && cd my-agent
cp .env.example .env.development # fill in your keys
make install
make docker-up # starts API + PostgreSQL

Open http://localhost:8000/docs to see the interactive API.
For local development without Docker see docs/getting-started.md.
| Guide | What it covers |
|---|---|
| Getting Started | Prerequisites, local setup, first API call |
| Architecture | System design, request flow, component diagrams |
| Configuration | All environment variables with defaults |
| Authentication | JWT flow, sessions, endpoint reference |
| Database & Migrations | Schema, Alembic migrations, pgvector |
| LLM Service | Models, retries, fallback, timeout budget |
| Memory | mem0 long-term memory, cache layer |
| Observability | Langfuse, structured logging, Prometheus, profiling |
| Evaluation | Eval framework, custom metrics, reports |
| Docker | Docker, Compose, full monitoring stack |
app/
api/v1/ # Route handlers
core/
langgraph/ # Agent graph + tools
prompts/ # System prompt template
cache.py # Valkey/Redis + in-memory fallback
config.py # Settings
middleware.py # Metrics, logging context, profiling
limiter.py # Rate limiting
models/ # SQLModel ORM models
schemas/ # Pydantic request/response schemas
services/ # LLM, database, memory services
alembic/ # Database migrations
evals/ # LLM evaluation framework
PRs welcome. Please read docs/getting-started.md to get your environment set up, then follow the coding conventions in AGENTS.md.
Report security issues privately — see SECURITY.md.
See LICENSE.
What is this template? A production-ready foundation for AI agent backends built on FastAPI + LangGraph. It bundles the components you'd otherwise wire up by hand: stateful conversations, long-term memory, tool calling, observability, rate limiting, and JWT auth.
How does this differ from a basic LangGraph setup? The base LangGraph quickstart stops at "agent runs locally". This template adds Alembic migrations, mem0 + pgvector long-term memory, Langfuse tracing, Prometheus + Grafana dashboards, JWT sessions, slowapi rate limiting, structured logging with per-request context, and a circular-fallback LLM service — production concerns you'd otherwise build separately.
Do I need Docker?
Recommended but not required. make docker-up starts the API + PostgreSQL together. For local-only setup see docs/getting-started.md.
Which LLM providers are supported?
Today: OpenAI only via the LLMRegistry in app/services/llm/registry.py. Multi-provider support (Anthropic, Google, OpenRouter) via LangChain's init_chat_model is planned — see #51. Configure your model via DEFAULT_LLM_MODEL in .env.development.
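As a sketch, the setting flows in via pydantic-settings — the field name matches the env var above, but the default value and file wiring here are assumptions; see app/core/config.py for the real thing:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env.development")

    # Picked up from DEFAULT_LLM_MODEL in .env.development
    DEFAULT_LLM_MODEL: str = "gpt-4o-mini"  # default shown here is an assumption

settings = Settings()
```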
How do I configure long-term memory?
Long-term memory is self-hosted: mem0 runs in-process and persists into your existing PostgreSQL via pgvector — there is no separate mem0 cloud account or API key. You only need a working OPENAI_API_KEY (used for fact extraction + embeddings) and the pgvector extension enabled. See docs/memory.md for details.
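As a rough sketch of what the self-hosted wiring looks like — connection values and config keys below are illustrative, not the template's actual settings; see docs/memory.md:

```python
from mem0 import Memory

# mem0 persists into PostgreSQL via its pgvector vector-store provider.
# These connection values are placeholders for your own database.
config = {
    "vector_store": {
        "provider": "pgvector",
        "config": {
            "dbname": "postgres",
            "user": "postgres",
            "password": "postgres",
            "host": "localhost",
            "port": 5432,
        },
    },
}

memory = Memory.from_config(config)
memory.add("Prefers concise answers", user_id="user-123")   # fact extraction via OpenAI
print(memory.search("answer style", user_id="user-123"))    # per-user semantic search
```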
How do I add a custom tool?
Drop a LangChain @tool-decorated function in app/core/langgraph/tools/ and register it in the tools list exported from that package. The agent picks it up on next start; no graph changes needed.
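For illustration, a minimal tool might look like this — get_weather and its body are made up; only the @tool import and the registration pattern are real:

```python
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Return the current weather for a city."""  # the docstring becomes the tool's description
    return f"Sunny in {city}"  # stand-in body — call a real API here

# Then append it to the exported list, e.g. in app/core/langgraph/tools/__init__.py:
# tools = [*existing_tools, get_weather]
```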
How does the LLM service handle failures?
Two layers: (1) per-call exponential-backoff retry via tenacity, (2) circular fallback — if the active model exhausts its retries, the service rotates to the next model in LLMRegistry and continues. A total timeout budget caps the whole call so latency stays bounded. See docs/llm-service.md.
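A condensed sketch of the pattern — names like MODELS, call_model, and TOTAL_TIMEOUT_S are illustrative, not the template's actual API:

```python
import asyncio
from tenacity import retry, stop_after_attempt, wait_exponential

MODELS = ["gpt-4o", "gpt-4o-mini"]  # hypothetical rotation order
TOTAL_TIMEOUT_S = 30.0              # assumed total budget

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, max=8))
async def call_model(model: str, prompt: str) -> str:
    """Layer 1: one LLM call, retried with exponential backoff by tenacity."""
    raise NotImplementedError  # stand-in for the real provider call

async def complete(prompt: str, start: int = 0) -> str:
    async def rotate() -> str:
        # Layer 2: try each model once, wrapping around from the active one.
        for offset in range(len(MODELS)):
            model = MODELS[(start + offset) % len(MODELS)]
            try:
                return await call_model(model, prompt)
            except Exception:
                continue  # retries exhausted on this model — fall back to the next
        raise RuntimeError("all models exhausted their retries")

    # The total budget caps retries + fallbacks combined, so latency stays bounded.
    return await asyncio.wait_for(rotate(), timeout=TOTAL_TIMEOUT_S)
```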
Can I use this without Langfuse?
Yes. Set LANGFUSE_TRACING_ENABLED=false (or omit the Langfuse keys). The agent runs unchanged; structured logs still capture request/session/user context.
The API won't start
- Ensure PostgreSQL is running (`make docker-up` brings it up alongside the API)
- Confirm `.env.development` exists — copy from `.env.example` and fill in required keys
- Apply migrations: `make migrate`
Memory / semantic search returns nothing
- Verify the `pgvector` extension is enabled in your PostgreSQL instance
- Confirm `OPENAI_API_KEY` is valid (mem0 calls OpenAI for fact extraction + embeddings)
- Check `LONG_TERM_MEMORY_MODEL` and `LONG_TERM_MEMORY_EMBEDDER_MODEL` are set in `.env.development`
Rate limiting is too aggressive
Limits are defined in app/core/limiter.py (slowapi). Adjust per-route decorators or the default rate in that file. See docs/configuration.md for the related env vars.
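The slowapi pattern looks roughly like this — the route and limits below are placeholders; the template's real defaults live in app/core/limiter.py:

```python
from fastapi import FastAPI, Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

# Default applies to every route unless a decorator overrides it.
limiter = Limiter(key_func=get_remote_address, default_limits=["60/minute"])

app = FastAPI()
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.get("/chat")
@limiter.limit("10/minute")  # per-route override of the default
async def chat(request: Request):  # slowapi needs the Request parameter to key the limit
    return {"ok": True}
```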