Merged

MCP #37

2 changes: 1 addition & 1 deletion .gitignore
@@ -31,4 +31,4 @@ pipeline-config.txt
.coverage
.agents/
skills-lock.json
.crawl_results.csv
crawl_results.csv
6 changes: 3 additions & 3 deletions AGENT.md
@@ -27,11 +27,11 @@ Developer reference for agents and contributors. User-facing overview: [README.m
- **Pipeline storage** (crawl, edges, nodes, report payload, Lighthouse, keywords, warnings) lives in **PostgreSQL only**. Deliverables use the Export view, `GET /api/report/export`, or MCP `export_*` tools — not files written by the main pipeline step.
- **Pool tuning:** `DB_POOL_MIN` / `DB_POOL_MAX` (Python), `PGPOOL_MAX` (Node). Bulk crawl writes via `executemany`; optional **`crawl_stream_to_db`** streams rows during fetch. Per-URL raw HTML: `crawl_page_html` table (migration `015`); API `GET/POST /api/crawl/page-html` (localhost).
- **`web/` APIs:** `/api/report/*` read routes (payload, meta, history — not localhost-guarded; protect with `AUTH_*` when exposed); `/api/run` spawns Python (localhost); `/api/jobs`, `/api/jobs/[id]`, `/api/jobs/[id]/cancel` (localhost); `/api/crawl/browser-status`, `/api/crawl/page-html` (localhost); `/api/pipeline-config` GET/PUT; `/api/llm-config` GET/PUT; `/api/chat` POST (SSE); `/api/chat/sessions` GET/POST; `/api/ollama/status` (localhost); `/api/properties/{id}/google/links/import` POST; `PipelineRunnerFab` saves pipeline + LLM state before each run. Full route list: `web/app/api/**/route.ts`.
- **MCP:** `python -m website_profiling.mcp` (stdio, **340 read-only audit tools**, domain-scoped via `WP_MCP_DOMAIN`). See `docs/MCP.md`. Requires `pip install -r requirements.txt`.
- **MCP:** `python -m website_profiling.mcp` (stdio) or `python -m website_profiling.mcp.http` (remote Streamable HTTP). Configure at **`/mcp`** in the web UI. See `docs/MCP.md`.
- **AI Chat UI:** `/chat` — property-scoped chat with saved sessions (`chat_sessions`, `chat_messages`; migration `012_chat_sessions`).
- **Job store:** PostgreSQL `pipeline_jobs` when `DATABASE_URL` is set (`pipelineJobsDb.ts` — status, timestamps, truncated logs). In-memory map in `pipelineJobs.ts` holds live log tail and child process handles; stale rows reconciled via `PIPELINE_JOB_STALE_HOURS`.
- **Schema head:** `015_crawl_page_html` (recent: `013` link_edges/discovery, `014` job log truncation, `015` per-URL HTML storage).
- **Docker:** `Dockerfile` + `docker-compose.yml` (postgres + web); **`docker-compose.prod.yml`** (production); **`docker-compose.pull.yml`** for pre-built images (`WEB_IMAGE`); **`LIGHTHOUSE_CHROME_FLAGS`**
- **Docker:** `Dockerfile` + `docker-compose.yml` (postgres + web); **`docker-compose.prod.yml`** (production + remote MCP on `:8000`); **`docker-compose.pull.yml`** for pre-built images (`WEB_IMAGE`); **`LIGHTHOUSE_CHROME_FLAGS`**

**Where to edit**

@@ -42,7 +42,7 @@ Developer reference for agents and contributors. User-facing overview: [README.m
| DB schema | `alembic/versions/` |
| Local analysis | `analysis/local.py`, `requirements.txt` |
| AI insights (LLM) | `llm/enrich.py`, `llm/agent.py`, `llm_config.py`, `requirements.txt` |
| Audit query tools (MCP + chat) | `tools/audit_tools/`, `mcp/server.py`, `commands/chat_cmd.py` |
| Audit query tools (MCP + chat) | `tools/audit_tools/`, `mcp/server.py`, `mcp/http_server.py`, `commands/chat_cmd.py` |
| Config / CLI | `config.py` (`load_config`, `load_config_from_db`), `cli.py`, `input.txt.example` |
| UI pipeline schema | `web/src/lib/pipelineConfigSchema.ts` |
| UI LLM schema | `web/src/lib/llmConfigSchema.ts` |
4 changes: 2 additions & 2 deletions README.md
@@ -93,7 +93,7 @@ Site Audit focuses on **honest, self-hosted technical SEO**. It is not a drop-in
</tr>
</table>

Also included: **AI chat** over audit data (optional), **Content studio** (write &amp; optimize with live SEO scoring), **340 MCP tools** (domain-scoped servers), image SEO, GEO/AEO readiness, keyword explorer (GSC + on-site), backlinks (GSC Links import), compare runs, and portfolio management for agencies.
Also included: **AI chat** over audit data (optional), **Content studio** (write &amp; optimize with live SEO scoring), **340 MCP tools** (local stdio or remote Streamable HTTP), image SEO, GEO/AEO readiness, keyword explorer (GSC + on-site), backlinks (GSC Links import), compare runs, and portfolio management for agencies.

<img src="docs/assets/social-preview.png" alt="Site Audit — developer-friendly SEO audit preview" width="100%">

@@ -109,7 +109,7 @@ WebsiteProfiling/
│ ├── integrations/ # Google Search Console, GA4, Bing, CrUX
│ ├── llm/ # AI enrich + chat agent
│ ├── tools/ # Exports, audit query tools, MCP helpers
│ ├── mcp/ # MCP server (340 read-only tools, domain bundles)
│ ├── mcp/ # MCP server (stdio + remote HTTP, domain bundles)
│ ├── db/ # PostgreSQL storage layer
│ ├── commands/ # CLI subcommands
│ ├── cli.py # Pipeline entrypoint
21 changes: 21 additions & 0 deletions docker-compose.prod.yml
@@ -55,6 +55,27 @@ services:
    profiles:
      - worker

  mcp:
    build:
      context: .
      dockerfile: Dockerfile
    depends_on:
      postgres:
        condition: service_healthy
    command: ['python', '-m', 'website_profiling.mcp.http']
    environment:
      WEBSITE_PROFILING_ROOT: /app
      DATABASE_URL: postgres://${POSTGRES_USER}:${POSTGRES_PASSWORD}@postgres:5432/${POSTGRES_DB:-website_profiling}
      WP_MCP_HTTP_HOST: 0.0.0.0
      WP_MCP_HTTP_PORT: 8000
      WP_MCP_TOKEN: ${WP_MCP_TOKEN:?set WP_MCP_TOKEN}
      WP_MCP_ALLOWED_HOSTS: ${WP_MCP_ALLOWED_HOSTS:-}
      WP_MCP_ALLOWED_ORIGINS: ${WP_MCP_ALLOWED_ORIGINS:-}
      WP_MCP_DOMAIN: ${WP_MCP_DOMAIN:-core}
      WP_PROPERTY_ID: ${WP_PROPERTY_ID:-}
    ports:
      - '${MCP_PORT:-8000}:8000'

volumes:
  pg-data:
  profiling-data:
21 changes: 21 additions & 0 deletions docker-compose.yml
@@ -41,6 +41,27 @@ services:
      retries: 3
      start_period: 15s

  # Optional remote MCP (Streamable HTTP). Uncomment and set WP_MCP_TOKEN / WP_MCP_ALLOWED_HOSTS.
  # mcp:
  #   build:
  #     context: .
  #     dockerfile: Dockerfile
  #   image: website-profiling:latest
  #   depends_on:
  #     postgres:
  #       condition: service_healthy
  #   command: ['python', '-m', 'website_profiling.mcp.http']
  #   environment:
  #     WEBSITE_PROFILING_ROOT: /app
  #     DATABASE_URL: postgres://profiling:profiling@postgres:5432/website_profiling
  #     WP_MCP_HTTP_HOST: 0.0.0.0
  #     WP_MCP_HTTP_PORT: 8000
  #     WP_MCP_TOKEN: ${WP_MCP_TOKEN:-dev-mcp-token}
  #     WP_MCP_ALLOWED_HOSTS: localhost,127.0.0.1
  #     WP_MCP_DOMAIN: core
  #   ports:
  #     - "8000:8000"

volumes:
  pg-data:
  profiling-data:
79 changes: 78 additions & 1 deletion docs/MCP.md
@@ -13,6 +13,7 @@ The same tool catalog powers in-app **AI Chat** at `/chat`.
- [Prerequisites](#prerequisites)
- [Domain-scoped servers](#domain-scoped-servers)
- [Configuration](#configuration)
- [Remote Streamable HTTP](#remote-streamable-http)
- [MCP resources](#mcp-resources)
- [Tool reference](#tool-reference)
- [In-app chat](#in-app-chat)
@@ -31,12 +32,14 @@ export DATABASE_URL=postgres://profiling:profiling@localhost:5432/website_profil
export PYTHONPATH=src
```

Start the server:
Start the local stdio server:

```bash
python -m website_profiling.mcp
```

For remote access over HTTP, see [Remote Streamable HTTP](#remote-streamable-http).

---

## Domain-scoped servers
@@ -111,6 +114,80 @@ Add to `.cursor/mcp.json` or your MCP client settings:

---

## Remote Streamable HTTP

Use this when Site Audit runs on a hosted server and your MCP client (Cursor, Claude Desktop, etc.) connects over the network instead of spawning a local stdio subprocess.

### Start the HTTP server

Configure access on **MCP settings** (`/mcp`) in the web UI (recommended), or set environment variables. UI changes apply on the next MCP request without restarting the service.

```bash
export DATABASE_URL=postgres://profiling:profiling@localhost:5432/website_profiling
export PYTHONPATH=src
export WP_MCP_HTTP_HOST=0.0.0.0
export WP_MCP_HTTP_PORT=8000
export WP_MCP_DOMAIN=core
export WP_PROPERTY_ID=1

python -m website_profiling.mcp.http
```

Set **MCP bearer token** and **Allowed hostnames** on the Secrets page (or via `WP_MCP_TOKEN` / `WP_MCP_ALLOWED_HOSTS`). Environment variables override saved values when set.

The MCP endpoint is `http://<host>:8000/mcp` by default (`WP_MCP_HTTP_PATH=/mcp`).

### Environment variables

| Variable | Default | Purpose |
|----------|---------|---------|
| `WP_MCP_HTTP_HOST` | `127.0.0.1` | Bind address (`0.0.0.0` for Docker) |
| `WP_MCP_HTTP_PORT` | `8000` | Listen port |
| `WP_MCP_HTTP_PATH` | `/mcp` | Mount path |
| `WP_MCP_TOKEN` | unset | Bearer token (**required** when binding to a non-localhost address). Save on **Secrets → Remote MCP** or set here (env wins). |
| `WP_MCP_ALLOWED_HOSTS` | unset | Comma-separated `Host` allowlist (**required** for non-localhost bind). Save on **Secrets → Remote MCP** or set here. |
| `WP_MCP_ALLOWED_ORIGINS` | unset | Comma-separated `Origin` allowlist for browser clients |
| `WP_MCP_JSON_RESPONSE` | `false` | JSON responses instead of SSE streams |
| `WP_MCP_DOMAIN` | `core` | Tool bundle (same as stdio) |
| `WP_PROPERTY_ID` | unset | Default property (same as stdio) |
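
As a rough illustration of how the defaults in the table and the non-localhost rule fit together, the sketch below reads the variables from a mapping and validates them. `McpHttpSettings`, `load_settings`, `LOCAL_HOSTS`, and the accepted spellings for `WP_MCP_JSON_RESPONSE` are hypothetical names and assumptions, not the project's actual code. The sketch also ignores values saved through the web UI.

```python
import os
from dataclasses import dataclass

# Addresses treated as "localhost" in this sketch; the project's list may differ.
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}


def _csv(value):
    """Split a comma-separated string, dropping blanks and stray spaces."""
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass(frozen=True)
class McpHttpSettings:  # hypothetical container, not a project class
    host: str
    port: int
    path: str
    token: str
    allowed_hosts: list
    allowed_origins: list
    json_response: bool
    domain: str
    property_id: str


def load_settings(env=None):
    """Read WP_MCP_* values with the documented defaults, then validate."""
    env = os.environ if env is None else env
    settings = McpHttpSettings(
        host=env.get("WP_MCP_HTTP_HOST", "127.0.0.1"),
        port=int(env.get("WP_MCP_HTTP_PORT", "8000")),
        path=env.get("WP_MCP_HTTP_PATH", "/mcp"),
        token=env.get("WP_MCP_TOKEN", ""),
        allowed_hosts=_csv(env.get("WP_MCP_ALLOWED_HOSTS", "")),
        allowed_origins=_csv(env.get("WP_MCP_ALLOWED_ORIGINS", "")),
        # Which spellings count as "true" is an assumption of this sketch.
        json_response=env.get("WP_MCP_JSON_RESPONSE", "false").lower()
        in {"1", "true", "yes"},
        domain=env.get("WP_MCP_DOMAIN", "core"),
        property_id=env.get("WP_PROPERTY_ID", ""),
    )
    # A non-localhost bind needs both a token and a Host allowlist.
    if settings.host not in LOCAL_HOSTS:
        missing = []
        if not settings.token:
            missing.append("WP_MCP_TOKEN")
        if not settings.allowed_hosts:
            missing.append("WP_MCP_ALLOWED_HOSTS")
        if missing:
            raise ValueError("non-localhost bind requires: " + ", ".join(missing))
    return settings
```

Calling `load_settings({})` yields the localhost defaults, while `load_settings({"WP_MCP_HTTP_HOST": "0.0.0.0"})` raises `ValueError` because neither the token nor the allowlist is set.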

**Security:** `WP_MCP_TOKEN` is required when `WP_MCP_HTTP_HOST` is not localhost. Tools are read-only but expose audit, GSC, and GA4 data — treat the token like a database credential.

**DNS rebinding protection:** When both a token and allowed hosts are configured, through the UI or environment variables, the HTTP service enforces the bearer token and the Host/Origin allowlist in its own middleware and turns off the MCP SDK's built-in DNS-rebinding check, which the middleware supersedes. The SDK check applies only as a fallback on a non-localhost bind with no remote access configured, a state in which startup validation otherwise refuses to boot. In either case, set `WP_MCP_ALLOWED_HOSTS` to the public hostname clients use (e.g. `audit.example.com`).
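
The per-request check described above (bearer token plus `Host` allowlist) can be sketched in plain Python. This is a minimal illustration, not the service's middleware: `check_request` and `_hostname` are hypothetical helpers, `Origin` handling is left out, and the rejection reasons are placeholders rather than the responses the server actually sends.

```python
import hmac


def _hostname(host_header):
    """Lower-case the Host value and drop a :port suffix (no IPv6 literal support)."""
    return host_header.split(":", 1)[0].strip().lower()


def check_request(headers, token, allowed_hosts):
    """Return (allowed, reason) for one request; hypothetical helper."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}

    # Refuse everything if no token is configured, so an empty value never matches.
    if not token:
        return (False, "no token configured")

    scheme, _, supplied = normalized.get("authorization", "").partition(" ")
    # compare_digest avoids leaking, through timing, how much of the token matched.
    if scheme.lower() != "bearer" or not hmac.compare_digest(
        supplied.strip().encode(), token.encode()
    ):
        return (False, "bad or missing bearer token")

    # The allowlist is expected to hold lower-case hostnames without ports.
    if _hostname(normalized.get("host", "")) not in allowed_hosts:
        return (False, "host not in allowlist")

    return (True, "ok")
```

For example, a request carrying `Authorization: Bearer s3cret` and `Host: audit.example.com:8000` passes when the token is `s3cret` and the allowlist contains `audit.example.com`. The same request with `Host: evil.test` is rejected.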

### Cursor / Claude Desktop (remote)

```json
{
  "mcpServers": {
    "site-audit-remote": {
      "url": "https://audit.example.com/mcp",
      "headers": {
        "Authorization": "Bearer your-long-random-token"
      }
    }
  }
}
```
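
For clients that are not configured through a JSON file, the sketch below builds the first request a Streamable HTTP client would typically send: a JSON-RPC `initialize` call with the bearer token attached. It only constructs the request and leaves sending to the caller. `build_initialize_request`, the client name, and the protocol version string are illustrative assumptions; a real client would normally use an MCP SDK and negotiate the version with the server.

```python
import json
import urllib.request


def build_initialize_request(url, token):
    """Build (but do not send) a JSON-RPC initialize POST for a remote MCP endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed version string; the server may answer with a different one.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP clients accept either a JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
    )


if __name__ == "__main__":
    request = build_initialize_request(
        "https://audit.example.com/mcp", "your-long-random-token"
    )
    print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. Unless `WP_MCP_JSON_RESPONSE` is enabled, the reply may arrive as an SSE stream rather than a single JSON document.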

### Docker (production)

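`docker-compose.prod.yml` includes an `mcp` service. `WP_MCP_TOKEN` must be set in the environment, because the compose file references it as `${WP_MCP_TOKEN:?set WP_MCP_TOKEN}` and Compose stops with an error when it is unset or empty. Set `WP_MCP_ALLOWED_HOSTS` in the environment **or** configure it on **Secrets → Remote MCP** after deploy.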

Terminate TLS at your reverse proxy and route `/mcp` to the MCP container. Recommended proxy settings: `proxy_buffering off`, long `proxy_read_timeout`.

### Troubleshooting

| Symptom | Likely cause |
|---------|----------------|
| Server refuses to start | Missing token or allowed hosts on non-localhost bind (Secrets page or env) |
| 401 Unauthorized | Wrong or missing `Authorization: Bearer` header |
| 404 Not Found | Wrong path — endpoint is `/mcp` by default |
| Blocked / bad request behind proxy | Host not listed in allowed hostnames (Secrets → Remote MCP) |
| Connection refused | Firewall, wrong port, or MCP service not running |

---

## MCP resources

| URI | Content |
14 changes: 14 additions & 0 deletions docs/OPS.md
@@ -117,6 +117,20 @@ AUTH_DEFAULT_ROLE=client-readonly

Production also requires `AUTH_SECRET` and optionally `AUTH_USER` / `AUTH_PASSWORD` (see `docker-compose.prod.yml`).

### Remote MCP (Streamable HTTP)

The `mcp` service in `docker-compose.prod.yml` exposes read-only audit tools over HTTP at `/mcp`. Configure on **Secrets → Remote MCP** (`/secrets`) or via environment variables (env overrides saved values):

| Variable | Purpose |
|----------|---------|
| `WP_MCP_TOKEN` | Bearer token for MCP clients (`Authorization: Bearer …`) |
| `WP_MCP_ALLOWED_HOSTS` | Public hostname allowlist (e.g. `audit.example.com`) |
| `WP_MCP_ALLOWED_ORIGINS` | Optional `Origin` allowlist |
| `WP_MCP_DOMAIN` | Tool bundle (`core` recommended for remote) |
| `MCP_PORT` | Host port mapped to container `8000` (default `8000`) |

Terminate TLS at your reverse proxy; do not expose plain HTTP publicly.
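
One way to produce a suitably random value for `WP_MCP_TOKEN` is Python's standard `secrets` module. The helper name `new_mcp_token` and the 32-byte default are illustrative choices, not project requirements.

```python
import secrets


def new_mcp_token(num_bytes=32):
    """Return a URL-safe random token built from num_bytes of OS randomness."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    # Prints a line that can be pasted into an env file.
    print(f"WP_MCP_TOKEN={new_mcp_token()}")
```

Store the generated value in the deployment environment or on the Secrets page, and give the same value to MCP clients for their `Authorization: Bearer …` header.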

### Read-only client dashboards

Set `AUTH_DEFAULT_ROLE=client-readonly` so session logins cannot run audits or save settings. The API returns 403 on mutations; the UI hides **Run audit** and disables save controls. Use `viewer` instead if chat access should also be blocked.
6 changes: 4 additions & 2 deletions requirements.txt
@@ -43,8 +43,10 @@ groq==1.4.0
pyspellchecker==0.9.0
html5lib==1.1

# MCP server for Cursor / Claude Desktop
mcp~=1.0.0
# MCP server for Cursor / Claude Desktop (stdio + remote Streamable HTTP)
mcp>=1.19,<2
uvicorn>=0.30
starlette>=0.38

# Dev / test
pytest==9.0.3
5 changes: 5 additions & 0 deletions src/website_profiling/mcp/http.py
@@ -0,0 +1,5 @@
"""Entry point: python -m website_profiling.mcp.http"""
from .http_server import main

if __name__ == "__main__":
    main()