Site Audit exposes 340 read-only tools via the Model Context Protocol (MCP). Connect from Cursor, Claude Desktop, or any MCP-compatible client to query audit data programmatically.
The same tool catalog powers in-app AI Chat at /chat.
Related documentation: GLOSSARY.md · Documentation index
- Prerequisites
- Domain-scoped servers
- Configuration
- Remote Streamable HTTP
- MCP resources
- Tool reference
- In-app chat
- Provider notes
- Roadmap
- Example prompts
```bash
pip install -r requirements.txt
export DATABASE_URL=postgres://profiling:profiling@localhost:5432/website_profiling  # Docker default
# ./local-run default: postgres://postgres:dev@127.0.0.1:5432/website_profiling
export PYTHONPATH=src
```

Start the local stdio server:

```bash
python -m website_profiling.mcp
```

For remote access over HTTP, see Remote Streamable HTTP.
Rather than loading all 340 tools in a single server, Site Audit supports domain-scoped bundles. Connect only the domains relevant to your workflow.
| `WP_MCP_DOMAIN` | Tool count | Scope | Recommended use |
|---|---|---|---|
| `core` (default) | Tier 0 + core/insight domains | Router, workflows, insight | General queries, tool search, coverage reports |
| `crawl` | Domain subset | Crawl, on-page, schema, accessibility | Technical crawl audits |
| `google` | Domain subset | Google, insight, CTR, keywords | GSC/GA4 analysis |
| `links` | Domain subset | Links, backlinks, indexation | Link architecture |
| `full` | 340 | All tools | Debugging, legacy single-server setup |
Tier 0 alone includes 16 router/insight tools (TIER_0_TOOLS in tool_domains.py). Use the audit://tools resource or WP_MCP_DOMAIN=full for the complete catalog.
Set WP_PROPERTY_ID to the default property when tools omit an explicit property_id argument.
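Because each domain-scoped server differs only in its `WP_MCP_DOMAIN` value, a client config with one entry per domain can be generated rather than written by hand. The following is a minimal sketch, not part of Site Audit: `build_mcp_config` and `BASE_ENV` are hypothetical names, and the connection values are copied from the examples in this document.

```python
import json

# Connection values copied from the examples in this document; adjust for your setup.
BASE_ENV = {
    "DATABASE_URL": "postgres://profiling:profiling@localhost:5432/website_profiling",
    "PYTHONPATH": "src",
}


def build_mcp_config(domains, property_id="1"):
    """Return an mcpServers mapping with one stdio entry per domain bundle."""
    servers = {}
    for domain in domains:
        servers[f"site-audit-{domain}"] = {
            "command": "python",
            "args": ["-m", "website_profiling.mcp"],
            "env": {**BASE_ENV, "WP_MCP_DOMAIN": domain, "WP_PROPERTY_ID": property_id},
        }
    return {"mcpServers": servers}


if __name__ == "__main__":
    # Print a config that connects the core and google bundles.
    print(json.dumps(build_mcp_config(["core", "google"]), indent=2))
```

The printed JSON can be pasted into `.cursor/mcp.json` or an equivalent client settings file.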
Add to .cursor/mcp.json or your MCP client settings:
```json
{
  "mcpServers": {
    "site-audit-core": {
      "command": "python",
      "args": ["-m", "website_profiling.mcp"],
      "env": {
        "DATABASE_URL": "postgres://profiling:profiling@localhost:5432/website_profiling",
        "PYTHONPATH": "src",
        "WP_MCP_DOMAIN": "core",
        "WP_PROPERTY_ID": "1"
      }
    },
    "site-audit-google": {
      "command": "python",
      "args": ["-m", "website_profiling.mcp"],
      "env": {
        "DATABASE_URL": "postgres://profiling:profiling@localhost:5432/website_profiling",
        "PYTHONPATH": "src",
        "WP_MCP_DOMAIN": "google",
        "WP_PROPERTY_ID": "1"
      }
    }
  }
}
```

To load the full catalog in a single server (debugging or legacy setup), set `WP_MCP_DOMAIN` to `full`:

```json
{
  "mcpServers": {
    "site-audit": {
      "command": "python",
      "args": ["-m", "website_profiling.mcp"],
      "env": {
        "DATABASE_URL": "postgres://profiling:profiling@localhost:5432/website_profiling",
        "PYTHONPATH": "src",
        "WP_MCP_DOMAIN": "full",
        "WP_PROPERTY_ID": "1"
      }
    }
  }
}
```

Use this when Site Audit runs on a hosted server and your MCP client (Cursor, Claude Desktop, etc.) connects over the network instead of spawning a local stdio subprocess.
Configure access on MCP settings (/mcp) in the web UI (recommended), or set environment variables. UI changes apply on the next MCP request without restarting the service.
```bash
export DATABASE_URL=postgres://profiling:profiling@localhost:5432/website_profiling
export PYTHONPATH=src
export WP_MCP_HTTP_HOST=0.0.0.0
export WP_MCP_HTTP_PORT=8000
export WP_MCP_DOMAIN=core
export WP_PROPERTY_ID=1
python -m website_profiling.mcp.http
```

Set MCP bearer token and Allowed hostnames on the Secrets page (or via WP_MCP_TOKEN / WP_MCP_ALLOWED_HOSTS). Environment variables override saved values when set.
The MCP endpoint is http://<host>:8000/mcp by default (WP_MCP_HTTP_PATH=/mcp).
| Variable | Default | Purpose |
|---|---|---|
| `WP_MCP_HTTP_HOST` | `127.0.0.1` | Bind address (`0.0.0.0` for Docker) |
| `WP_MCP_HTTP_PORT` | `8000` | Listen port |
| `WP_MCP_HTTP_PATH` | `/mcp` | Mount path |
| `WP_MCP_TOKEN` | unset | Bearer token (required when not binding localhost). Save on Secrets → Remote MCP or set here (env wins). |
| `WP_MCP_ALLOWED_HOSTS` | unset | Comma-separated Host allowlist (required for non-localhost bind). Save on Secrets → Remote MCP or set here. |
| `WP_MCP_ALLOWED_ORIGINS` | unset | Comma-separated Origin allowlist for browser clients |
| `WP_MCP_JSON_RESPONSE` | `false` | JSON responses instead of SSE streams |
| `WP_MCP_DOMAIN` | `core` | Tool bundle (same as stdio) |
| `WP_PROPERTY_ID` | unset | Default property (same as stdio) |
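To make the defaults concrete, here is a minimal sketch of how the `WP_MCP_HTTP_*` variables might be read and combined into the endpoint URL. It is an illustration only, not the service's actual loader: `HttpSettings`, `load_http_settings`, and `endpoint_url` are hypothetical names, and the accepted spellings for the boolean flag are an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpSettings:
    host: str
    port: int
    path: str
    json_response: bool


def load_http_settings(env=None):
    """Read the WP_MCP_HTTP_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return HttpSettings(
        host=env.get("WP_MCP_HTTP_HOST", "127.0.0.1"),
        port=int(env.get("WP_MCP_HTTP_PORT", "8000")),
        path=env.get("WP_MCP_HTTP_PATH", "/mcp"),
        # Treating "true"/"1" as enabled is an assumption about accepted spellings.
        json_response=env.get("WP_MCP_JSON_RESPONSE", "false").lower() in {"true", "1"},
    )


def endpoint_url(settings, public_host=None):
    """Compose the plain-HTTP endpoint URL, e.g. http://<host>:8000/mcp."""
    return f"http://{public_host or settings.host}:{settings.port}{settings.path}"
```

With no variables set, `endpoint_url(load_http_settings({}))` yields `http://127.0.0.1:8000/mcp`, matching the default endpoint described above.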
Security: WP_MCP_TOKEN is required when WP_MCP_HTTP_HOST is not localhost. Tools are read-only but expose audit, GSC, and GA4 data — treat the token like a database credential.
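The startup rule can be pictured as a small pre-flight check. This is a sketch under stated assumptions, not the service's validation code: `validate_remote_access` is a hypothetical helper, and the set of addresses treated as "localhost" is assumed.

```python
# Assumed definition of "localhost"; the real service may use a different set.
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}


def validate_remote_access(host, token=None, allowed_hosts=None):
    """Return a list of problems; an empty list means the bind is acceptable."""
    if host in LOCAL_HOSTS:
        return []  # local binds need neither a token nor an allowlist
    problems = []
    if not token:
        problems.append("WP_MCP_TOKEN is required for a non-localhost bind")
    if not allowed_hosts:
        problems.append("WP_MCP_ALLOWED_HOSTS is required for a non-localhost bind")
    return problems
```

Running a check like this before deploying with `WP_MCP_HTTP_HOST=0.0.0.0` surfaces a missing token or allowlist before the server refuses to start.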
DNS rebinding protection: Whenever a token and allowed hosts are configured — via the UI or environment variables — the HTTP service enforces the bearer token plus the Host/Origin allowlist in its own middleware, and the MCP SDK's built-in DNS-rebinding check is turned off (the middleware supersedes it). The SDK check only applies as a fallback on a non-localhost bind that has no remote access configured (a state the startup validation otherwise refuses to boot in). Either way, set WP_MCP_ALLOWED_HOSTS to the public hostname clients use (e.g. audit.example.com).
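As a rough illustration of what a bearer-token plus Host-allowlist check involves, the sketch below evaluates one request's headers. It is not the service's middleware: `parse_allowlist` and `check_request` are hypothetical names, the status codes are assumptions, the Origin allowlist is omitted for brevity, and the port stripping does not handle IPv6 literals.

```python
import hmac


def parse_allowlist(raw):
    """Split a comma-separated allowlist such as WP_MCP_ALLOWED_HOSTS."""
    return {item.strip().lower() for item in (raw or "").split(",") if item.strip()}


def check_request(headers, token, allowed_hosts):
    """Return an assumed HTTP status: 200 if allowed, 400 for a bad Host, 401 for a bad token."""
    host = headers.get("Host", "").split(":")[0].lower()  # drop any :port suffix
    if host not in allowed_hosts:
        return 400
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {token}"
    # compare_digest avoids leaking the token through timing differences.
    if not hmac.compare_digest(supplied.encode(), expected.encode()):
        return 401
    return 200
```

Checking the Host header first means a request aimed at an unexpected hostname is rejected before the token is examined, which is the property that matters for DNS rebinding.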
```json
{
  "mcpServers": {
    "site-audit-remote": {
      "url": "https://audit.example.com/mcp",
      "headers": {
        "Authorization": "Bearer your-long-random-token"
      }
    }
  }
}
```

docker-compose.prod.yml includes an mcp service. Set WP_MCP_TOKEN and WP_MCP_ALLOWED_HOSTS in the environment, or configure them on Secrets → Remote MCP after deploy.
Terminate TLS at your reverse proxy and route /mcp to the MCP container. Recommended proxy settings: proxy_buffering off, long proxy_read_timeout.
| Symptom | Likely cause |
|---|---|
| Server refuses to start | Missing token or allowed hosts on non-localhost bind (Secrets page or env) |
| 401 Unauthorized | Wrong or missing Authorization: Bearer header |
| 404 Not Found | Wrong path — endpoint is /mcp by default |
| Blocked / bad request behind proxy | Host not listed in allowed hostnames (Secrets → Remote MCP) |
| Connection refused | Firewall, wrong port, or MCP service not running |
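To tell these symptoms apart by hand, it can help to send a single MCP `initialize` request and inspect the HTTP status. The sketch below only builds the request with the standard library. `build_initialize_request` and `LIKELY_CAUSE` are hypothetical names, the `protocolVersion` value is an assumption (use the revision your client speaks), and the client name is a placeholder.

```python
import json
import urllib.request

# Likely causes keyed by HTTP status, taken from the troubleshooting table.
LIKELY_CAUSE = {
    401: "Wrong or missing Authorization: Bearer header",
    404: "Wrong path; the endpoint is /mcp by default",
}


def build_initialize_request(url, token):
    """Build (but do not send) a JSON-RPC initialize request for the endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed revision
            "capabilities": {},
            "clientInfo": {"name": "connectivity-check", "version": "0.1"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP clients advertise both response types.
            "Accept": "application/json, text/event-stream",
        },
    )
```

Passing the result to `urllib.request.urlopen` raises `urllib.error.HTTPError` on a 401 or 404, whose `code` can be looked up in `LIKELY_CAUSE`; a `urllib.error.URLError` instead points at the "Connection refused" row.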
| URI | Content |
|---|---|
| `audit://properties` | JSON list of properties |
| `audit://property/{id}` | Property details and latest report summary |
| `audit://property/{id}/report/latest` | Payload key index (counts, not full blob) |
| `audit://property/{id}/report/{report_id}` | Payload key index for a specific report |
| `audit://glossary` | Excerpt from GLOSSARY.md |
| `audit://tools` | Tool catalog for the connected WP_MCP_DOMAIN |
| `audit://domains` | Available MCP domain bundles and tool groupings |
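The URI templates above can be matched with a few regular expressions, which is handy when validating a URI before requesting it. This is an illustrative sketch: `match_resource` and `RESOURCE_PATTERNS` are hypothetical names, and treating `{id}` and `{report_id}` as integers is an assumption.

```python
import re

# URI templates from the resource table; numeric ids are an assumption.
RESOURCE_PATTERNS = {
    "properties": re.compile(r"^audit://properties$"),
    "property": re.compile(r"^audit://property/(?P<id>\d+)$"),
    "report_latest": re.compile(r"^audit://property/(?P<id>\d+)/report/latest$"),
    "report": re.compile(r"^audit://property/(?P<id>\d+)/report/(?P<report_id>\d+)$"),
    "glossary": re.compile(r"^audit://glossary$"),
    "tools": re.compile(r"^audit://tools$"),
    "domains": re.compile(r"^audit://domains$"),
}


def match_resource(uri):
    """Return (name, params) for a known resource URI, or None if unrecognised."""
    for name, pattern in RESOURCE_PATTERNS.items():
        m = pattern.match(uri)
        if m:
            # Convert captured path segments to integers.
            return name, {key: int(value) for key, value in m.groupdict().items()}
    return None
```

For example, `match_resource("audit://property/1/report/38")` returns `("report", {"id": 1, "report_id": 38})`.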
All tools are read-only. This section is a curated subset of the 340-tool registry. For the complete catalog, connect with WP_MCP_DOMAIN=full or read the audit://tools MCP resource.
Export tools write artifact files with a 24-hour TTL; in-app chat renders download buttons via /api/chat/artifacts/{id}.
search_audit_tools, list_tool_domains, get_data_coverage_report, run_insight_workflow, run_technical_workflow, run_keyword_workflow, run_domain_agent, get_report_summary, list_top_impact_issues, prioritize_fix_roadmap, get_landing_page_blended_table, get_opportunity_matrix, get_traffic_health_check, get_landing_page_full_diagnosis, get_issue_to_traffic_map, get_google_summary
export_audit_report, export_compare_csv, export_list_as_csv, export_sitemap_xml, validate_rich_results, list_export_formats
Full audit exports use the same generators as the Export view. PDF export requires reportlab.
get_image_audit_summary, list_pages_without_lazy_images, list_pages_with_images_missing_dimensions, list_site_image_urls, list_lighthouse_image_opportunities, list_largest_images, list_unoptimized_images, list_images_needing_attention
Size-based tools require probe_image_inventory=true in pipeline config. Related keys: max_image_probe_urls (default 500), image_probe_concurrency, image_probe_timeout, image_unoptimized_min_kb (default 200).
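The interaction between these keys can be pictured with a short sketch. The semantics here are assumptions inferred from the key names, not confirmed pipeline behaviour: that probing is skipped unless `probe_image_inventory` is true, that at most `max_image_probe_urls` images are probed, and that an image counts as unoptimized at or above `image_unoptimized_min_kb`. `unoptimized_images` and the `(url, size_in_bytes)` input shape are hypothetical.

```python
# Defaults quoted in the text; keys without a documented default are left out.
DOCUMENTED_DEFAULTS = {"max_image_probe_urls": 500, "image_unoptimized_min_kb": 200}


def unoptimized_images(images, config):
    """Return (url, size_kb) pairs at or above the size threshold, largest first.

    `images` is a hypothetical list of (url, size_in_bytes) tuples.
    """
    if not config.get("probe_image_inventory"):
        return []  # assumed: no size data exists unless probing is enabled
    settings = {**DOCUMENTED_DEFAULTS, **config}
    probed = images[: settings["max_image_probe_urls"]]
    min_kb = settings["image_unoptimized_min_kb"]
    flagged = [(url, size / 1024) for url, size in probed if size / 1024 >= min_kb]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

Under these assumptions, leaving `probe_image_inventory` off yields an empty result no matter how large the images are, which is why the size-based tools depend on that setting.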
list_properties, get_property, get_report_summary, get_category_scores, get_executive_summary, get_report_meta, get_site_level, list_report_history, get_audit_recommendations, get_ml_errors, get_ssl_expiry_info, list_audit_categories, get_category_recommendations, get_crawl_summary, get_portfolio_summary
list_issues, search_issues, list_top_impact_issues, prioritize_fix_roadmap, list_issues_by_category, get_category_issues, list_issue_workflow, list_issues_with_ai_fixes, generate_issue_fix, summarize_category_for_client, list_seo_onpage_issues
list_content_url_issues, list_pages_missing_title, list_pages_missing_h1, list_pages_multiple_h1, list_pages_missing_meta_description, list_pages_meta_desc_too_short, list_pages_meta_desc_too_long, list_pages_noindex, get_seo_health, list_pages_missing_canonical, list_canonical_mismatch, list_pages_with_missing_alt, list_pages_skipped_headings, list_pages_missing_viewport, list_pages_missing_og_image
search_pages, search_pages_advanced, get_page_details, get_page_analysis, get_internal_links, list_redirects, list_broken_links, list_status_4xx_pages, list_status_5xx_pages, list_pages_soft_404, list_dead_end_pages, list_duplicate_title_groups, list_heavy_pages_by_bytes, list_pages_poor_cache_headers, list_pages_low_content_ratio, get_heading_outline_for_url, get_status_code_breakdown, get_response_time_stats, get_depth_distribution, get_crawl_segments, get_browser_diagnostics_summary, list_pages_with_console_errors, list_pages_by_fetch_method, get_crawl_links_table, get_graph_edges_sample, list_long_redirect_chains, list_robots_blocked_urls, get_top_pages_by_pagerank, get_pagination_audit_summary, get_js_rendering_delta
list_pages_with_axe_violations, get_axe_audit_summary, list_pages_with_mixed_content, get_asset_weight_summary, get_readability_summary
get_rich_results_summary, list_rich_results_failures, get_competitor_keyword_gap, get_portfolio_benchmark, get_site_anchor_text_summary
get_schema_coverage, list_pages_without_schema, search_pages_by_schema_type, get_tech_stack_summary, list_pages_by_technology, get_security_findings, get_security_findings_summary, list_security_findings_by_type
get_link_rel_summary, get_inlink_anchors, list_nofollow_internal_links, list_orphan_pages, get_top_linked_pages, get_top_crawled_pages, get_outbound_link_domains, get_link_graph_summary, get_url_fingerprints, list_broken_link_sources, get_mime_type_breakdown, get_title_length_distribution, get_domain_link_distribution, get_outlink_distribution
get_indexation_coverage, list_indexation_gaps, get_indexation_url_join, get_hreflang_summary, get_language_summary
get_content_analytics, get_content_duplicates, get_duplicate_cluster, get_social_coverage, get_keyword_opportunities, get_ner_site_summary, list_thin_content_pages
get_keyword_summary, search_keywords, get_striking_distance_keywords, get_keyword_cannibalisation, get_query_page_misalignment, get_semantic_keyword_clusters, get_keyword_history, get_keyword_serp_overlay, get_serp_feature_overlay, list_keywords_by_action, list_keywords_by_position, list_keywords_by_impressions, list_keywords_ctr_opportunity, expand_keywords, generate_content_brief, get_brand_keyword_split, list_keywords_by_intent
get_google_summary, get_google_integration_status, get_gsc_top_queries, get_gsc_top_pages, get_gsc_ctr_opportunity_pages, get_ga4_summary, get_ga4_page_metrics, get_gsc_page_query_slice, get_gsc_url_inspection, get_gsc_index_coverage, analyze_serp_snippet_for_url, get_gsc_daily_trend, get_ga4_daily_trend, get_ga4_by_device, get_ga4_by_channel, get_gsc_page_queries
get_gsc_links_summary, get_gsc_links_import_status, get_gsc_sample_links, get_gsc_latest_links, get_third_party_links_overlay, get_backlinks_velocity, get_competitor_link_gap, get_bing_backlinks_summary
get_lighthouse_summary, get_lighthouse_for_url, get_lighthouse_human_summary, get_lighthouse_diagnostics, get_crux_summary, list_slow_pages, list_lighthouse_poor_seo_pages, list_lighthouse_poor_accessibility_pages, list_lighthouse_poor_best_practices_pages, list_lighthouse_cwv_failures
get_health_history, get_category_health_history, compare_reports, compare_issue_deltas, compare_category_deltas, compare_seo_health_deltas, compare_lighthouse_deltas, compare_url_set_diff, compare_redirect_deltas, compare_link_metric_deltas, compare_security_deltas, compare_duplicate_deltas, compare_tech_deltas, compare_content_metrics, compare_google_metrics, compare_priority_counts, compare_health_score_delta, compare_indexation_deltas, compare_orphan_deltas
get_geo_readiness_score, get_aeo_content_signals_for_url, get_llms_txt_status, draft_llms_txt, get_faq_schema_coverage, list_pages_missing_faq_schema, get_eeat_signals_summary, get_internal_link_suggestions, check_ai_citation_presence
get_bing_index_status (requires bing_webmaster_api_key in audit settings)
get_integration_alerts, get_property_ops, list_crawl_runs, list_log_uploads, get_latest_log_analysis, get_log_top_paths, list_log_only_paths, list_crawl_only_paths, get_log_googlebot_stats, get_log_analysis_by_id, get_page_coach
The same tools power AI Chat at http://localhost:3000/chat. Enable a provider under Run audit → AI settings.
In-app chat uses dynamic tool routing: each turn loads the Tier 0 router tools plus a domain-scoped subset, capped by CHAT_TOOL_MAX (default 45, max 120). Set CHAT_TOOL_MODE=full to load all tools for debugging.
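The per-turn selection can be sketched as follows. This is an illustration of the idea, not the chat router's code: `select_tools` is a hypothetical function, the mode name `"dynamic"` is a placeholder (only `full` is documented), and counting Tier 0 tools toward the cap is an assumption.

```python
CHAT_TOOL_MAX_DEFAULT = 45
CHAT_TOOL_MAX_LIMIT = 120


def select_tools(tier0, domain_tools, all_tools, mode="dynamic", tool_max=CHAT_TOOL_MAX_DEFAULT):
    """Pick the tool names to expose for one chat turn."""
    if mode == "full":
        return list(all_tools)  # debugging mode: no cap applied
    cap = max(1, min(tool_max, CHAT_TOOL_MAX_LIMIT))
    selected = list(tier0)  # Tier 0 router tools are always included
    for name in domain_tools:
        if len(selected) >= cap:
            break
        if name not in selected:
            selected.append(name)
    return selected
```

With 16 Tier 0 tools and the default cap, this leaves room for 29 domain tools per turn; raising `CHAT_TOOL_MAX` above 120 has no further effect in this sketch.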
Responses stream over SSE via POST /api/chat. Sessions persist per property in chat_sessions and chat_messages.
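A client consuming the stream needs to split it into events. The sketch below is a generic Server-Sent Events reader for lines already decoded to text. It makes no assumption about the payload format that `/api/chat` emits, and `iter_sse_data` is a hypothetical helper.

```python
def iter_sse_data(lines):
    """Yield the data payload of each complete SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (starting with ":") and other fields are ignored here.
    # An event left unterminated at end of stream is dropped, as in the SSE standard.
```

Each yielded string can then be decoded (for example with `json.loads`, if the payloads turn out to be JSON) and appended to the chat transcript.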
| Provider | Tool calling | Notes |
|---|---|---|
| Ollama | Native when supported; ReAct fallback otherwise | Local daemon at http://127.0.0.1:11434 |
| OpenAI | Native with streaming | API key in AI settings or OPENAI_API_KEY |
| Anthropic | Native with streaming | API key in AI settings or ANTHROPIC_API_KEY |
| Google Gemini | Native with streaming | API key in AI settings or GEMINI_API_KEY; REST via httpx |
| Groq | Native with streaming | API key in AI settings or GROQ_API_KEY; official Groq Python SDK; default model openai/gpt-oss-120b |
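The "AI settings or environment variable" pattern in the table can be sketched as a small lookup. The environment variable names come from the table; everything else is an assumption, in particular the precedence (saved settings first, then environment), the lowercase provider identifiers, and the `resolve_api_key` helper itself.

```python
import os

# Environment variables named in the provider table; no key is listed for Ollama.
PROVIDER_KEY_ENV = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def resolve_api_key(provider, ai_settings, env=None):
    """Return the API key for a provider, or None if none is configured or needed."""
    env = os.environ if env is None else env
    if provider == "ollama":
        return None  # local daemon; the table lists no API key
    saved = ai_settings.get(provider)
    if saved:
        return saved  # assumed precedence: saved AI settings win over the environment
    return env.get(PROVIDER_KEY_ENV[provider])
```

A `None` result for a hosted provider means neither source supplied a key, which is worth checking before enabling that provider for chat.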
The following capabilities are planned but not yet available:
| Capability | Current state |
|---|---|
| Full backlink index and anchor-text analytics | GSC Links CSV import only |
| SERP rank tracking | GSC position snapshots only |
| Live AI citation checks | On-site heuristics via check_ai_citation_presence |
Already available: validate_rich_results, get_gsc_url_inspection, export_sitemap_xml, workbook export, axe audits via enable_axe on browser crawls.
| Goal | Example prompt |
|---|---|
| Indexation | "What indexation gaps exist between crawl and GSC?" |
| On-page | "List pages missing canonical tags or with canonical mismatches" |
| Log analysis | "Which paths appear in access logs but were not crawled?" |
| Google data | "Compare GSC clicks vs the previous audit" |
| Performance | "List pages failing Core Web Vitals thresholds" |
| Security | "Show security finding changes since report 38" |
| Links | "Which pages link to broken URLs?" |
| Content | "Generate a content brief for keyword X" |
| Export | "Download the audit as PDF" |
| Compare | "Compare report 38 to the current audit and give me a CSV diff" |
| Client report | "Build a client report with executive summary, category scores, and top critical issues as PDF" |
| Images | "Which images are largest and unoptimized?" |
| Prioritization | "What should we fix first on high-traffic pages?" |
| GEO | "What's our GEO readiness score?" |
| GSC inspection | "Inspect GSC indexing for https://example.com/page" |
| Crawl quality | "Which pages are soft 404s or dead ends?" |
| Internal links | "Suggest internal links for our top blog post" |
| Accessibility | "List pages with images missing alt or lazy loading" |