MCP Server Reference

Site Audit exposes 340 read-only tools via the Model Context Protocol (MCP). Connect from Cursor, Claude Desktop, or any MCP-compatible client to query audit data programmatically.

The same tool catalog powers in-app AI Chat at /chat.

Related documentation: GLOSSARY.md · Documentation index

Prerequisites
Domain-scoped servers
Configuration
Remote Streamable HTTP
MCP resources
Tool reference
In-app chat
Provider notes
Roadmap
Example prompts

Prerequisites

pip install -r requirements.txt
export DATABASE_URL=postgres://profiling:profiling@localhost:5432/website_profiling  # Docker default
# ./local-run default: postgres://postgres:dev@127.0.0.1:5432/website_profiling
export PYTHONPATH=src

Start the local stdio server:

python -m website_profiling.mcp

For remote access over HTTP, see Remote Streamable HTTP.
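Before launching the stdio server, it can help to confirm that DATABASE_URL is well-formed. The snippet below is a minimal sketch using only the standard library; `check_database_url` is a hypothetical helper for illustration and is not part of website_profiling.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> dict:
    """Split a Postgres URL into its parts and reject obviously bad values."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgres", "postgresql"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname or not parts.path.strip("/"):
        raise ValueError("DATABASE_URL needs a host and a database name")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's standard port
        "database": parts.path.lstrip("/"),
    }


if __name__ == "__main__":
    # Docker default from the prerequisites above
    print(check_database_url(
        "postgres://profiling:profiling@localhost:5432/website_profiling"
    ))
```

This only checks the shape of the URL; it does not open a connection.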

Domain-scoped servers

Rather than loading all 340 tools in a single server, Site Audit supports domain-scoped bundles. Connect only the domains relevant to your workflow.

`WP_MCP_DOMAIN`	Tool count	Scope	Recommended use
`core` (default)	Tier 0 + `core`/`insight` domains	Router, workflows, insight	General queries, tool search, coverage reports
`crawl`	Domain subset	Crawl, on-page, schema, accessibility	Technical crawl audits
`google`	Domain subset	Google, insight, CTR, keywords	GSC/GA4 analysis
`links`	Domain subset	Links, backlinks, indexation	Link architecture
`full`	340	All tools	Debugging, legacy single-server setup

Tier 0 alone includes 16 router/insight tools (TIER_0_TOOLS in tool_domains.py). Use the audit://tools resource or WP_MCP_DOMAIN=full for the complete catalog.

Set WP_PROPERTY_ID to the property that tools should use by default when a call omits an explicit property_id argument.
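The fallback described above can be pictured as follows. This is a sketch under assumptions: `resolve_property_id` is a hypothetical name, and how the real server behaves when neither value is present is not documented here, so the error raised below is illustrative only.

```python
import os
from typing import Mapping, Optional


def resolve_property_id(
    explicit: Optional[int] = None,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Return the explicit property_id if given, else the WP_PROPERTY_ID default."""
    if explicit is not None:
        return explicit
    raw = env.get("WP_PROPERTY_ID")
    if raw is None:
        # Assumption: with no argument and no default there is nothing to query.
        raise ValueError("no property_id argument and WP_PROPERTY_ID is unset")
    return int(raw)
```

For example, `resolve_property_id(7, {"WP_PROPERTY_ID": "1"})` returns 7, while `resolve_property_id(None, {"WP_PROPERTY_ID": "1"})` falls back to 1.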

Configuration

Multi-domain setup (recommended)

Add to .cursor/mcp.json or your MCP client settings:

{
  "mcpServers": {
    "site-audit-core": {
      "command": "python",
      "args": ["-m", "website_profiling.mcp"],
      "env": {
        "DATABASE_URL": "postgres://profiling:profiling@localhost:5432/website_profiling",
        "PYTHONPATH": "src",
        "WP_MCP_DOMAIN": "core",
        "WP_PROPERTY_ID": "1"
      }
    },
    "site-audit-google": {
      "command": "python",
      "args": ["-m", "website_profiling.mcp"],
      "env": {
        "DATABASE_URL": "postgres://profiling:profiling@localhost:5432/website_profiling",
        "PYTHONPATH": "src",
        "WP_MCP_DOMAIN": "google",
        "WP_PROPERTY_ID": "1"
      }
    }
  }
}

Single-server setup (all tools)

{
  "mcpServers": {
    "site-audit": {
      "command": "python",
      "args": ["-m", "website_profiling.mcp"],
      "env": {
        "DATABASE_URL": "postgres://profiling:profiling@localhost:5432/website_profiling",
        "PYTHONPATH": "src",
        "WP_MCP_DOMAIN": "full",
        "WP_PROPERTY_ID": "1"
      }
    }
  }
}
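When several domain bundles are needed, the JSON above becomes repetitive. The sketch below generates the same structure from a list of WP_MCP_DOMAIN values. `build_mcp_config` and the `site-audit-<domain>` naming rule are illustrative choices that mirror the examples above, not a tool shipped with Site Audit.

```python
import json

DATABASE_URL = "postgres://profiling:profiling@localhost:5432/website_profiling"


def build_mcp_config(domains, property_id="1", database_url=DATABASE_URL):
    """Build an mcpServers mapping with one stdio server per domain bundle."""
    servers = {}
    for domain in domains:
        # Mirror the names used above: "site-audit" for full, else a suffix.
        name = "site-audit" if domain == "full" else f"site-audit-{domain}"
        servers[name] = {
            "command": "python",
            "args": ["-m", "website_profiling.mcp"],
            "env": {
                "DATABASE_URL": database_url,
                "PYTHONPATH": "src",
                "WP_MCP_DOMAIN": domain,
                "WP_PROPERTY_ID": str(property_id),
            },
        }
    return {"mcpServers": servers}


if __name__ == "__main__":
    print(json.dumps(build_mcp_config(["core", "google", "links"]), indent=2))
```

The printed JSON can be pasted into .cursor/mcp.json or an equivalent client settings file.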

Remote Streamable HTTP

Use this when Site Audit runs on a hosted server and your MCP client (Cursor, Claude Desktop, etc.) connects over the network instead of spawning a local stdio subprocess.

Start the HTTP server

Configure access on the MCP settings page (/mcp) in the web UI (recommended), or set environment variables. Changes made in the UI apply on the next MCP request without restarting the service.

export DATABASE_URL=postgres://profiling:profiling@localhost:5432/website_profiling
export PYTHONPATH=src
export WP_MCP_HTTP_HOST=0.0.0.0
export WP_MCP_HTTP_PORT=8000
export WP_MCP_DOMAIN=core
export WP_PROPERTY_ID=1

python -m website_profiling.mcp.http

Set the MCP bearer token and Allowed hostnames values on the Secrets page (or via WP_MCP_TOKEN / WP_MCP_ALLOWED_HOSTS). When an environment variable is set, it overrides the saved value.

The MCP endpoint is http://<host>:8000/mcp by default (WP_MCP_HTTP_PATH=/mcp).
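As a rough illustration of how the endpoint is composed from the port and path settings, the following sketch derives the URL from an environment mapping. `mcp_endpoint` is a hypothetical helper; the defaults are the ones stated in this document.

```python
import os
from typing import Mapping


def mcp_endpoint(public_host: str, env: Mapping[str, str] = os.environ) -> str:
    """Compose the MCP endpoint URL from WP_MCP_HTTP_PORT and WP_MCP_HTTP_PATH."""
    port = env.get("WP_MCP_HTTP_PORT", "8000")
    path = env.get("WP_MCP_HTTP_PATH", "/mcp")
    if not path.startswith("/"):
        path = "/" + path  # tolerate a path given without the leading slash
    return f"http://{public_host}:{port}{path}"
```

With no overrides, `mcp_endpoint("localhost", {})` yields `http://localhost:8000/mcp`. Behind a TLS-terminating proxy, clients would use the proxy's https URL instead.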

Environment variables

Variable	Default	Purpose
`WP_MCP_HTTP_HOST`	`127.0.0.1`	Bind address (`0.0.0.0` for Docker)
`WP_MCP_HTTP_PORT`	`8000`	Listen port
`WP_MCP_HTTP_PATH`	`/mcp`	Mount path
`WP_MCP_TOKEN`	unset	Bearer token (required when not binding to localhost). Save on Secrets → Remote MCP or set here (the environment variable wins).
`WP_MCP_ALLOWED_HOSTS`	unset	Comma-separated `Host` allowlist (required for non-localhost bind). Save on Secrets → Remote MCP or set here.
`WP_MCP_ALLOWED_ORIGINS`	unset	Comma-separated `Origin` allowlist for browser clients
`WP_MCP_JSON_RESPONSE`	`false`	JSON responses instead of SSE streams
`WP_MCP_DOMAIN`	`core`	Tool bundle (same as stdio)
`WP_PROPERTY_ID`	unset	Default property (same as stdio)

Security: WP_MCP_TOKEN is required when WP_MCP_HTTP_HOST is not localhost. Tools are read-only but expose audit, GSC, and GA4 data — treat the token like a database credential.

DNS rebinding protection: Whenever a token and allowed hosts are configured — via the UI or environment variables — the HTTP service enforces the bearer token plus the Host/Origin allowlist in its own middleware, and the MCP SDK's built-in DNS-rebinding check is turned off (the middleware supersedes it). The SDK check only applies as a fallback on a non-localhost bind that has no remote access configured (a state the startup validation otherwise refuses to boot in). Either way, set WP_MCP_ALLOWED_HOSTS to the public hostname clients use (e.g. audit.example.com).
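The startup rule and the Host allowlist described above can be sketched as two small checks. This is an approximation under assumptions: the function names are hypothetical, only environment values are inspected (the real service also reads values saved in the UI), and the set of addresses treated as localhost is a guess.

```python
from typing import List, Mapping

# Assumption: these are the bind addresses treated as "localhost".
LOCAL_HOSTS = {"127.0.0.1", "localhost", "::1"}


def validate_remote_config(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the config may boot."""
    host = env.get("WP_MCP_HTTP_HOST", "127.0.0.1")
    if host in LOCAL_HOSTS:
        return []  # localhost bind: token and allowlist are optional
    problems = []
    if not env.get("WP_MCP_TOKEN"):
        problems.append("WP_MCP_TOKEN is required on a non-localhost bind")
    if not env.get("WP_MCP_ALLOWED_HOSTS"):
        problems.append("WP_MCP_ALLOWED_HOSTS is required on a non-localhost bind")
    return problems


def host_allowed(host_header: str, allowed_hosts: str) -> bool:
    """Check a Host header against the comma-separated allowlist."""
    allowed = {h.strip().lower() for h in allowed_hosts.split(",") if h.strip()}
    # Drop an optional :port suffix (bracketed IPv6 literals are not handled).
    hostname = host_header.split(":")[0].lower()
    return hostname in allowed
```

For instance, `validate_remote_config({"WP_MCP_HTTP_HOST": "0.0.0.0"})` reports both missing settings, which corresponds to the "Server refuses to start" case in the troubleshooting table.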

Cursor / Claude Desktop (remote)

{
  "mcpServers": {
    "site-audit-remote": {
      "url": "https://audit.example.com/mcp",
      "headers": {
        "Authorization": "Bearer your-long-random-token"
      }
    }
  }
}
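For clients that are not configured through a JSON file, the same connection can be exercised by hand. The sketch below builds, but does not send, an MCP `initialize` request carrying the bearer token. It assumes the server accepts protocol revision 2025-03-26 (the revision that introduced Streamable HTTP); the client name and version are placeholders.

```python
import json
import urllib.request


def build_initialize_request(url: str, token: str) -> urllib.request.Request:
    """Build a JSON-RPC initialize request for a Streamable HTTP MCP endpoint."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumed; the server may negotiate another
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
        },
    )
```

Passing the result to `urllib.request.urlopen` would perform the handshake; a wrong or missing token should surface as the 401 listed under Troubleshooting.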

Docker (production)

docker-compose.prod.yml includes an mcp service. Set WP_MCP_TOKEN and WP_MCP_ALLOWED_HOSTS in the environment, or configure them on Secrets → Remote MCP after deploy.

Terminate TLS at your reverse proxy and route /mcp to the MCP container. Recommended proxy settings: proxy_buffering off, long proxy_read_timeout.

Troubleshooting

Symptom	Likely cause
Server refuses to start	Missing token or allowed hosts on non-localhost bind (Secrets page or env)
401 Unauthorized	Wrong or missing `Authorization: Bearer` header
404 Not Found	Wrong path — endpoint is `/mcp` by default
Blocked / bad request behind proxy	Host not listed in allowed hostnames (Secrets → Remote MCP)
Connection refused	Firewall, wrong port, or MCP service not running

MCP resources

URI	Content
`audit://properties`	JSON list of properties
`audit://property/{id}`	Property details and latest report summary
`audit://property/{id}/report/latest`	Payload key index (counts, not full blob)
`audit://property/{id}/report/{report_id}`	Payload key index for a specific report
`audit://glossary`	Excerpt from GLOSSARY.md
`audit://tools`	Tool catalog for the connected `WP_MCP_DOMAIN`
`audit://domains`	Available MCP domain bundles and tool groupings
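The report URIs in the table follow a simple template. A small helper such as the one below (hypothetical, shown only to make the pattern concrete) builds either the latest-report or the specific-report form.

```python
from typing import Optional


def report_resource_uri(property_id: int, report_id: Optional[int] = None) -> str:
    """Build audit://property/{id}/report/latest or .../report/{report_id}."""
    suffix = "latest" if report_id is None else str(report_id)
    return f"audit://property/{property_id}/report/{suffix}"
```

`report_resource_uri(1)` gives `audit://property/1/report/latest`, and `report_resource_uri(1, 38)` gives `audit://property/1/report/38`.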

Tool reference

All tools are read-only. This section is a curated subset of the 340-tool registry. For the complete catalog, connect with WP_MCP_DOMAIN=full or read the audit://tools MCP resource.

Export tools write artifact files with a 24-hour TTL; in-app chat renders download buttons via /api/chat/artifacts/{id}.
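To make the 24-hour TTL concrete, here is a minimal sketch of an expiry check and of the download path used by in-app chat. Both function names are illustrative; how the application actually stores creation times and purges artifacts is not described in this document.

```python
from datetime import datetime, timedelta, timezone

ARTIFACT_TTL = timedelta(hours=24)  # TTL stated above


def artifact_expired(created_at: datetime, now: datetime) -> bool:
    """True once an artifact is at least 24 hours old."""
    return now - created_at >= ARTIFACT_TTL


def artifact_download_path(artifact_id: str) -> str:
    """Path that in-app chat uses for download buttons."""
    return f"/api/chat/artifacts/{artifact_id}"


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(artifact_expired(created, created + timedelta(hours=23)))
    print(artifact_expired(created, created + timedelta(hours=25)))
```

Whether the boundary at exactly 24 hours counts as expired is an assumption of this sketch.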

Router and insight (Tier 0 — included in every chat turn)

search_audit_tools, list_tool_domains, get_data_coverage_report, run_insight_workflow, run_technical_workflow, run_keyword_workflow, run_domain_agent, get_report_summary, list_top_impact_issues, prioritize_fix_roadmap, get_landing_page_blended_table, get_opportunity_matrix, get_traffic_health_check, get_landing_page_full_diagnosis, get_issue_to_traffic_map, get_google_summary

Export and deliverables

export_audit_report, export_compare_csv, export_list_as_csv, export_sitemap_xml, validate_rich_results, list_export_formats

Full audit exports use the same generators as the Export view. PDF export requires reportlab.

Image audit

get_image_audit_summary, list_pages_without_lazy_images, list_pages_with_images_missing_dimensions, list_site_image_urls, list_lighthouse_image_opportunities, list_largest_images, list_unoptimized_images, list_images_needing_attention

Size-based tools require probe_image_inventory=true in pipeline config. Related keys: max_image_probe_urls (default 500), image_probe_concurrency, image_probe_timeout, image_unoptimized_min_kb (default 200).
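The relationship between these keys might look like the sketch below. It is an assumption that image_unoptimized_min_kb acts as a minimum size in kilobytes for flagging an image; the real detection logic may consider more than size. The dictionary layout and `images_over_threshold` are illustrative, not the actual pipeline schema.

```python
PIPELINE_IMAGE_CONFIG = {
    "probe_image_inventory": True,    # must be true for size-based tools
    "max_image_probe_urls": 500,      # documented default
    "image_unoptimized_min_kb": 200,  # documented default
    # image_probe_concurrency and image_probe_timeout are omitted:
    # their defaults are not stated in this document.
}


def images_over_threshold(sizes_by_url, config=PIPELINE_IMAGE_CONFIG):
    """Return URLs whose size in KB meets the threshold (assumed semantics)."""
    if not config.get("probe_image_inventory"):
        return []  # without probing there are no sizes to compare
    limit = config["image_unoptimized_min_kb"]
    return sorted(url for url, kb in sizes_by_url.items() if kb >= limit)
```

Given `{"/a.png": 150, "/b.jpg": 200, "/c.webp": 900}`, the helper returns `["/b.jpg", "/c.webp"]`; with probing disabled it returns an empty list.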

Portfolio and report

list_properties, get_property, get_report_summary, get_category_scores, get_executive_summary, get_report_meta, get_site_level, list_report_history, get_audit_recommendations, get_ml_errors, get_ssl_expiry_info, list_audit_categories, get_category_recommendations, get_crawl_summary, get_portfolio_summary

Issues and workflow

list_issues, search_issues, list_top_impact_issues, prioritize_fix_roadmap, list_issues_by_category, get_category_issues, list_issue_workflow, list_issues_with_ai_fixes, generate_issue_fix, summarize_category_for_client, list_seo_onpage_issues

On-page SEO

list_content_url_issues, list_pages_missing_title, list_pages_missing_h1, list_pages_multiple_h1, list_pages_missing_meta_description, list_pages_meta_desc_too_short, list_pages_meta_desc_too_long, list_pages_noindex, get_seo_health, list_pages_missing_canonical, list_canonical_mismatch, list_pages_with_missing_alt, list_pages_skipped_headings, list_pages_missing_viewport, list_pages_missing_og_image

Crawl and pages

search_pages, search_pages_advanced, get_page_details, get_page_analysis, get_internal_links, list_redirects, list_broken_links, list_status_4xx_pages, list_status_5xx_pages, list_pages_soft_404, list_dead_end_pages, list_duplicate_title_groups, list_heavy_pages_by_bytes, list_pages_poor_cache_headers, list_pages_low_content_ratio, get_heading_outline_for_url, get_status_code_breakdown, get_response_time_stats, get_depth_distribution, get_crawl_segments, get_browser_diagnostics_summary, list_pages_with_console_errors, list_pages_by_fetch_method, get_crawl_links_table, get_graph_edges_sample, list_long_redirect_chains, list_robots_blocked_urls, get_top_pages_by_pagerank, get_pagination_audit_summary, get_js_rendering_delta

Accessibility and assets

list_pages_with_axe_violations, get_axe_audit_summary, list_pages_with_mixed_content, get_asset_weight_summary, get_readability_summary

Rich results and portfolio extras

get_rich_results_summary, list_rich_results_failures, get_competitor_keyword_gap, get_portfolio_benchmark, get_site_anchor_text_summary

Schema and technical

get_schema_coverage, list_pages_without_schema, search_pages_by_schema_type, get_tech_stack_summary, list_pages_by_technology, get_security_findings, get_security_findings_summary, list_security_findings_by_type

Links and architecture

get_link_rel_summary, get_inlink_anchors, list_nofollow_internal_links, list_orphan_pages, get_top_linked_pages, get_top_crawled_pages, get_outbound_link_domains, get_link_graph_summary, get_url_fingerprints, list_broken_link_sources, get_mime_type_breakdown, get_title_length_distribution, get_domain_link_distribution, get_outlink_distribution

Indexation and international

get_indexation_coverage, list_indexation_gaps, get_indexation_url_join, get_hreflang_summary, get_language_summary

Content and social

get_content_analytics, get_content_duplicates, get_duplicate_cluster, get_social_coverage, get_keyword_opportunities, get_ner_site_summary, list_thin_content_pages

Keywords

get_keyword_summary, search_keywords, get_striking_distance_keywords, get_keyword_cannibalisation, get_query_page_misalignment, get_semantic_keyword_clusters, get_keyword_history, get_keyword_serp_overlay, get_serp_feature_overlay, list_keywords_by_action, list_keywords_by_position, list_keywords_by_impressions, list_keywords_ctr_opportunity, expand_keywords, generate_content_brief, get_brand_keyword_split, list_keywords_by_intent

Google and CTR

get_google_summary, get_google_integration_status, get_gsc_top_queries, get_gsc_top_pages, get_gsc_ctr_opportunity_pages, get_ga4_summary, get_ga4_page_metrics, get_gsc_page_query_slice, get_gsc_url_inspection, get_gsc_index_coverage, analyze_serp_snippet_for_url, get_gsc_daily_trend, get_ga4_daily_trend, get_ga4_by_device, get_ga4_by_channel, get_gsc_page_queries

Backlinks

get_gsc_links_summary, get_gsc_links_import_status, get_gsc_sample_links, get_gsc_latest_links, get_third_party_links_overlay, get_backlinks_velocity, get_competitor_link_gap, get_bing_backlinks_summary

Performance

get_lighthouse_summary, get_lighthouse_for_url, get_lighthouse_human_summary, get_lighthouse_diagnostics, get_crux_summary, list_slow_pages, list_lighthouse_poor_seo_pages, list_lighthouse_poor_accessibility_pages, list_lighthouse_poor_best_practices_pages, list_lighthouse_cwv_failures

Drift, health, and compare

get_health_history, get_category_health_history, compare_reports, compare_issue_deltas, compare_category_deltas, compare_seo_health_deltas, compare_lighthouse_deltas, compare_url_set_diff, compare_redirect_deltas, compare_link_metric_deltas, compare_security_deltas, compare_duplicate_deltas, compare_tech_deltas, compare_content_metrics, compare_google_metrics, compare_priority_counts, compare_health_score_delta, compare_indexation_deltas, compare_orphan_deltas

GEO / AEO

get_geo_readiness_score, get_aeo_content_signals_for_url, get_llms_txt_status, draft_llms_txt, get_faq_schema_coverage, list_pages_missing_faq_schema, get_eeat_signals_summary, get_internal_link_suggestions, check_ai_citation_presence

Integrations

get_bing_index_status (requires bing_webmaster_api_key in audit settings)

Operations and logs

get_integration_alerts, get_property_ops, list_crawl_runs, list_log_uploads, get_latest_log_analysis, get_log_top_paths, list_log_only_paths, list_crawl_only_paths, get_log_googlebot_stats, get_log_analysis_by_id, get_page_coach

In-app chat

The same tools power AI Chat at http://localhost:3000/chat. Enable a provider under Run audit → AI settings.

In-app chat uses dynamic tool routing: each turn loads the Tier 0 router tools plus a domain-scoped subset, capped by CHAT_TOOL_MAX (default 45, maximum 120). Set CHAT_TOOL_MODE=full to load all tools for debugging.
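A simplified picture of this routing is sketched below. It assumes that Tier 0 tools are always loaded first and that they count toward the CHAT_TOOL_MAX cap; neither detail is confirmed by this document, and `select_chat_tools` is a hypothetical name.

```python
CHAT_TOOL_MAX_DEFAULT = 45
CHAT_TOOL_MAX_CEILING = 120


def select_chat_tools(tier0, domain_tools, all_tools, mode="dynamic", max_tools=None):
    """Pick the tools for one chat turn: Tier 0 first, then domain tools up to the cap."""
    if mode == "full":
        return list(all_tools)  # CHAT_TOOL_MODE=full bypasses routing
    limit = CHAT_TOOL_MAX_DEFAULT if max_tools is None else max_tools
    limit = min(limit, CHAT_TOOL_MAX_CEILING)  # values above 120 are clamped
    selected = list(tier0)  # assumption: Tier 0 is never dropped
    for name in domain_tools:
        if len(selected) >= limit:
            break
        if name not in selected:  # some tools appear in both lists
            selected.append(name)
    return selected
```

With the defaults, a turn carries at most 45 tools, for example the 16 Tier 0 tools plus up to 29 from the routed domain.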

Responses stream over SSE via POST /api/chat. Sessions persist per property in chat_sessions and chat_messages.
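A client consuming this stream needs to split it into events. The generator below is a minimal sketch of standard SSE framing (events separated by blank lines, payload on `data:` lines). It ignores `event:`, `id:`, and `retry:` fields, and it makes no assumption about what /api/chat puts inside each payload.

```python
def iter_sse_data(lines):
    """Yield the data payload of each SSE event from an iterable of text lines."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if buffer:  # a blank line dispatches the current event
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event without a terminating blank line is discarded, as in the SSE spec.
```

Multiple `data:` lines within one event are joined with newlines, so a payload split across lines is reassembled before it is yielded.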

Provider notes

Provider	Tool calling	Notes
Ollama	Native when supported; ReAct fallback otherwise	Local daemon at `http://127.0.0.1:11434`
OpenAI	Native with streaming	API key in AI settings or `OPENAI_API_KEY`
Anthropic	Native with streaming	API key in AI settings or `ANTHROPIC_API_KEY`
Google Gemini	Native with streaming	API key in AI settings or `GEMINI_API_KEY`; REST via `httpx`
Groq	Native with streaming	API key in AI settings or `GROQ_API_KEY`; official Groq Python SDK; default model `openai/gpt-oss-120b`
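The "AI settings or environment variable" pattern in the table can be sketched as a lookup. The environment variable names come from the table; the settings key format (`<provider>_api_key`) and the precedence (saved setting first, then environment) are assumptions made for illustration.

```python
import os

PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def resolve_api_key(provider, ai_settings, env=os.environ):
    """Return the API key for a provider, or None if none is configured."""
    if provider == "ollama":
        return None  # local daemon; the table lists no API key for it
    saved = ai_settings.get(f"{provider}_api_key")  # hypothetical settings key
    # Assumed precedence: a saved setting wins over the environment variable.
    return saved or env.get(PROVIDER_ENV_VARS[provider])
```

For example, `resolve_api_key("groq", {}, {"GROQ_API_KEY": "gk"})` returns `"gk"`.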

Roadmap

The following capabilities are planned but not yet available:

Capability	Current state
Full backlink index and anchor-text analytics	GSC Links CSV import only
SERP rank tracking	GSC position snapshots only
Live AI citation checks	On-site heuristics via `check_ai_citation_presence`

Already available: validate_rich_results, get_gsc_url_inspection, export_sitemap_xml, workbook export, axe audits via enable_axe on browser crawls.

Example prompts

Goal	Example prompt
Indexation	"What indexation gaps exist between crawl and GSC?"
On-page	"List pages missing canonical tags or with canonical mismatches"
Log analysis	"Which paths appear in access logs but were not crawled?"
Google data	"Compare GSC clicks vs the previous audit"
Performance	"List pages failing Core Web Vitals thresholds"
Security	"Show security finding changes since report 38"
Links	"Which pages link to broken URLs?"
Content	"Generate a content brief for keyword X"
Export	"Download the audit as PDF"
Compare	"Compare report 38 to the current audit and give me a CSV diff"
Client report	"Build a client report with executive summary, category scores, and top critical issues as PDF"
Images	"Which images are largest and unoptimized?"
Prioritization	"What should we fix first on high-traffic pages?"
GEO	"What's our GEO readiness score?"
GSC inspection	"Inspect GSC indexing for https://example.com/page"
Crawl quality	"Which pages are soft 404s or dead ends?"
Internal links	"Suggest internal links for our top blog post"
Accessibility	"List pages with images missing alt or lazy loading"
