Merged
1 change: 1 addition & 0 deletions .coveragerc.reporting
@@ -2,6 +2,7 @@
 source = website_profiling.reporting
 omit =
     */website_profiling/reporting/builder.py
+    */website_profiling/reporting/builder_sections/*

 [report]
 show_missing = True
17 changes: 9 additions & 8 deletions README.md
@@ -37,18 +37,17 @@

 # Site Audit

-**Open-source SEO crawl and technical audit platform** — self-hosted UI built with **Next.js, Python, and PostgreSQL**.
-
-Repository: [codefrydev/WebsiteProfiling](https://github.com/codefrydev/WebsiteProfiling)
+**Open-source SEO crawl and technical audit platform** — built with **Next.js, Python, and PostgreSQL**.

 ## Overview

-Site Audit is a self-hosted alternative to commercial SEO audit tools. It runs on your infrastructure, stores data in your PostgreSQL database, and produces transparent technical reports without subscription tiers or gated exports.
+Site Audit is a self-hosted alternative to commercial SEO suites. It runs on your own infrastructure, stores data in your PostgreSQL database, and produces transparent technical reports — no subscription tiers, no gated exports.

 **Use cases**

 - Technical SEO audits for owned or client properties
 - Crawl analysis with static and JavaScript rendering
+- Content writing and optimization with live SEO scoring
 - Search Console, GA4, and Bing Webmaster integration
 - Agency portfolio management and run comparison
 - Optional AI-assisted analysis over audit data via MCP-compatible tools
@@ -79,7 +78,7 @@ Site Audit focuses on **honest, self-hosted technical SEO**. It is not a drop-in
 <td align="center" width="25%">
 <img src="docs/assets/icon-audit.svg" width="48" alt=""><br>
 <strong>Technical audit</strong><br>
-<sub>Issues, Lighthouse, on-page checks, workbooks</sub>
+<sub>Issues, Lighthouse, accessibility (axe), on-page checks</sub>
 </td>
 <td align="center" width="25%">
 <img src="docs/assets/icon-integrations.svg" width="48" alt=""><br>
@@ -94,7 +93,7 @@ Site Audit focuses on **honest, self-hosted technical SEO**. It is not a drop-in
 </tr>
 </table>

-Also included: **AI chat** over audit data (optional), **340 MCP tools** (domain-scoped servers), keyword explorer (GSC + on-site), backlinks (GSC Links import), compare runs, and portfolio management for agencies.
+Also included: **AI chat** over audit data (optional), **Content studio** (write &amp; optimize with live SEO scoring), **340 MCP tools** (domain-scoped servers), image SEO, GEO/AEO readiness, keyword explorer (GSC + on-site), backlinks (GSC Links import), compare runs, and portfolio management for agencies.

 <p align="center">
 <img src="docs/assets/social-preview.png" alt="Site Audit preview" width="640">
@@ -147,8 +146,6 @@ WebsiteProfiling/
 | `tests/` | Backend tests; `./local-test browser` for Playwright crawl integration |
 | `docs/MCP.md` | MCP server setup for IDE and agent integrations |
 | `data/` | Local secrets and shadow `pipeline-config.txt` (gitignored) |
-| `docker-compose.prod.yml` | Production stack (`POSTGRES_USER`, `POSTGRES_PASSWORD`, `AUTH_SECRET`) |
-| `docker-compose.pull.yml` | Pre-built `WEB_IMAGE` deployment |

 For layout details and common development patterns, see [AGENT.md](AGENT.md).

@@ -225,6 +222,10 @@ Ask questions about audit data at [http://localhost:3000/chat](http://localhost:

 The agent uses the same **340 read-only audit tools** as the MCP server ([docs/MCP.md](docs/MCP.md)), with **dynamic routing** (~45 tools per turn). Responses stream over SSE (`POST /api/chat`). Sessions persist per property (`chat_sessions` / `chat_messages`).

+### Content studio (optional)
+
+Write and optimize content at [http://localhost:3000/write](http://localhost:3000/write) with **live SEO scoring** from Search Console and on-page heuristics. Drafts persist per property; an optional AI assist (same providers as AI chat) drafts and rewrites copy. Backed by `/api/content-drafts`, `/api/content/score`, and `/api/content/analyze`.
+
 ## Contributing

 Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup and pull request guidelines.
12 changes: 12 additions & 0 deletions src/website_profiling/common.py
@@ -21,6 +21,17 @@
 _wappalyzer_disabled = _tech._wappalyzer_disabled


+def strip_www_prefix(host: str) -> str:
+    """Remove a single leading ``www.`` label from a host.
+
+    Use this instead of ``host.lstrip("www.")`` — ``str.lstrip`` strips any
+    leading characters in the *set* ``{'w', '.'}``, so e.g.
+    ``"www.washington.edu".lstrip("www.")`` wrongly yields ``"ashington.edu"``.
+    """
+    h = host or ""
+    return h[4:] if h.lower().startswith("www.") else h
+
+
 def detect_tech_wappalyzer(url, html, headers, soup, wappalyzer=None):
     """Detect technologies; syncs wappalyzer module state with this facade for tests."""
     _tech._wappalyzer_disabled = _wappalyzer_disabled
@@ -37,6 +48,7 @@ def detect_tech_wappalyzer(url, html, headers, soup, wappalyzer=None):
     "load_edges",
     "save_edges",
     "strip_crawl_query_params",
+    "strip_www_prefix",
     "normalize_link",
     "parse_link_edges",
     "parse_links",
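The `lstrip` pitfall described in the `strip_www_prefix` docstring can be reproduced in isolation. The sketch below copies the function body from the diff so it runs standalone and compares it with the character-set behavior of `str.lstrip`. The sample hostnames are illustrative and are not taken from the repository's tests.

```python
def strip_www_prefix(host: str) -> str:
    """Copy of the helper added in this diff, reproduced so the example is self-contained."""
    h = host or ""
    return h[4:] if h.lower().startswith("www.") else h


# str.lstrip treats its argument as a set of characters, not as a prefix,
# so it keeps consuming leading 'w' and '.' characters past the "www." label.
naive = "www.washington.edu".lstrip("www.")
fixed = strip_www_prefix("www.washington.edu")

print(naive)  # ashington.edu
print(fixed)  # washington.edu

# Only one label is removed, the match is case-insensitive, and None or
# empty input comes back as an empty string.
print(strip_www_prefix("www.www.example.com"))  # www.example.com
print(strip_www_prefix("WWW.Example.com"))      # Example.com
print(repr(strip_www_prefix(None)))             # ''
```

On Python 3.9+, `str.removeprefix("www.")` also removes exactly one prefix, but it is case-sensitive, so the slice-based version above additionally covers hosts such as `WWW.Example.com`.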
7 changes: 6 additions & 1 deletion src/website_profiling/config.py
@@ -37,7 +37,12 @@ def get_str(cfg: dict, key: str, default: str = "") -> str:


 def get_bool(cfg: dict, key: str, default: bool = False) -> bool:
-    return str(cfg.get(key, default)).lower() in ("true", "1", "yes")
+    raw = cfg.get(key)
+    # Missing or empty value falls back to the default (consistent with get_int/get_float);
+    # an empty string must not silently disable a default-on flag.
+    if raw is None or str(raw).strip() == "":
+        return default
+    return str(raw).strip().lower() in ("true", "1", "yes")


 def get_int(cfg: dict, key: str, default: int | None = None) -> int | None:
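The behavioral difference in `get_bool` is easiest to see side by side. The sketch below reproduces the removed one-liner as `get_bool_old` (a name chosen here for comparison only) next to the new body from the diff. The config keys are hypothetical and are not real settings from the project.

```python
def get_bool_old(cfg: dict, key: str, default: bool = False) -> bool:
    # Removed line from the diff: an empty string stringifies to "" and is
    # therefore always False, even when the default is True.
    return str(cfg.get(key, default)).lower() in ("true", "1", "yes")


def get_bool(cfg: dict, key: str, default: bool = False) -> bool:
    # New body from the diff: missing, None, or blank values use the default.
    raw = cfg.get(key)
    if raw is None or str(raw).strip() == "":
        return default
    return str(raw).strip().lower() in ("true", "1", "yes")


# Hypothetical keys, for illustration only.
cfg = {"render_js": "", "follow_redirects": " Yes ", "respect_robots": "0"}

print(get_bool_old(cfg, "render_js", default=True))    # False (default-on flag silently disabled)
print(get_bool(cfg, "render_js", default=True))        # True
print(get_bool_old(cfg, "follow_redirects"))           # False (surrounding whitespace not stripped)
print(get_bool(cfg, "follow_redirects"))               # True
print(get_bool(cfg, "respect_robots", default=True))   # False (explicit value still wins)
print(get_bool(cfg, "not_present", default=True))      # True
```

An explicit falsy value such as `"0"` still overrides a `True` default; only absent or blank values fall back to it.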
10 changes: 10 additions & 0 deletions src/website_profiling/content_studio/ai_suggest.py
@@ -34,6 +34,16 @@ def _rule_suggestions(score: dict[str, Any]) -> list[dict[str, Any]]:
                 "type": "term",
                 "source": "rule",
             })
+        elif term.get("status") == "included":
+            count = int(term.get("count") or 0)
+            target = int(term.get("target") or 0)
+            if target and count < target and term.get("importance") == "high":
+                items.append({
+                    "text": f"Use “{term.get('term')}” {target - count} more time(s) ({count}/{target}) to fully cover it.",
+                    "priority": "low",
+                    "type": "term",
+                    "source": "rule",
+                })
     for check in score.get("checks") or []:
         if isinstance(check, dict) and not check.get("pass"):
             items.append({
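The new `elif` branch can be exercised on its own. The sketch below lifts its logic into `partial_coverage_hint`, a hypothetical helper that does not exist in the repository. The field names (`status`, `count`, `target`, `importance`, `term`) come from the diff; the sample term dictionaries are made up for illustration.

```python
from __future__ import annotations

from typing import Any


def partial_coverage_hint(term: dict[str, Any]) -> dict[str, Any] | None:
    """Hypothetical helper isolating the added branch; not part of the repository."""
    if term.get("status") != "included":
        return None
    # Missing or None counts are treated as 0, as in the diff.
    count = int(term.get("count") or 0)
    target = int(term.get("target") or 0)
    # Only high-importance terms that are present but below target get a hint.
    if target and count < target and term.get("importance") == "high":
        return {
            "text": f"Use “{term.get('term')}” {target - count} more time(s) ({count}/{target}) to fully cover it.",
            "priority": "low",
            "type": "term",
            "source": "rule",
        }
    return None


under_target = {"term": "site audit", "status": "included", "count": 1, "target": 3, "importance": "high"}
on_target = {"term": "site audit", "status": "included", "count": 3, "target": 3, "importance": "high"}
low_importance = {"term": "crawler", "status": "included", "count": 1, "target": 3, "importance": "low"}
no_target = {"term": "crawler", "status": "included", "count": 1, "target": None, "importance": "high"}

hint = partial_coverage_hint(under_target)
print(hint["text"])                           # Use “site audit” 2 more time(s) (1/3) to fully cover it.
print(partial_coverage_hint(on_target))       # None
print(partial_coverage_hint(low_importance))  # None
print(partial_coverage_hint(no_target))       # None
```

The `target and ...` guard means a term with no target (or a target of 0) never produces a hint, so the suggestion only appears when there is a concrete number to reach.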