diff --git a/.gitignore b/.gitignore
index 2fd0608..162eb9b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -30,4 +30,5 @@ pipeline-config.txt
 *__pycache__*
 .coverage
 .agents/
-skills-lock.json
\ No newline at end of file
+skills-lock.json
+.crawl_results.csv
\ No newline at end of file
diff --git a/README.md b/README.md
index cd9c58b..74299f4 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,12 @@
- Site Audit — Open Source SEO Crawl & Audit
- Self-hosted technical SEO — your infrastructure, your data.
+ Site Audit — Developer-friendly SEO crawl & audit
+ Self-hosted technical SEO for developers — your infrastructure, your data.
@@ -37,15 +37,15 @@
 # Site Audit
-**Open-source SEO crawl and technical audit platform** — built with **Next.js, Python, and PostgreSQL**.
+**Developer-friendly SEO audit platform** — open-source crawl and technical audit tooling built with **Next.js, Python, and PostgreSQL**.
 ## Overview
-Site Audit is a self-hosted alternative to commercial SEO suites. It runs on your own infrastructure, stores data in your PostgreSQL database, and produces transparent technical reports — no subscription tiers, no gated exports.
+Site Audit is a **developer-friendly SEO audit** tool: self-hosted, transparent, and built for engineers who want crawl data, issue reports, and integrations in their own stack — not another opaque SaaS dashboard. It runs on your infrastructure, stores data in PostgreSQL, and produces actionable technical reports with no subscription tiers or gated exports.
 **Use cases**
-- Technical SEO audits for owned or client properties
+- Developer-friendly SEO audits for owned or client properties
 - Crawl analysis with static and JavaScript rendering
 - Content writing and optimization with live SEO scoring
 - Search Console, GA4, and Bing Webmaster integration
@@ -95,9 +95,7 @@ Site Audit focuses on **honest, self-hosted technical SEO**. It is not a drop-in
 Also included: **AI chat** over audit data (optional), **Content studio** (write & optimize with live SEO scoring), **340 MCP tools** (domain-scoped servers), image SEO, GEO/AEO readiness, keyword explorer (GSC + on-site), backlinks (GSC Links import), compare runs, and portfolio management for agencies.
-
-
-
## Architecture
@@ -137,15 +135,17 @@ WebsiteProfiling/
└── pipeline-config.example.txt
```
-| Path | Purpose |
-|------|---------|
-| `src/website_profiling/` | Crawl, analyze, report, Lighthouse, integrations, AI — run via `python -m src` |
-| `web/app/api/` | REST APIs: report data, pipeline runs, chat (SSE), Google/Bing sync |
-| `web/src/lib/pipelineConfigSchema.ts` | Audit settings schema (UI ↔ PostgreSQL) |
-| `alembic/versions/` | Database migrations — run `./local-run migrate` |
-| `tests/` | Backend tests; `./local-test browser` for Playwright crawl integration |
-| `docs/MCP.md` | MCP server setup for IDE and agent integrations |
-| `data/` | Local secrets and shadow `pipeline-config.txt` (gitignored) |
+
+| Path | Purpose |
+| ------------------------------------- | ------------------------------------------------------------------------------ |
+| `src/website_profiling/` | Crawl, analyze, report, Lighthouse, integrations, AI — run via `python -m src` |
+| `web/app/api/` | REST APIs: report data, pipeline runs, chat (SSE), Google/Bing sync |
+| `web/src/lib/pipelineConfigSchema.ts` | Audit settings schema (UI ↔ PostgreSQL) |
+| `alembic/versions/` | Database migrations — run `./local-run migrate` |
+| `tests/` | Backend tests; `./local-test browser` for Playwright crawl integration |
+| `docs/MCP.md` | MCP server setup for IDE and agent integrations |
+| `data/` | Local secrets and shadow `pipeline-config.txt` (gitignored) |
+
For layout details and common development patterns, see [AGENT.md](AGENT.md).
@@ -179,11 +179,13 @@ Default local `DATABASE_URL`: `postgres://postgres:dev@127.0.0.1:5432/website_pr
### Pipeline job timeouts
-| Setting | Default | Description |
-|---------|---------|-------------|
-| `PIPELINE_JOB_STALE_HOURS` | 1 hour | Reconciles stuck `running` rows |
+
+| Setting | Default | Description |
+| ----------------------------- | --------- | ---------------------------------------------- |
+| `PIPELINE_JOB_STALE_HOURS` | 1 hour | Reconciles stuck `running` rows |
| `PIPELINE_JOB_ORPHAN_MINUTES` | 5 minutes | Clears orphan jobs with no live server process |
+
Increase `PIPELINE_JOB_STALE_HOURS` for crawls that routinely exceed one hour.
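As a rough illustration of the timeout semantics above, the following Python sketch shows how a stale-job cutoff could be derived from `PIPELINE_JOB_STALE_HOURS`. Only the setting name and its 1-hour default come from the table. The helper names `stale_hours` and `is_stale` and the `started_at` argument are hypothetical and are not taken from the project's code.

```python
import math
import os
from datetime import datetime, timedelta, timezone

DEFAULT_STALE_HOURS = 1.0  # documented default for PIPELINE_JOB_STALE_HOURS


def stale_hours(env=None):
    """Read PIPELINE_JOB_STALE_HOURS, falling back to the documented default.

    Hypothetical helper: the real project may parse this setting differently.
    """
    env = os.environ if env is None else env
    raw = env.get("PIPELINE_JOB_STALE_HOURS", "")
    try:
        hours = float(raw)
    except ValueError:
        return DEFAULT_STALE_HOURS
    # Reject non-finite or non-positive values rather than disabling the check.
    if not math.isfinite(hours) or hours <= 0:
        return DEFAULT_STALE_HOURS
    return hours


def is_stale(started_at, now, hours):
    """Assumed rule: a `running` job is stale once it is older than the cutoff."""
    return now - started_at > timedelta(hours=hours)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    started = now - timedelta(hours=2)
    print(is_stale(started, now, stale_hours()))
```

Under this assumed rule, a crawl that normally runs for two hours would be flagged at the default, which is why a higher value is recommended for long crawls.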
### Testing
@@ -213,16 +215,18 @@ In Audit settings, set **Crawl rendering** to `javascript` (always headless Chro
Ask questions about audit data at [http://localhost:3000/chat](http://localhost:3000/chat). Enable a provider under **Run audit → AI settings** (`llm_enabled`, provider, model). `./local-run setup` installs Python deps from `requirements.txt` (including `httpx`, OpenAI, Anthropic, and Groq SDKs; Gemini uses `httpx` via REST).
-| Provider | Notes |
-|----------|-------|
-| **Ollama** | Local daemon at `http://127.0.0.1:11434`. Chat UI lists installed models plus the live Ollama cloud catalog. Native tool calling when supported; ReAct fallback otherwise. |
-| **OpenAI** / **Anthropic** | API key in AI settings or env (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`); native tool calling with streaming. |
-| **Google Gemini** | API key in AI settings or `GEMINI_API_KEY`; REST via `httpx`. |
-| **Groq** | API key in AI settings or `GROQ_API_KEY`; official Groq Python SDK; native tool calling with streaming. Default model `openai/gpt-oss-120b`. |
+
+| Provider | Notes |
+| -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Ollama** | Local daemon at `http://127.0.0.1:11434`. Chat UI lists installed models plus the live Ollama cloud catalog. Native tool calling when supported; ReAct fallback otherwise. |
+| **OpenAI** / **Anthropic** | API key in AI settings or env (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`); native tool calling with streaming. |
+| **Google Gemini** | API key in AI settings or `GEMINI_API_KEY`; REST via `httpx`. |
+| **Groq** | API key in AI settings or `GROQ_API_KEY`; official Groq Python SDK; native tool calling with streaming. Default model `openai/gpt-oss-120b`. |
+
The agent uses the same **340 read-only audit tools** as the MCP server ([docs/MCP.md](docs/MCP.md)), with **dynamic routing** (~45 tools per turn). Responses stream over SSE (`POST /api/chat`). Sessions persist per property (`chat_sessions` / `chat_messages`).
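Because responses stream over SSE, a client has to split the stream into events before it can use them. The sketch below parses generic `text/event-stream` lines. It makes no assumption about the event names or payloads that `POST /api/chat` actually emits, and the function name `iter_sse_events` is illustrative only.

```python
def iter_sse_events(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines.

    Handles the `event:` and `data:` fields and `:` comments; `id:` and
    `retry:` are ignored in this minimal sketch.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional leading space
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


if __name__ == "__main__":
    sample = ["event: token\n", "data: hello\n", "\n"]
    for name, payload in iter_sse_events(sample):
        print(name, payload)
```

With `httpx`, the lines could come from `response.iter_lines()` inside a `client.stream("POST", ...)` block. The request body for the chat endpoint is not documented here, so it is left out.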
-### Content studio (optional)
+### Content studio (optional, Experimental)
Write and optimize content at [http://localhost:3000/write](http://localhost:3000/write) with **live SEO scoring** from Search Console and on-page heuristics. Drafts persist per property; an optional AI assist (same providers as AI chat) drafts and rewrites copy. Backed by `/api/content-drafts`, `/api/content/score`, and `/api/content/analyze`.
@@ -235,14 +239,16 @@ Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup and
## Documentation
-| Document | Description |
-|----------|-------------|
-| [docs/README.md](docs/README.md) | Documentation index and brand assets |
-| [AGENT.md](AGENT.md) | Repository layout and development commands |
-| [docs/GLOSSARY.md](docs/GLOSSARY.md) | UI terminology |
-| [docs/COMPANY_STANDARDS.md](docs/COMPANY_STANDARDS.md) | Data and security policy |
-| [docs/MCP.md](docs/MCP.md) | MCP server setup |
-| [docs/OPS.md](docs/OPS.md) | Scheduled audits, alerts, production ops |
+
+| Document | Description |
+| ------------------------------------------------------ | ------------------------------------------ |
+| [docs/README.md](docs/README.md) | Documentation index and brand assets |
+| [AGENT.md](AGENT.md) | Repository layout and development commands |
+| [docs/GLOSSARY.md](docs/GLOSSARY.md) | UI terminology |
+| [docs/COMPANY_STANDARDS.md](docs/COMPANY_STANDARDS.md) | Data and security policy |
+| [docs/MCP.md](docs/MCP.md) | MCP server setup |
+| [docs/OPS.md](docs/OPS.md) | Scheduled audits, alerts, production ops |
+
## Star History
@@ -250,6 +256,4 @@ Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for setup and
## License
-Copyright © 2026 [codefrydev](https://github.com/codefrydev). Released under the [MIT License](LICENSE).
-
-Issues and pull requests: [codefrydev/WebsiteProfiling](https://github.com/codefrydev/WebsiteProfiling)
+Copyright © 2026 [codefrydev](https://github.com/codefrydev). Released under the [MIT License](LICENSE).
\ No newline at end of file
diff --git a/crawl_results.csv b/crawl_results.csv
deleted file mode 100644
index 1d05b30..0000000
--- a/crawl_results.csv
+++ /dev/null
@@ -1,2 +0,0 @@
-url,status
-https://a.com,200
diff --git a/docs/assets/readme-banner.png b/docs/assets/readme-banner.png
index 4add873..c068749 100644
Binary files a/docs/assets/readme-banner.png and b/docs/assets/readme-banner.png differ
diff --git a/docs/assets/social-preview.png b/docs/assets/social-preview.png
index 1075859..afdfc5f 100644
Binary files a/docs/assets/social-preview.png and b/docs/assets/social-preview.png differ