feat: add bounded crawl discovery #146

Merged: chaliy merged 1 commit into main from codex/agent-crawl-discovery on Jul 4, 2026

chaliy commented Jul 4, 2026

Contributor

What

Add opt-in bounded same-origin crawl discovery for AI-agent workflows.

Why

Agents often need a small page map after a low-quality or sparse seed fetch. This gives them deterministic discovery without turning FetchKit into an unbounded crawler.

How

  • Add crawl and max_pages request fields
  • Add CrawlResult / CrawlPage response summaries
  • Fetch the seed normally, then fetch same-origin, non-asset links discovered on it, up to a capped page budget
  • Fetch discovered pages with markdown + content_focus="agent"
  • Surface crawl fields in tool schema, CLI flags, MCP markdown, Python bindings, README, and spec

Risk

  • Medium-low
  • New behavior is opt-in and capped; every discovered URL still goes through existing URL policy, DNS policy, redirect validation, body caps, and timeouts
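
To illustrate why the cap and the per-URL checks limit the blast radius, here is a minimal Python sketch of a fetch loop that gates every discovered URL before fetching it. The names `fetch_discovered`, `is_allowed`, and `fetch` are hypothetical stand-ins for FetchKit's existing URL/DNS policy and fetch path, and the result shape is invented for the example; it does not mirror `CrawlResult` or `CrawlPage`.

```python
def fetch_discovered(candidates, is_allowed, fetch, max_pages):
    """Fetch up to max_pages candidate URLs, checking policy on each one.

    is_allowed(url) -> bool and fetch(url) -> str are injected callables
    (hypothetical). Blocked URLs are recorded but never fetched, and in this
    sketch they do not consume budget.
    """
    pages, skipped = [], []
    for url in candidates:
        if len(pages) >= max_pages:
            break  # hard cap: stop regardless of how many links remain
        if not is_allowed(url):
            skipped.append(url)
            continue
        try:
            body = fetch(url)
            pages.append({"url": url, "ok": True, "size": len(body)})
        except Exception as exc:
            # A failed page (timeout, body cap, and so on) still uses budget,
            # so repeated failures cannot extend the crawl.
            pages.append({"url": url, "ok": False, "error": str(exc)})
    return {"pages": pages, "skipped": skipped}
```

Whether blocked or failed URLs count toward the budget is a design choice the PR description does not specify; the sketch counts fetch attempts only.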

Checklist

  • Unit tests pass
  • Smoke tests pass
  • Documentation is updated
  • Specs are up to date and not in conflict

chaliy merged commit 26e2e76 into main on Jul 4, 2026
11 checks passed
chaliy deleted the codex/agent-crawl-discovery branch on July 4, 2026 at 23:24