Merged
2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -160,7 +160,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Highlights

- Hardened redirect handling to revalidate every hop against FetchKit's SSRF policy
- Hardened redirect handling to revalidate every hop against Fetchkit's SSRF policy
- Tightened allow/block prefix matching to use parsed URL components instead of raw string prefixes
- Added FileSaver trait for saving fetched content to files
- Mitigated 6 open threats from threat model
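
The prefix-matching highlight can be illustrated with a small sketch. This is not FetchKit's implementation; the helper names `matches_prefix` and `every_hop_allowed` are hypothetical, and the exact comparison rules (default ports, segment-boundary path matching) are assumptions. It only shows why comparing parsed URL components is stricter than a raw `startswith` check, and how each redirect hop might be re-checked.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def _parts(url: str):
    """Split a URL into (scheme, host, port, path), filling in defaults."""
    u = urlsplit(url)
    scheme = u.scheme.lower()
    host = (u.hostname or "").lower()
    port = u.port or DEFAULT_PORTS.get(scheme)
    path = u.path or "/"
    return scheme, host, port, path


def matches_prefix(url: str, prefix: str) -> bool:
    """Hypothetical check: compare component by component, not as raw strings."""
    scheme, host, port, path = _parts(url)
    p_scheme, p_host, p_port, p_path = _parts(prefix)
    if (scheme, host, port) != (p_scheme, p_host, p_port):
        return False
    # Match on a path-segment boundary: /docs covers /docs/a but not /docsx.
    base = p_path.rstrip("/")
    return path == base or path.startswith(base + "/")


def every_hop_allowed(hops, prefixes) -> bool:
    """Re-check each URL in a redirect chain, not only the first one."""
    return all(any(matches_prefix(hop, p) for p in prefixes) for hop in hops)
```

With a raw string comparison, `https://docs.example.com.evil.test/` and `https://docs.example.com@evil.test/` would both start with the allowed prefix `https://docs.example.com`; the component-wise check rejects them because the parsed host differs.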
8 changes: 4 additions & 4 deletions README.md
@@ -19,7 +19,7 @@ AI-friendly web content fetching tool designed for LLM consumption. Rust library

## Built-in Fetchers

FetchKit routes each request through an ordered fetcher registry. Specialized
Fetchkit routes each request through an ordered fetcher registry. Specialized
fetchers match first; the default fetcher handles everything else.

- `GitHubCodeFetcher` - GitHub source file URLs (`/blob/...`)
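
The "specialized fetchers match first" routing described in this README hunk can be sketched as a first-match loop over an ordered list. This is an illustration only: `GitHubCodeFetcher` and the `/blob/` rule come from the README, while `DefaultFetcher`, the `Fetcher` dataclass, `pick_fetcher`, and the host check are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.parse import urlsplit


@dataclass
class Fetcher:
    name: str
    matches: Callable[[str], bool]


def is_github_blob(url: str) -> bool:
    """Assumed matching rule: github.com host with a /blob/ path segment."""
    u = urlsplit(url)
    return u.hostname == "github.com" and "/blob/" in u.path


# Order matters: specialized fetchers come first, the catch-all comes last.
REGISTRY: List[Fetcher] = [
    Fetcher("GitHubCodeFetcher", is_github_blob),
    Fetcher("DefaultFetcher", lambda url: True),  # hypothetical name
]


def pick_fetcher(url: str) -> str:
    """Return the name of the first fetcher whose predicate accepts the URL."""
    for fetcher in REGISTRY:
        if fetcher.matches(url):
            return fetcher.name
    raise LookupError("no fetcher matched")
```

Because the default entry accepts every URL, placing it anywhere but last would shadow the specialized fetchers behind it.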
@@ -211,14 +211,14 @@ pip install fetchkit
```

```python
from fetchkit_py import fetch, FetchRequest, FetchKitTool
from fetchkit_py import fetch, FetchRequest, FetchkitTool

# Simple fetch
response = fetch("https://example.com", as_markdown=True)
print(response.content)

# With configuration
tool = FetchKitTool(
tool = FetchkitTool(
enable_markdown=True,
user_agent="MyBot/1.0",
allow_prefixes=["https://docs.example.com"]
@@ -282,7 +282,7 @@ Errors are returned in the `error` field:

## Security

FetchKit blocks connections to private/reserved IP ranges by default, preventing SSRF attacks when used in server-side or AI agent contexts.
Fetchkit blocks connections to private/reserved IP ranges by default, preventing SSRF attacks when used in server-side or AI agent contexts.

**Blocked by default:** loopback, private networks (10.x, 172.16-31.x, 192.168.x), link-local (169.254.x including cloud metadata), IPv6 equivalents, multicast, and other reserved ranges.
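
To make the blocked ranges concrete, here is a rough classification sketch using Python's standard `ipaddress` module. It is not the library's actual policy code; `is_blocked` is a hypothetical helper, and a real policy may cover additional ranges that these flags do not.

```python
import ipaddress


def is_blocked(ip: str) -> bool:
    """Hypothetical check mirroring the categories listed in the README."""
    addr = ipaddress.ip_address(ip)
    return (
        addr.is_loopback        # 127.0.0.0/8, ::1
        or addr.is_private      # 10.x, 172.16-31.x, 192.168.x, fc00::/7, ...
        or addr.is_link_local   # 169.254.x (cloud metadata), fe80::/10
        or addr.is_multicast    # 224.0.0.0/4, ff00::/8
        or addr.is_reserved
        or addr.is_unspecified  # 0.0.0.0, ::
    )
```

For instance, `is_blocked("169.254.169.254")` is true for the cloud metadata address, while `is_blocked("8.8.8.8")` is false.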

2 changes: 1 addition & 1 deletion crates/fetchkit-cli/Cargo.toml
@@ -5,7 +5,7 @@ edition.workspace = true
license.workspace = true
authors.workspace = true
repository.workspace = true
description = "Command line interface for FetchKit web content fetching tool"
description = "Command line interface for Fetchkit web content fetching tool"
keywords.workspace = true
categories.workspace = true
readme = "../../README.md"
4 changes: 2 additions & 2 deletions crates/fetchkit-cli/src/main.rs
@@ -1,4 +1,4 @@
//! FetchKit CLI - Command-line interface for fetching web content
//! Fetchkit CLI - Command-line interface for fetching web content
//!
//! Provides the `fetchkit` binary with subcommands for fetching URLs
//! and running an MCP server.
@@ -27,7 +27,7 @@ enum OutputFormat {
Json,
}

/// FetchKit - AI-friendly web content fetching tool
/// Fetchkit - AI-friendly web content fetching tool
#[derive(Parser, Debug)]
#[command(name = "fetchkit")]
#[command(author, version, about, long_about = None)]
2 changes: 1 addition & 1 deletion crates/fetchkit-cli/tests/cli_integration.rs
@@ -148,7 +148,7 @@ fn test_help_flag() {

let stdout = String::from_utf8_lossy(&output.stdout);
assert!(output.status.success());
assert!(stdout.contains("fetchkit") || stdout.contains("FetchKit"));
assert!(stdout.contains("fetchkit") || stdout.contains("Fetchkit"));
assert!(stdout.contains("fetch") || stdout.contains("mcp"));
}

2 changes: 1 addition & 1 deletion crates/fetchkit-python/Cargo.toml
@@ -5,7 +5,7 @@ edition.workspace = true
license.workspace = true
authors.workspace = true
repository.workspace = true
description = "Python bindings for the FetchKit library"
description = "Python bindings for the Fetchkit library"
publish = false

[lib]
82 changes: 71 additions & 11 deletions crates/fetchkit-python/src/lib.rs
@@ -1,19 +1,19 @@
//! Python bindings for FetchKit
//! Python bindings for Fetchkit
//!
//! Exposes the FetchKit tool contract to Python via PyO3.
//! Exposes the Fetchkit tool contract to Python via PyO3.
//!
//! # Python Usage
//!
//! ```python
//! from fetchkit_py import FetchKitTool, FetchRequest
//! from fetchkit_py import FetchkitTool, FetchRequest
//!
//! tool = FetchKitTool()
//! tool = FetchkitTool()
//! response = tool.fetch("https://example.com", as_markdown=True)
//! print(response.content)
//! ```

use fetchkit::{FetchError, FetchRequest, FetchResponse, HttpMethod, Tool, ToolBuilder};
use pyo3::exceptions::PyValueError;
use pyo3::exceptions::{PyDeprecationWarning, PyValueError};
use pyo3::prelude::*;

/// Convert FetchError to PyErr
@@ -190,15 +190,15 @@ impl PyFetchResponse {
}
}

/// Python wrapper for FetchKit Tool
#[pyclass(name = "FetchKitTool")]
pub struct PyFetchKitTool {
/// Python wrapper for Fetchkit Tool
#[pyclass(name = "FetchkitTool")]
pub struct PyFetchkitTool {
inner: Tool,
runtime: tokio::runtime::Runtime,
}

#[pymethods]
impl PyFetchKitTool {
impl PyFetchkitTool {
/// Create a new tool with default options
#[new]
#[allow(clippy::too_many_arguments)]
@@ -350,6 +350,65 @@ impl PyFetchKitTool {
}
}

#[deprecated(note = "Use PyFetchkitTool / Python FetchkitTool; FetchKitTool is deprecated.")]
pub type PyFetchKitTool = PyFetchkitTool;

/// Deprecated constructor shim for the old Python class spelling.
#[pyfunction(name = "FetchKitTool")]
#[allow(clippy::too_many_arguments)]
#[pyo3(signature = (
enable_markdown=true,
enable_text=true,
user_agent=None,
allow_prefixes=None,
block_prefixes=None,
max_body_size=None,
block_private_ips=true,
respect_proxy_env=false,
allowed_ports=None,
blocked_hosts=None,
same_host_redirects_only=None,
hardened=false
))]
fn deprecated_fetch_kit_tool(
py: Python<'_>,
enable_markdown: bool,
enable_text: bool,
user_agent: Option<String>,
allow_prefixes: Option<Vec<String>>,
block_prefixes: Option<Vec<String>>,
max_body_size: Option<usize>,
block_private_ips: bool,
respect_proxy_env: bool,
allowed_ports: Option<Vec<u16>>,
blocked_hosts: Option<Vec<String>>,
same_host_redirects_only: Option<bool>,
hardened: bool,
) -> PyResult<PyFetchkitTool> {
let warning = py.get_type::<PyDeprecationWarning>();
PyErr::warn(
py,
&warning,
c"FetchKitTool is deprecated; use FetchkitTool instead.",
1,
)?;

PyFetchkitTool::new(
enable_markdown,
enable_text,
user_agent,
allow_prefixes,
block_prefixes,
max_body_size,
block_private_ips,
respect_proxy_env,
allowed_ports,
blocked_hosts,
same_host_redirects_only,
hardened,
)
}
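
From the Python side, the shim above should behave roughly like the pure-Python sketch below: the old name still constructs a tool, but emits a `DeprecationWarning` first. The classes here are hypothetical stand-ins rather than the compiled `fetchkit_py` module, and only two of the constructor options are modelled. `stacklevel=2` is used so the warning points at the caller, which is what level `1` is expected to achieve from native code that has no Python frame of its own.

```python
import warnings


class FetchkitTool:
    """Hypothetical pure-Python stand-in for the PyO3 class."""

    def __init__(self, enable_markdown: bool = True, hardened: bool = False):
        self.enable_markdown = enable_markdown
        self.hardened = hardened


def FetchKitTool(*args, **kwargs) -> FetchkitTool:
    """Old spelling kept as a function that warns, then delegates."""
    warnings.warn(
        "FetchKitTool is deprecated; use FetchkitTool instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return FetchkitTool(*args, **kwargs)


def call_with_recorded_warnings():
    """Call the old name and capture whatever warnings it raises."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        tool = FetchKitTool(enable_markdown=False)
    return tool, [w.category for w in caught]
```

Since the shim in the diff is a function rather than a class, code that uses the old name in `isinstance` checks or as a base class would likely need updating; that is worth verifying against the built module.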

/// Fetch a URL using default options (convenience function)
#[pyfunction]
#[pyo3(signature = (url, method=None, as_markdown=None, as_text=None, content_focus=None, crawl=None, max_pages=None))]
@@ -363,7 +422,7 @@ fn fetch(
crawl: Option<bool>,
max_pages: Option<usize>,
) -> PyResult<PyFetchResponse> {
let tool = PyFetchKitTool::new(
let tool = PyFetchkitTool::new(
true, true, None, None, None, None, true, false, None, None, None, false,
)?;
tool.fetch(
@@ -382,7 +441,8 @@ fn fetch(
fn fetchkit_py(m: &Bound<'_, PyModule>) -> PyResult<()> {
m.add_class::<PyFetchRequest>()?;
m.add_class::<PyFetchResponse>()?;
m.add_class::<PyFetchKitTool>()?;
m.add_class::<PyFetchkitTool>()?;
m.add_function(wrap_pyfunction!(deprecated_fetch_kit_tool, m)?)?;
m.add_function(wrap_pyfunction!(fetch, m)?)?;
Ok(())
}
4 changes: 2 additions & 2 deletions crates/fetchkit/examples/fetch_urls.rs
@@ -3,13 +3,13 @@
//! Run with: cargo run -p fetchkit --example fetch_urls
//!
//! Demonstrates the library API by fetching real URLs and showing
//! how FetchKit handles different content types (HTML, JSON, plain text).
//! how Fetchkit handles different content types (HTML, JSON, plain text).

use fetchkit::{FetchRequest, Tool};

#[tokio::main]
async fn main() {
println!("FetchKit URL Examples");
println!("Fetchkit URL Examples");
println!("=====================\n");

let tool = Tool::builder().enable_markdown(true).build();
2 changes: 1 addition & 1 deletion crates/fetchkit/examples/save_to_file.rs
@@ -9,7 +9,7 @@ use fetchkit::{FetchRequest, LocalFileSaver, Tool};

#[tokio::main]
async fn main() {
println!("FetchKit save_to_file Example");
println!("Fetchkit save_to_file Example");
println!("==============================\n");

let dir = tempfile::tempdir().expect("Failed to create temp dir");
2 changes: 1 addition & 1 deletion crates/fetchkit/src/client.rs
@@ -1,4 +1,4 @@
//! HTTP client for FetchKit
//! HTTP client for Fetchkit
//!
//! This module provides the main entry points for fetching URLs.
//! The actual fetch logic is implemented by fetchers in the [`fetchers`](crate::fetchers) module.
2 changes: 1 addition & 1 deletion crates/fetchkit/src/error.rs
@@ -1,4 +1,4 @@
//! Error types for FetchKit
//! Error types for Fetchkit

use thiserror::Error;

2 changes: 1 addition & 1 deletion crates/fetchkit/src/file_saver.rs
@@ -1,4 +1,4 @@
//! File saving abstractions for FetchKit
//! File saving abstractions for Fetchkit
//!
//! Consumers implement [`FileSaver`] to control where fetched bytes land:
//! - CLI: writes to real filesystem ([`LocalFileSaver`])
6 changes: 3 additions & 3 deletions crates/fetchkit/src/lib.rs
@@ -1,4 +1,4 @@
//! FetchKit - AI-friendly web content fetching library
//! Fetchkit - AI-friendly web content fetching library
//!
//! This crate provides a reusable library API for fetching web content,
//! with optional HTML to markdown/text conversion optimized for LLM consumption.
@@ -53,7 +53,7 @@
//!
//! # Fetcher System
//!
//! FetchKit uses a pluggable fetcher system where specialized fetchers
//! Fetchkit uses a pluggable fetcher system where specialized fetchers
//! handle specific URL patterns. The [`FetcherRegistry`] dispatches
//! requests to the appropriate fetcher based on URL matching.
//!
@@ -116,7 +116,7 @@ pub use types::{
pub use bot_auth::{BotAuthConfig, BotAuthError};

/// Default User-Agent string
pub const DEFAULT_USER_AGENT: &str = "Everruns FetchKit/1.0";
pub const DEFAULT_USER_AGENT: &str = "Everruns Fetchkit/1.0";

/// Backward-compatible full description string with file-saving enabled.
pub const TOOL_DESCRIPTION: &str =
8 changes: 4 additions & 4 deletions crates/fetchkit/src/tool.rs
@@ -1,4 +1,4 @@
//! Tool builder and toolkit-library contract for FetchKit.
//! Tool builder and toolkit-library contract for Fetchkit.
//
// DECISION: keep the legacy typed `execute`/`llmtxt` surface as wrappers around the
// toolkit-library contract so existing fetchkit callers can migrate incrementally.
@@ -100,7 +100,7 @@ pub struct ToolOutput {
pub metadata: ToolOutputMetadata,
}

/// Builder for configuring the FetchKit tool
/// Builder for configuring the Fetchkit tool
///
/// # Examples
///
@@ -304,7 +304,7 @@ impl ToolBuilder {

/// Control private/reserved IP range blocking (SSRF prevention)
///
/// Enabled by default. When enabled, FetchKit resolves hostnames to IP
/// Enabled by default. When enabled, Fetchkit resolves hostnames to IP
/// addresses before connecting and validates that the resolved IP is not
/// in a private or reserved range. DNS pinning prevents rebinding attacks.
///
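
The resolve-then-check flow described in this doc comment can be sketched as follows. This is a simplified illustration, not the crate's code: `resolve_and_pin` and `system_resolver` are hypothetical names, the resolver is injectable so the logic can be exercised without network access, and the range check is a compact approximation of the real policy. The key idea is that the address that was validated is the one returned for connecting, so a second DNS lookup cannot swap in a different IP.

```python
import ipaddress
import socket
from typing import Callable, List


def system_resolver(host: str) -> List[str]:
    """Resolve a hostname to IP strings using the OS resolver."""
    infos = socket.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    return [info[4][0] for info in infos]


def _is_blocked(raw: str) -> bool:
    """Approximate private/reserved check (assumption, not the real policy)."""
    addr = ipaddress.ip_address(raw)
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
    )


def resolve_and_pin(
    host: str, resolve: Callable[[str], List[str]] = system_resolver
) -> str:
    """Resolve once, reject if any answer is blocked, return the pinned IP."""
    addresses = resolve(host)
    if not addresses:
        raise ValueError(f"{host} did not resolve")
    for raw in addresses:
        if _is_blocked(raw):
            raise PermissionError(f"{host} resolves to blocked address {raw}")
    # The caller connects to this exact IP instead of resolving the name again.
    return addresses[0]
```

Rejecting the whole answer set when any address is blocked is a conservative choice made for this sketch; a real implementation might instead filter the set and pin one of the remaining addresses.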
@@ -442,7 +442,7 @@ impl ToolBuilder {
}
}

/// Configured FetchKit tool
/// Configured Fetchkit tool
///
/// Created via [`ToolBuilder`]. Provides methods for executing fetch requests,
/// retrieving schemas, and accessing tool metadata.
2 changes: 1 addition & 1 deletion crates/fetchkit/src/types.rs
@@ -1,4 +1,4 @@
//! Core types for FetchKit
//! Core types for Fetchkit

use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
2 changes: 1 addition & 1 deletion crates/fetchkit/tests/integration.rs
@@ -1,4 +1,4 @@
//! Integration tests for FetchKit using wiremock
//! Integration tests for Fetchkit using wiremock

use fetchkit::{
fetch_with_options, DnsPolicy, FetchError, FetchOptions, FetchRequest, FetcherRegistry,
2 changes: 1 addition & 1 deletion crates/fetchkit/tests/ssrf_security.rs
@@ -1,4 +1,4 @@
//! SSRF security tests for FetchKit
//! SSRF security tests for Fetchkit
//!
//! Tests that validate the resolve-then-check DNS policy prevents
//! server-side request forgery attacks. These tests verify the threat
6 changes: 3 additions & 3 deletions docs/security.md
@@ -1,6 +1,6 @@
# Security Notes

FetchKit is intended to run in agent, server, and cluster environments where URL input may be
Fetchkit is intended to run in agent, server, and cluster environments where URL input may be
user-controlled.

## Safe Defaults
@@ -18,7 +18,7 @@ For shared VMs, containers, or clusters:
- Keep private-IP blocking enabled.
- Keep proxy inheritance disabled unless outbound traffic must traverse a trusted proxy.
- Use allow-lists where possible instead of relying only on block-lists.
- Apply caller-side rate limits and concurrency limits around FetchKit.
- Apply caller-side rate limits and concurrency limits around Fetchkit.

If you need different limits, configure them through `ToolBuilder`:

@@ -35,7 +35,7 @@ See [`specs/threat-model.md`](../specs/threat-model.md) for the full threat inve

## Web Bot Authentication

FetchKit optionally supports the [Web Bot Authentication Architecture](https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture),
Fetchkit optionally supports the [Web Bot Authentication Architecture](https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-auth-architecture),
which signs outgoing requests with Ed25519 signatures per RFC 9421. This lets
origins verify bot identity cryptographically instead of relying on User-Agent
strings.
10 changes: 5 additions & 5 deletions examples/langchain_summarize.py
@@ -8,14 +8,14 @@
# ]
# ///
"""
LangChain agent example using FetchKit MCP server for web fetching.
LangChain agent example using Fetchkit MCP server for web fetching.

This example creates a LangChain agent that can fetch web content using the
FetchKit MCP tool and summarize it using an LLM.
Fetchkit MCP tool and summarize it using an LLM.

Requirements:
- OPENAI_API_KEY environment variable set
- FetchKit CLI built: cargo build -p fetchkit-cli --release
- Fetchkit CLI built: cargo build -p fetchkit-cli --release

Usage:
uv run examples/langchain_summarize.py
@@ -40,11 +40,11 @@ async def main():
# URL to summarize
url = "https://everruns.com/"

print("Creating LangChain agent with FetchKit MCP tool...")
print("Creating LangChain agent with Fetchkit MCP tool...")
print(f"Target URL: {url}")
print()

# Create MCP client connected to FetchKit server
# Create MCP client connected to Fetchkit server
mcp_client = MultiServerMCPClient(
{
"fetchkit": {