
Model Context Protocol (MCP) Integration

Last Updated: October 2025

What is MCP?

The Model Context Protocol (MCP) is an open standard that defines how AI assistants (like Claude, Cursor, etc.) can interact with external tools and services in a consistent, structured way.

Official Documentation: https://modelcontextprotocol.io/

Key Concepts

1. Server

  • Exposes tools that AI assistants can use
  • Implements MCP protocol
  • Runs as a separate process
  • Example: This ScrapeGraph MCP server

2. Client

  • AI assistant that uses the tools
  • Sends tool invocation requests
  • Receives tool results
  • Examples: Claude Desktop, Cursor, other AI assistants

3. Transport

  • Communication layer between client and server
  • Types: stdio (standard input/output), HTTP, SSE
  • This server uses: stdio

4. Tools

  • Functions exposed by the server
  • Have typed parameters and return values
  • Automatically discovered by AI assistants
  • Examples: scrape(), extract()

5. Resources

  • Data exposed by the server (optional)
  • Not used in this implementation

6. Prompts

  • Pre-defined prompts exposed by the server (optional)
  • Not used in this implementation

MCP in ScrapeGraph

Architecture Overview

┌─────────────────────────────────┐
│   AI Assistant (Client)         │
│   - Claude Desktop              │
│   - Cursor                      │
│   - Other MCP-compatible AIs    │
└────────────┬────────────────────┘
             │ MCP Protocol (JSON-RPC over stdio)
             │ - Tool discovery
             │ - Tool invocation
             │ - Result streaming
             ▼
┌─────────────────────────────────┐
│   FastMCP Server                │
│   - Tool registry               │
│   - Parameter validation        │
│   - Serialization/              │
│     deserialization             │
└────────────┬────────────────────┘
             │ Python function calls
             ▼
┌─────────────────────────────────┐
│   ScapeGraphClient              │
│   - HTTP client (httpx)         │
│   - API authentication          │
│   - Error handling              │
└────────────┬────────────────────┘
             │ HTTPS API requests
             ▼
┌───────────────────────────────────┐
│   ScrapeGraphAI API               │
│   v2-api.scrapegraphai.com/api    │
└───────────────────────────────────┘

FastMCP Framework

This server uses FastMCP, a lightweight Python framework for building MCP servers:

from mcp.server.fastmcp import FastMCP

# Create MCP server
mcp = FastMCP("ScapeGraph API MCP Server")

# Define tools with decorators
@mcp.tool()
def scrape(website_url: str) -> Dict[str, Any]:
    """Convert a webpage to markdown."""
    # Implementation...
    return {"result": "..."}

# Run the server
mcp.run(transport="stdio")

FastMCP Features:

  • Automatic tool discovery from decorated functions
  • Type hint → MCP schema generation
  • Request/response serialization
  • Error handling
  • Stdio transport out-of-the-box

Communication Protocol

Transport: stdio

Standard Input/Output (stdio) is used for client-server communication:

  • stdin (client → server): the client sends JSON-RPC requests
  • stdout (server → client): the server sends JSON-RPC responses
  • stderr (server → logs): server log output (not part of the MCP protocol)

Example Flow:

Client → Server (stdin):
{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "scrape", "arguments": {"website_url": "https://example.com"}}, "id": 1}

Server → Client (stdout):
{"jsonrpc": "2.0", "result": {"result": "# Example\n\nMarkdown content..."}, "id": 1}
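The framing above can be sketched with the standard library alone. This is a simplified illustration of building and parsing the newline-delimited JSON-RPC messages, not FastMCP's actual implementation:

```python
import json

def make_request(method: str, params: dict, req_id: int) -> str:
    """Serialize a JSON-RPC 2.0 request as one newline-delimited line for stdin."""
    msg = {"jsonrpc": "2.0", "method": method, "params": params, "id": req_id}
    return json.dumps(msg) + "\n"

def parse_response(line: str) -> dict:
    """Parse one response line from stdout; raise if it carries an error object."""
    msg = json.loads(line)
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return msg["result"]
```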

JSON-RPC 2.0

MCP uses JSON-RPC 2.0 for message structure:

Request:

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "extract",
    "arguments": {
      "user_prompt": "Extract product names",
      "website_url": "https://example.com"
    }
  },
  "id": 1
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "result": {
      "products": ["Product A", "Product B"]
    }
  },
  "id": 1
}

Error Response:

{
  "jsonrpc": "2.0",
  "error": {
    "code": -32603,
    "message": "Internal error",
    "data": "Error 401: Unauthorized"
  },
  "id": 1
}

MCP Methods

Tool Discovery:

{"jsonrpc": "2.0", "method": "tools/list", "id": 1}

Response:
{
  "jsonrpc": "2.0",
  "result": {
    "tools": [
      {
        "name": "scrape",
        "description": "Convert a webpage into clean, formatted markdown.",
        "inputSchema": {
          "type": "object",
          "properties": {
            "website_url": {"type": "string"}
          },
          "required": ["website_url"]
        }
      },
      // ... other tools
    ]
  },
  "id": 1
}

Tool Invocation:

{"jsonrpc": "2.0", "method": "tools/call", "params": {...}, "id": 1}

Initialize:

{"jsonrpc": "2.0", "method": "initialize", "params": {...}, "id": 1}

Tool Schema

Each tool exposed by the server has a schema that defines its parameters and return type.

Example: scrape Tool

Python Definition:

@mcp.tool()
def scrape(website_url: str) -> Dict[str, Any]:
    """
    Convert a webpage into clean, formatted markdown.

    Args:
        website_url: URL of the webpage to convert

    Returns:
        Dictionary containing the markdown result
    """
    # Implementation...

Generated MCP Schema:

{
  "name": "scrape",
  "description": "Convert a webpage into clean, formatted markdown.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "website_url": {
        "type": "string",
        "description": "URL of the webpage to convert"
      }
    },
    "required": ["website_url"]
  }
}

Type Mapping:

  • Python str → JSON Schema "type": "string"
  • Python int → JSON Schema "type": "integer"
  • Python bool → JSON Schema "type": "boolean"
  • Python Dict[str, Any] → JSON Schema "type": "object"
  • Python Optional[str] → JSON Schema "type": ["string", "null"]
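A simplified sketch of how this mapping could be implemented (hypothetical; FastMCP actually derives schemas via pydantic and handles many more cases):

```python
from typing import Any, Dict, Optional, Union, get_args, get_origin

# Illustrative subset of the type mapping listed above.
_PRIMITIVES = {str: "string", int: "integer", bool: "boolean", float: "number"}

def hint_to_schema(hint: Any) -> Dict[str, Any]:
    """Map a Python type hint to a (simplified) JSON Schema fragment."""
    if hint in _PRIMITIVES:
        return {"type": _PRIMITIVES[hint]}
    if get_origin(hint) is dict:
        return {"type": "object"}
    if get_origin(hint) is Union:  # Optional[X] is Union[X, None]
        args = [a for a in get_args(hint) if a is not type(None)]
        if len(args) == 1 and args[0] in _PRIMITIVES:
            return {"type": [_PRIMITIVES[args[0]], "null"]}
    return {"type": "object"}  # fallback for anything unrecognized
```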

Example: extract Tool (with optional parameters)

Python Definition:

@mcp.tool()
def extract(
    user_prompt: str,
    website_url: str,
    number_of_scrolls: Optional[int] = None,
    markdown_only: Optional[bool] = None
) -> Dict[str, Any]:
    """Extract structured data from a webpage using AI."""
    # Implementation...

Generated MCP Schema:

{
  "name": "extract",
  "description": "Extract structured data from a webpage using AI.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "user_prompt": {"type": "string"},
      "website_url": {"type": "string"},
      "number_of_scrolls": {"type": ["integer", "null"]},
      "markdown_only": {"type": ["boolean", "null"]}
    },
    "required": ["user_prompt", "website_url"]
  }
}

Error Handling

Error Strategy

The server implements graceful error handling to prevent crashes and provide meaningful feedback to AI assistants.

Approach:

  1. No exceptions to client - All errors caught in tool functions
  2. Error dictionaries - Return {"error": "message"} instead of raising
  3. Detailed messages - Include HTTP status codes and API error messages

Error Handling Pattern

@mcp.tool()
def tool_name(param: str) -> Dict[str, Any]:
    """Tool description."""
    if scrapegraph_client is None:
        return {"error": "ScapeGraph client not initialized. Please provide an API key."}

    try:
        return scrapegraph_client.method(param)
    except Exception as e:
        return {"error": str(e)}

Why this approach?

  • Prevents server crashes
  • Allows AI to handle errors gracefully
  • Enables retry logic
  • Provides context for user troubleshooting

Error Types

1. Client Not Initialized:

{
  "error": "ScapeGraph client not initialized. Please provide an API key."
}

Cause: Missing SGAI_API_KEY environment variable

2. API Errors:

{
  "error": "Error 401: Unauthorized"
}

Cause: Invalid API key

{
  "error": "Error 402: Payment Required - Insufficient credits"
}

Cause: Not enough credits

{
  "error": "Error 404: Not Found"
}

Cause: Invalid URL or API endpoint

3. Network Errors:

{
  "error": "httpx.ConnectTimeout: Connection timed out"
}

Cause: Network issues or slow website

4. Validation Errors (SmartCrawler):

{
  "error": "prompt is required when extraction_mode is 'ai'"
}

Cause: Missing required parameter for AI extraction mode

AI Assistant Error Handling

When a tool returns an error, AI assistants typically:

  1. Parse the error message
  2. Determine if retryable (network error) or not (invalid API key)
  3. Inform the user with actionable guidance
  4. Suggest fixes (e.g., "Please add credits to your account")
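Step 2 could look like the following hypothetical helper; the marker strings are illustrative and keyed to the error examples shown in this section:

```python
# Transient failures worth retrying vs. configuration problems that
# need user action first (illustrative markers, not an exhaustive list).
RETRYABLE_MARKERS = ("timed out", "ConnectTimeout", "Error 429", "Error 503")
FATAL_MARKERS = ("Error 401", "Error 402", "not initialized")

def is_retryable(error_message: str) -> bool:
    """Classify an error string from a tool result's {"error": ...} dict."""
    if any(marker in error_message for marker in FATAL_MARKERS):
        return False
    return any(marker in error_message for marker in RETRYABLE_MARKERS)
```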

Example AI Response:

User: "Convert https://example.com to markdown"

Tool result: {"error": "Error 402: Payment Required - Insufficient credits"}

AI: "I wasn't able to convert the webpage because your ScrapeGraphAI account has insufficient credits. Please add credits at https://dashboard.scrapegraphai.com and try again."

Client Integration

Claude Desktop Integration

Configuration File Location:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Configuration:

{
  "mcpServers": {
    "@ScrapeGraphAI-scrapegraph-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "@smithery/cli@latest",
        "run",
        "@ScrapeGraphAI/scrapegraph-mcp",
        "--config",
        "{\"scrapegraphApiKey\":\"YOUR-SGAI-API-KEY\"}"
      ]
    }
  }
}

How It Works:

  1. Claude Desktop reads the config file on startup
  2. Starts the MCP server as a child process using the specified command
  3. Establishes stdio communication
  4. Discovers available tools via tools/list
  5. User asks a question that requires web scraping
  6. Claude calls the appropriate tool via tools/call
  7. Server executes the tool and returns results
  8. Claude incorporates results into its response

Example Interaction:

User: "What are the main features of ScrapeGraphAI?"

Claude (internal):
1. Determines that scrape tool could help
2. Calls: scrape("https://scrapegraphai.com")
3. Receives markdown content
4. Analyzes content
5. Responds to user

Claude (to user): "Based on the ScrapeGraphAI website, the main features are:
- AI-powered web scraping
- Multiple scraping modes (SmartScraper, SearchScraper, etc.)
- ...
"

Cursor Integration

Setup:

  1. Open Cursor settings
  2. Navigate to "MCP Servers" section
  3. Click "Add MCP Server"
  4. Select or configure ScrapeGraphAI MCP
  5. Enter API key

Usage:

  • Cursor's AI chat can automatically invoke MCP tools
  • Similar interaction pattern to Claude Desktop
  • Tool calls visible in chat interface (optional)

Custom Client Integration

To integrate with a custom MCP client:

1. Install the MCP SDK:

pip install mcp

2. Create a client:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Start the server process
    server_params = StdioServerParameters(
        command="scrapegraph-mcp",
        env={"SGAI_API_KEY": "your-api-key"}
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize
            await session.initialize()

            # List tools
            tools_result = await session.list_tools()
            print(f"Available tools: {[t.name for t in tools_result.tools]}")

            # Call a tool
            result = await session.call_tool(
                "scrape",
                arguments={"website_url": "https://example.com"}
            )
            print(f"Result: {result}")

asyncio.run(main())

3. Handle tool results:

if "error" in result:
    print(f"Tool error: {result['error']}")
else:
    print(f"Tool success: {result['result']}")

Advanced Topics

Tool Versioning

Currently, the server does not implement tool versioning. All tools are v1 implicitly.

Future Consideration:

  • Add version to tool names: extract_v2()
  • Maintain backward compatibility with deprecated tools
  • Use MCP metadata for version info

Streaming Results

MCP supports streaming results for long-running operations. This could be useful for SmartCrawler:

Current Approach (polling):

  1. Call crawl_start() → get request_id
  2. Repeatedly call crawl_get_status(request_id) until complete
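The polling loop above can be sketched as follows (a hypothetical helper; `client` is assumed to expose a `crawl_get_status(request_id)` method returning a dict with a "status" key, matching the flow described here):

```python
import time

def poll_crawl(client, request_id: str, interval: float = 2.0,
               timeout: float = 120.0) -> dict:
    """Poll crawl_get_status until the job finishes or the deadline passes."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = client.crawl_get_status(request_id)
        # Stop as soon as the job reaches a terminal state.
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)  # wait before the next status check
    return {"status": "timeout", "request_id": request_id}
```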

Potential Streaming Approach:

  1. Call crawl_start() → server keeps connection open
  2. Server streams progress updates: {"status": "processing", "pages": 10}
  3. Server sends final result: {"status": "completed", "results": [...]}

Not currently implemented due to FastMCP limitations.

Authentication

Current Approach:

  • API key passed via environment variable or config parameter
  • Single API key for entire server instance
  • No per-tool authentication

Future Consideration:

  • Support multiple API keys (user-specific)
  • OAuth integration
  • JWT tokens

Rate Limiting

Current State:

  • No rate limiting in the MCP server
  • Rate limiting handled by ScrapeGraphAI API
  • Server is a simple pass-through

Future Consideration:

  • Client-side rate limiting to prevent API quota exhaustion
  • Configurable request throttling
  • Request queuing
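Client-side throttling could be as simple as enforcing a minimum gap between requests (a hypothetical sketch; the server currently does no rate limiting):

```python
import threading
import time

class MinIntervalThrottle:
    """Enforce a minimum delay between requests (thread-safe)."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._last = 0.0

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        with self._lock:
            now = time.monotonic()
            delay = self._last + self.min_interval - now
            if delay > 0:
                time.sleep(delay)
            self._last = time.monotonic()
```

A caller would invoke `throttle.wait()` before each API request; more elaborate schemes (token buckets, queues) build on the same idea.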

Debugging MCP

MCP Inspector

The MCP Inspector is a tool for testing MCP servers:

npx @modelcontextprotocol/inspector scrapegraph-mcp

Features:

  • Interactive tool discovery
  • Manual tool invocation
  • Request/response inspection
  • Error debugging

Server Logs

FastMCP Logging:

# Add logging to server.py
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

@mcp.tool()
def scrape(website_url: str) -> Dict[str, Any]:
    logger.info(f"scrape called with URL: {website_url}")
    # ...

View Logs:

  • Logs printed to stderr (not part of MCP protocol)
  • Visible in Claude Desktop logs: ~/Library/Logs/Claude/ (macOS)
  • Use MCP Inspector to see real-time logs

Common Debugging Issues

Issue: Tools not appearing in Claude

  • Check: Is the server running? Look in Claude logs
  • Check: Is the config file correct? Verify JSON syntax
  • Check: Does tools/list return the tools? Use MCP Inspector

Issue: Tool calls failing

  • Check: Is the API key valid? Test with curl
  • Check: Are parameters correct? Review tool schema
  • Check: Network connectivity? Check firewall/proxy

Issue: Server crashes

  • Check: Python version (≥3.10)?
  • Check: Dependencies installed? pip list
  • Check: Error in logs? Check stderr output

Best Practices

Tool Design

1. Clear Descriptions

  • Write docstrings that explain what the tool does
  • Include parameter descriptions
  • Specify expected input/output formats

2. Type Hints

  • Always use type hints for parameters and return values
  • FastMCP generates schemas from type hints
  • Helps AI understand tool contracts

3. Error Messages

  • Provide actionable error messages
  • Include HTTP status codes
  • Suggest fixes when possible

4. Optional Parameters

  • Use Optional[...] = None for optional parameters
  • Document default behavior
  • Don't require unnecessary inputs
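For example, a hypothetical tool signature following these guidelines, forwarding only the parameters the caller actually set:

```python
from typing import Any, Dict, Optional

def searchscraper(
    user_prompt: str,
    num_results: Optional[int] = None,    # server chooses a default if omitted
    markdown_only: Optional[bool] = None,
) -> Dict[str, Any]:
    """Illustrative signature: Optional hints with None defaults let the
    schema mark these parameters as not required."""
    params: Dict[str, Any] = {"user_prompt": user_prompt}
    # Forward only parameters the caller explicitly provided.
    if num_results is not None:
        params["num_results"] = num_results
    if markdown_only is not None:
        params["markdown_only"] = markdown_only
    return params
```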

Server Design

1. Statelessness

  • Each tool invocation should be independent
  • Don't rely on shared state between calls
  • Use API key from config, not global variable

2. Idempotency

  • Same inputs should produce same outputs (when possible)
  • Helps with retries and debugging
  • Cache results when appropriate
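Caching an idempotent tool can be sketched with the standard library (hypothetical; `_fetch_markdown` stands in for the real API call, and a production cache would need a TTL since pages change):

```python
from functools import lru_cache

CALL_COUNT = {"n": 0}  # tracks how often the underlying fetch actually runs

def _fetch_markdown(website_url: str) -> str:
    """Stand-in for the real API call (hypothetical)."""
    CALL_COUNT["n"] += 1
    return f"# Markdown for {website_url}"

@lru_cache(maxsize=128)
def cached_scrape(website_url: str) -> str:
    # Same input -> same output, so repeated calls can safely hit the cache.
    return _fetch_markdown(website_url)
```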

3. Performance

  • Keep tool invocations fast (<60s)
  • Use async operations for I/O (future improvement)
  • Consider timeouts for slow operations

4. Security

  • Never log API keys
  • Validate all inputs
  • Use HTTPS for API calls
  • Rotate API keys regularly


Made with ❤️ by ScrapeGraphAI Team