Last Updated: October 2025
- What is MCP?
- MCP in ScrapeGraph
- Communication Protocol
- Tool Schema
- Error Handling
- Client Integration
The Model Context Protocol (MCP) is an open standard that defines how AI assistants (like Claude, Cursor, etc.) can interact with external tools and services in a consistent, structured way.
Official Documentation: https://modelcontextprotocol.io/
1. Server
- Exposes tools that AI assistants can use
- Implements MCP protocol
- Runs as a separate process
- Example: This ScrapeGraph MCP server
2. Client
- AI assistant that uses the tools
- Sends tool invocation requests
- Receives tool results
- Examples: Claude Desktop, Cursor, other AI assistants
3. Transport
- Communication layer between client and server
- Types: stdio (standard input/output), HTTP, SSE
- This server uses: stdio
4. Tools
- Functions exposed by the server
- Have typed parameters and return values
- Automatically discovered by AI assistants
- Examples:
scrape(), extract()
5. Resources
- Data exposed by the server (optional)
- Not used in this implementation
6. Prompts
- Pre-defined prompts exposed by the server (optional)
- Not used in this implementation
┌─────────────────────────────────┐
│ AI Assistant (Client) │
│ - Claude Desktop │
│ - Cursor │
│ - Other MCP-compatible AIs │
└────────────┬────────────────────┘
│ MCP Protocol (JSON-RPC over stdio)
│ - Tool discovery
│ - Tool invocation
│ - Result streaming
▼
┌─────────────────────────────────┐
│ FastMCP Server │
│ - Tool registry │
│ - Parameter validation │
│ - Serialization/ │
│ deserialization │
└────────────┬────────────────────┘
│ Python function calls
▼
┌─────────────────────────────────┐
│ ScapeGraphClient │
│ - HTTP client (httpx) │
│ - API authentication │
│ - Error handling │
└────────────┬────────────────────┘
│ HTTPS API requests
▼
┌───────────────────────────────────┐
│ ScrapeGraphAI API │
│ v2-api.scrapegraphai.com/api │
└───────────────────────────────────┘
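The ScapeGraphClient layer in the diagram can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation: the `SGAI-APIKEY` header name and the `/v1/...` paths are assumptions, and the HTTP call is injected as a plain callable (standing in for `httpx.Client.post`) so the shape is easy to see:

```python
from typing import Any, Callable, Dict

class SketchClient:
    """Rough sketch of the ScapeGraphClient layer; not the actual implementation."""

    BASE_URL = "https://v2-api.scrapegraphai.com/api"  # from the diagram above

    def __init__(self, api_key: str, post: Callable[..., Any]):
        # `post` stands in for httpx.Client.post; the header name is an assumption
        self.headers = {"SGAI-APIKEY": api_key}
        self._post = post

    def request(self, path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        response = self._post(f"{self.BASE_URL}{path}", json=payload, headers=self.headers)
        if response.status_code >= 400:
            # Same error-dictionary convention the tools return to the AI client
            return {"error": f"Error {response.status_code}: {response.text}"}
        return response.json()
```

The key design point is that API errors are converted into error dictionaries here rather than raised, so the tool layer above can pass them straight back to the AI assistant.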
This server uses FastMCP, a lightweight Python framework for building MCP servers:
from typing import Any, Dict

from mcp.server.fastmcp import FastMCP
# Create MCP server
mcp = FastMCP("ScapeGraph API MCP Server")
# Define tools with decorators
@mcp.tool()
def scrape(website_url: str) -> Dict[str, Any]:
    """Convert a webpage to markdown."""
    # Implementation...
    return {"result": "..."}
# Run the server
mcp.run(transport="stdio")

FastMCP Features:
- Automatic tool discovery from decorated functions
- Type hint → MCP schema generation
- Request/response serialization
- Error handling
- Stdio transport out-of-the-box
Standard Input/Output (stdio) is used for client-server communication:
- stdin (Client → Server): Client sends JSON-RPC requests
- stdout (Server → Client): Server sends JSON-RPC responses
- stderr (Server → logs): Server logs (not part of the MCP protocol)
Example Flow:
Client → Server (stdin):
{"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "scrape", "arguments": {"website_url": "https://example.com"}}, "id": 1}
Server → Client (stdout):
{"jsonrpc": "2.0", "result": {"result": "# Example\n\nMarkdown content..."}, "id": 1}
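The framing shown above can be sketched in a few lines. `make_call` and `parse_response` are hypothetical helper names for illustration; they assume the newline-delimited JSON framing used by stdio transports:

```python
import json

def make_call(method: str, params: dict, msg_id: int) -> bytes:
    """Frame a JSON-RPC 2.0 request as one newline-terminated stdio line."""
    msg = {"jsonrpc": "2.0", "method": method, "params": params, "id": msg_id}
    return (json.dumps(msg) + "\n").encode()

def parse_response(line: bytes) -> dict:
    """Unwrap a JSON-RPC response line, raising on error replies."""
    msg = json.loads(line)
    if "error" in msg:
        raise RuntimeError(f"{msg['error']['code']}: {msg['error']['message']}")
    return msg["result"]
```

A client would write `make_call(...)` to the server process's stdin and read one line from its stdout per response, matching replies to requests by `id`.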
MCP uses JSON-RPC 2.0 for message structure:
Request:
{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "extract",
"arguments": {
"user_prompt": "Extract product names",
"website_url": "https://example.com"
}
},
"id": 1
}

Response:
{
"jsonrpc": "2.0",
"result": {
"result": {
"products": ["Product A", "Product B"]
}
},
"id": 1
}

Error Response:
{
"jsonrpc": "2.0",
"error": {
"code": -32603,
"message": "Internal error",
"data": "Error 401: Unauthorized"
},
"id": 1
}

Tool Discovery:
{"jsonrpc": "2.0", "method": "tools/list", "id": 1}
Response:
{
"jsonrpc": "2.0",
"result": {
"tools": [
{
"name": "scrape",
"description": "Convert a webpage into clean, formatted markdown.",
"inputSchema": {
"type": "object",
"properties": {
"website_url": {"type": "string"}
},
"required": ["website_url"]
}
},
// ... other tools
]
},
"id": 1
}

Tool Invocation:
{"jsonrpc": "2.0", "method": "tools/call", "params": {...}, "id": 1}

Initialize:
{"jsonrpc": "2.0", "method": "initialize", "params": {...}, "id": 1}

Each tool exposed by the server has a schema that defines its parameters and return type.
Python Definition:
@mcp.tool()
def scrape(website_url: str) -> Dict[str, Any]:
    """
    Convert a webpage into clean, formatted markdown.

    Args:
        website_url: URL of the webpage to convert

    Returns:
        Dictionary containing the markdown result
    """
    # Implementation...

Generated MCP Schema:
{
"name": "scrape",
"description": "Convert a webpage into clean, formatted markdown.",
"inputSchema": {
"type": "object",
"properties": {
"website_url": {
"type": "string",
"description": "URL of the webpage to convert"
}
},
"required": ["website_url"]
}
}

Type Mapping:
- Python str → JSON Schema "type": "string"
- Python int → JSON Schema "type": "integer"
- Python bool → JSON Schema "type": "boolean"
- Python Dict[str, Any] → JSON Schema "type": "object"
- Python Optional[str] → JSON Schema "type": ["string", "null"]
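The mapping above can be reproduced with typing introspection. The helper below is a simplified sketch of the idea, not FastMCP's actual code:

```python
import typing
from typing import Any, Dict, Optional, get_args, get_origin

_PRIMITIVES = {str: "string", int: "integer", bool: "boolean", float: "number"}

def hint_to_json_type(hint):
    """Translate a Python type hint into a JSON Schema "type" value (simplified)."""
    if hint in _PRIMITIVES:
        return _PRIMITIVES[hint]
    if get_origin(hint) is dict:
        return "object"
    if get_origin(hint) is typing.Union:  # Optional[X] is Union[X, None]
        args = get_args(hint)
        non_none = [a for a in args if a is not type(None)]
        if type(None) in args and len(non_none) == 1:
            return [hint_to_json_type(non_none[0]), "null"]
    raise TypeError(f"unsupported hint: {hint!r}")
```

For example, `hint_to_json_type(Optional[int])` yields `["integer", "null"]`, matching the schema shown for `number_of_scrolls` below.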
Python Definition:
@mcp.tool()
def extract(
    user_prompt: str,
    website_url: str,
    number_of_scrolls: Optional[int] = None,
    markdown_only: Optional[bool] = None
) -> Dict[str, Any]:
    """Extract structured data from a webpage using AI."""
    # Implementation...

Generated MCP Schema:
{
"name": "extract",
"description": "Extract structured data from a webpage using AI.",
"inputSchema": {
"type": "object",
"properties": {
"user_prompt": {"type": "string"},
"website_url": {"type": "string"},
"number_of_scrolls": {"type": ["integer", "null"]},
"markdown_only": {"type": ["boolean", "null"]}
},
"required": ["user_prompt", "website_url"]
}
}

The server implements graceful error handling to prevent crashes and provide meaningful feedback to AI assistants.
Approach:
- No exceptions to client - All errors are caught inside tool functions
- Error dictionaries - Return {"error": "message"} instead of raising
- Detailed messages - Include HTTP status codes and API error messages
@mcp.tool()
def tool_name(param: str) -> Dict[str, Any]:
    """Tool description."""
    if scrapegraph_client is None:
        return {"error": "ScapeGraph client not initialized. Please provide an API key."}
    try:
        return scrapegraph_client.method(param)
    except Exception as e:
        return {"error": str(e)}

Why this approach?
- Prevents server crashes
- Allows AI to handle errors gracefully
- Enables retry logic
- Provides context for user troubleshooting
1. Client Not Initialized:
{
"error": "ScapeGraph client not initialized. Please provide an API key."
}

Cause: Missing SGAI_API_KEY environment variable
2. API Errors:
{
"error": "Error 401: Unauthorized"
}

Cause: Invalid API key
{
"error": "Error 402: Payment Required - Insufficient credits"
}

Cause: Not enough credits
{
"error": "Error 404: Not Found"
}

Cause: Invalid URL or API endpoint
3. Network Errors:
{
"error": "httpx.ConnectTimeout: Connection timed out"
}

Cause: Network issues or slow website
4. Validation Errors (SmartCrawler):
{
"error": "prompt is required when extraction_mode is 'ai'"
}

Cause: Missing required parameter for AI extraction mode
When a tool returns an error, AI assistants typically:
- Parse the error message
- Determine if retryable (network error) or not (invalid API key)
- Inform the user with actionable guidance
- Suggest fixes (e.g., "Please add credits to your account")
Example AI Response:
User: "Convert https://example.com to markdown"
Tool result: {"error": "Error 402: Payment Required - Insufficient credits"}
AI: "I wasn't able to convert the webpage because your ScrapeGraphAI account has insufficient credits. Please add credits at https://dashboard.scrapegraphai.com and try again."
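A client embedding these tools could make the retry decision mechanically. The marker lists below are assumptions drawn from the error examples in this section, not an exhaustive taxonomy:

```python
# Substrings that suggest a transient failure (worth retrying)
RETRYABLE_MARKERS = ("timed out", "timeout", "connection", "429", "503")
# Substrings that suggest a permanent failure (fix the cause instead)
NON_RETRYABLE_MARKERS = ("401", "402", "404")

def is_retryable(tool_result: dict) -> bool:
    """Rough classification of tool errors, based on the examples above."""
    error = tool_result.get("error", "").lower()
    if any(m in error for m in NON_RETRYABLE_MARKERS):
        return False
    return any(m in error for m in RETRYABLE_MARKERS)
```

Network timeouts come back as retryable, while authentication and billing errors do not, matching the guidance above.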
Configuration File Location:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Configuration:
{
"mcpServers": {
"@ScrapeGraphAI-scrapegraph-mcp": {
"command": "npx",
"args": [
"-y",
"@smithery/cli@latest",
"run",
"@ScrapeGraphAI/scrapegraph-mcp",
"--config",
"{\"scrapegraphApiKey\":\"YOUR-SGAI-API-KEY\"}"
]
}
}
}

How It Works:
- Claude Desktop reads the config file on startup
- Starts the MCP server as a child process using the specified command
- Establishes stdio communication
- Discovers available tools via tools/list
- User asks a question that requires web scraping
- Claude calls the appropriate tool via tools/call
- Server executes the tool and returns results
- Claude incorporates results into its response
Example Interaction:
User: "What are the main features of ScrapeGraphAI?"
Claude (internal):
1. Determines that scrape tool could help
2. Calls: scrape("https://scrapegraphai.com")
3. Receives markdown content
4. Analyzes content
5. Responds to user
Claude (to user): "Based on the ScrapeGraphAI website, the main features are:
- AI-powered web scraping
- Multiple scraping modes (SmartScraper, SearchScraper, etc.)
- ...
"
Setup:
- Open Cursor settings
- Navigate to "MCP Servers" section
- Click "Add MCP Server"
- Select or configure ScrapeGraphAI MCP
- Enter API key
Usage:
- Cursor's AI chat can automatically invoke MCP tools
- Similar interaction pattern to Claude Desktop
- Tool calls visible in chat interface (optional)
To integrate with a custom MCP client:
1. Install the MCP SDK:
pip install mcp

2. Create a client:
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    # Start the server process
    server_params = StdioServerParameters(
        command="scrapegraph-mcp",
        env={"SGAI_API_KEY": "your-api-key"}
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize
            await session.initialize()
            # List tools
            tools = await session.list_tools()
            print(f"Available tools: {[t.name for t in tools.tools]}")
            # Call a tool
            result = await session.call_tool(
                "scrape",
                arguments={"website_url": "https://example.com"}
            )
            print(f"Result: {result}")

asyncio.run(main())

3. Handle tool results:
if "error" in result:
    print(f"Tool error: {result['error']}")
else:
    print(f"Tool success: {result['result']}")

Currently, the server does not implement tool versioning. All tools are v1 implicitly.
Future Consideration:
- Add version to tool names: extract_v2()
- Maintain backward compatibility with deprecated tools
- Use MCP metadata for version info
MCP supports streaming results for long-running operations. This could be useful for SmartCrawler:
Current Approach (polling):
- Call crawl_start() → get request_id
- Repeatedly call crawl_get_status(request_id) until complete

Potential Streaming Approach:
- Call crawl_start() → server keeps connection open
- Server streams progress updates: {"status": "processing", "pages": 10}
- Server sends final result: {"status": "completed", "results": [...]}
Not currently implemented due to FastMCP limitations.
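The polling approach can be sketched as a small loop. Here `start` and `get_status` are passed in as plain callables standing in for the crawl_start / crawl_get_status tool calls, so the control flow is visible without any MCP plumbing:

```python
import time

def poll_crawl(start, get_status, interval: float = 2.0, max_attempts: int = 30):
    """Polling loop for the crawl workflow above.

    `start` and `get_status` stand in for the server's crawl_start /
    crawl_get_status tools (called here as plain functions for illustration).
    """
    request_id = start()["request_id"]
    for _ in range(max_attempts):
        status = get_status(request_id)
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    return {"status": "timeout", "request_id": request_id}
```

The AI assistant effectively runs this loop itself today by issuing repeated tool calls; streaming would replace the loop with server-pushed progress messages.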
Current Approach:
- API key passed via environment variable or config parameter
- Single API key for entire server instance
- No per-tool authentication
Future Consideration:
- Support multiple API keys (user-specific)
- OAuth integration
- JWT tokens
Current State:
- No rate limiting in the MCP server
- Rate limiting handled by ScrapeGraphAI API
- Server is a simple pass-through
Future Consideration:
- Client-side rate limiting to prevent API quota exhaustion
- Configurable request throttling
- Request queuing
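One common form of client-side throttling is a token bucket, which allows short bursts while capping the sustained request rate. This is a generic sketch of the idea, not part of the current server:

```python
import time

class TokenBucket:
    """Allow up to `capacity` burst requests, refilling at `rate` tokens/second."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def acquire(self) -> bool:
        """Take one token if available; return False to signal 'throttle this call'."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A tool wrapper could check `acquire()` before hitting the API and return an error dictionary (consistent with the error-handling convention above) when throttled.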
The MCP Inspector is a tool for testing MCP servers:
npx @modelcontextprotocol/inspector scrapegraph-mcp

Features:
- Interactive tool discovery
- Manual tool invocation
- Request/response inspection
- Error debugging
FastMCP Logging:
# Add logging to server.py
import logging

logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
logger = logging.getLogger(__name__)

@mcp.tool()
def scrape(website_url: str) -> Dict[str, Any]:
    logger.info(f"scrape called with URL: {website_url}")
    # ...

View Logs:
- Logs printed to stderr (not part of MCP protocol)
- Visible in Claude Desktop logs: ~/Library/Logs/Claude/ (macOS)
- Use MCP Inspector to see real-time logs
Issue: Tools not appearing in Claude
- Check: Is the server running? Look in Claude logs
- Check: Is the config file correct? Verify JSON syntax
- Check: Does tools/list return the tools? Use MCP Inspector
Issue: Tool calls failing
- Check: Is the API key valid? Test with curl
- Check: Are parameters correct? Review tool schema
- Check: Network connectivity? Check firewall/proxy
Issue: Server crashes
- Check: Python version (≥3.10)?
- Check: Dependencies installed? Run pip list
- Check: Error in logs? Check stderr output
1. Clear Descriptions
- Write docstrings that explain what the tool does
- Include parameter descriptions
- Specify expected input/output formats
2. Type Hints
- Always use type hints for parameters and return values
- FastMCP generates schemas from type hints
- Helps AI understand tool contracts
3. Error Messages
- Provide actionable error messages
- Include HTTP status codes
- Suggest fixes when possible
4. Optional Parameters
- Use = None for optional parameters
- Document default behavior
- Don't require unnecessary inputs
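Put together, a tool following these guidelines might look like the sketch below. `searchscraper` is a hypothetical example, written as a plain function for clarity; in the server it would additionally carry the @mcp.tool() decorator:

```python
from typing import Any, Dict, Optional

def searchscraper(user_prompt: str, num_results: Optional[int] = None) -> Dict[str, Any]:
    """Search the web and extract data matching the prompt.

    Args:
        user_prompt: What to search for and extract.
        num_results: How many results to consider; None lets the API choose a default.

    Returns:
        Dictionary with the extracted data, or {"error": "..."} on failure.
    """
    if not user_prompt.strip():
        # Validate early and use the error-dictionary convention
        return {"error": "user_prompt must not be empty"}
    # Real implementation would call the ScrapeGraphAI API here.
    return {"result": {"prompt": user_prompt, "num_results": num_results}}
```

It has a full docstring (for schema descriptions), type hints (for schema generation), an optional parameter with a documented default, and an actionable error message.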
1. Statelessness
- Each tool invocation should be independent
- Don't rely on shared state between calls
- Use API key from config, not global variable
2. Idempotency
- Same inputs should produce same outputs (when possible)
- Helps with retries and debugging
- Cache results when appropriate
3. Performance
- Keep tool invocations fast (<60s)
- Use async operations for I/O (future improvement)
- Consider timeouts for slow operations
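The timeout point above can be implemented generically by running the tool body on a worker thread with a deadline. This is a sketch of the idea, not existing server code; note that the worker keeps running in the background after a timeout, so this bounds the caller's wait, not the work itself:

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Any, Callable, Dict

_pool = ThreadPoolExecutor(max_workers=4)

def call_with_timeout(fn: Callable[[], Dict[str, Any]], seconds: float) -> Dict[str, Any]:
    """Run a tool body with a deadline, mapping timeouts onto the error-dict convention."""
    future = _pool.submit(fn)
    try:
        return future.result(timeout=seconds)
    except FutureTimeout:
        return {"error": f"Operation exceeded {seconds}s timeout"}
```

Usage: `call_with_timeout(lambda: scrapegraph_client.method(param), 60.0)` inside a tool body keeps slow API calls from blocking the assistant indefinitely.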
4. Security
- Never log API keys
- Validate all inputs
- Use HTTPS for API calls
- Rotate API keys regularly
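Input validation for URL parameters can be as simple as the following sketch (`validate_url` is a hypothetical helper, not part of the current server):

```python
from typing import Optional
from urllib.parse import urlparse

def validate_url(website_url: str) -> Optional[str]:
    """Return an error message for obviously invalid URLs, else None."""
    parsed = urlparse(website_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Invalid URL: {website_url!r} (expected http(s)://...)"
    return None
```

A tool body could call this first and return `{"error": validate_url(website_url)}` before spending API credits on a request that cannot succeed.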
- MCP Specification: https://modelcontextprotocol.io/
- MCP Python SDK: https://github.com/modelcontextprotocol/python-sdk
- FastMCP: https://github.com/jlowin/fastmcp
- JSON-RPC 2.0: https://www.jsonrpc.org/specification
- ScrapeGraphAI API: https://api.scrapegraphai.com/docs
Made with ❤️ by ScrapeGraphAI Team