
Commit 02c4c1a

docs: add sampling example and documentation
Add a complete sampling example with both server and client, plus a documentation page explaining the sampling feature.

Server (examples/servers/simple-sampling):
- Exposes summarize and analyze_sentiment tools that use sampling
- Demonstrates ctx.session.create_message() for server-side LLM requests

Client (examples/clients/simple-sampling-client):
- Provides a sampling_callback to handle server LLM requests
- Shows how to integrate with real LLM providers (OpenAI, Anthropic)

Documentation (docs/sampling.md):
- Explains the sampling flow with a sequence diagram
- Documents create_message parameters
- Shows client-side callback setup
- Includes LLM provider integration examples (OpenAI, Anthropic)
- Covers model preferences

Fixes #1205
1 parent 62575ed commit 02c4c1a

File tree

12 files changed

+593
-0
lines changed


docs/sampling.md

Lines changed: 200 additions & 0 deletions
@@ -0,0 +1,200 @@
# Sampling

Sampling is a powerful MCP feature that allows servers to request LLM completions from the client. Instead of the server needing its own LLM access, it can "borrow" the client's language model to generate text, analyze content, or perform any LLM task.

## How It Works

In a typical MCP interaction, the client calls tools on the server. With sampling, the flow is reversed for part of the interaction:

```
Client                              Server
  │                                   │
  │  call_tool("summarize")           │
  │──────────────────────────────────>│
  │                                   │
  │  sampling/createMessage           │
  │<──────────────────────────────────│
  │                                   │
  │  (client calls LLM)               │
  │                                   │
  │  CreateMessageResult              │
  │──────────────────────────────────>│
  │                                   │
  │  tool result                      │
  │<──────────────────────────────────│
```

1. The client calls a tool on the server.
2. The server's tool handler sends a `sampling/createMessage` request back to the client.
3. The client's sampling callback processes the request (typically by calling an LLM).
4. The client returns the LLM response to the server.
5. The server uses the response to complete the tool execution.

## Server Side

On the server side, use `ctx.session.create_message()` inside a tool handler to request a completion:

--8<-- "examples/snippets/servers/sampling.py"

The `create_message` method accepts these parameters:

| Parameter | Type | Description |
|-----------|------|-------------|
| `messages` | `list[SamplingMessage]` | The conversation messages to send |
| `max_tokens` | `int` | Maximum tokens in the response |
| `system_prompt` | `str \| None` | Optional system prompt |
| `temperature` | `float \| None` | Sampling temperature (0.0 = deterministic) |
| `stop_sequences` | `list[str] \| None` | Sequences that stop generation |
| `model_preferences` | `ModelPreferences \| None` | Hints about which model to use |

## Client Side

On the client side, provide a `sampling_callback` when creating the session. This callback handles `sampling/createMessage` requests from the server:

```python
import anyio

from mcp import ClientSession, StdioServerParameters, types
from mcp.client.context import ClientRequestContext
from mcp.client.stdio import stdio_client


async def handle_sampling(
    context: ClientRequestContext,
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    # Forward the request to your LLM
    # ... call OpenAI, Anthropic, Azure OpenAI, etc.
    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(type="text", text="LLM response here"),
        model="your-model-name",
        stop_reason="endTurn",
    )


async def main():
    server_params = StdioServerParameters(command="your-server-command")

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(
            read,
            write,
            sampling_callback=handle_sampling,
        ) as session:
            await session.initialize()

            # Now when you call a tool that uses sampling,
            # your callback will be invoked automatically
            result = await session.call_tool("summarize", {"text": "..."})


if __name__ == "__main__":
    anyio.run(main)
```

89+
90+
### Using the High-Level Client
91+
92+
The `Client` class also supports sampling callbacks:
93+
94+
```python
95+
from mcp import Client
96+
97+
async with Client(server, sampling_callback=handle_sampling) as client:
98+
result = await client.call_tool("summarize", {"text": "..."})
99+
```
## Integrating with LLM Providers

Here is how to connect the sampling callback to popular LLM providers:

### OpenAI

```python
from openai import AsyncOpenAI

from mcp import types
from mcp.client.context import ClientRequestContext

openai_client = AsyncOpenAI()


async def handle_sampling(
    context: ClientRequestContext,
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    messages = []
    if params.system_prompt:
        messages.append({"role": "system", "content": params.system_prompt})
    for msg in params.messages:
        if isinstance(msg.content, types.TextContent):
            messages.append({"role": msg.role, "content": msg.content.text})

    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=params.max_tokens,
        temperature=params.temperature,
    )
    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(
            type="text", text=response.choices[0].message.content or ""
        ),
        model=response.model,
        stop_reason="endTurn",
    )
```

### Anthropic

```python
from anthropic import AsyncAnthropic

from mcp import types
from mcp.client.context import ClientRequestContext

anthropic_client = AsyncAnthropic()


async def handle_sampling(
    context: ClientRequestContext,
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    messages = [
        {"role": msg.role, "content": msg.content.text}
        for msg in params.messages
        if isinstance(msg.content, types.TextContent)
    ]

    response = await anthropic_client.messages.create(
        model="claude-sonnet-4-20250514",
        messages=messages,
        max_tokens=params.max_tokens or 1024,
        system=params.system_prompt or "",
    )
    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(type="text", text=response.content[0].text),
        model=response.model,
        stop_reason="endTurn",
    )
```
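
Both provider snippets above repeat the same conversion step: flattening MCP sampling messages into a provider-style chat list, keeping only text content. A provider-agnostic sketch of that step follows; plain dicts stand in for the SDK's `SamplingMessage`/`TextContent` types, so this illustrates the shape of the conversion rather than the SDK API:

```python
def to_chat_messages(params_messages, system_prompt=None):
    """Convert MCP-style sampling messages to a flat chat-message list.

    Each input message is a dict like
    {"role": "user", "content": {"type": "text", "text": "..."}}.
    Non-text content (e.g. images) is skipped, mirroring the
    isinstance(msg.content, types.TextContent) checks above.
    """
    chat = []
    if system_prompt:
        # OpenAI-style APIs take the system prompt as a leading message;
        # Anthropic takes it as a separate `system` argument instead.
        chat.append({"role": "system", "content": system_prompt})
    for msg in params_messages:
        content = msg["content"]
        if content.get("type") == "text":
            chat.append({"role": msg["role"], "content": content["text"]})
    return chat


msgs = [
    {"role": "user", "content": {"type": "text", "text": "Summarize this."}},
    {"role": "user", "content": {"type": "image", "data": "..."}},  # skipped
]
print(to_chat_messages(msgs, system_prompt="Be brief."))
# → [{'role': 'system', 'content': 'Be brief.'},
#    {'role': 'user', 'content': 'Summarize this.'}]
```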

## Complete Example

For a complete working example with both server and client, see:

- **Server**: [`examples/servers/simple-sampling`](https://github.com/modelcontextprotocol/python-sdk/tree/main/examples/servers/simple-sampling)
- **Client**: [`examples/clients/simple-sampling-client`](https://github.com/modelcontextprotocol/python-sdk/tree/main/examples/clients/simple-sampling-client)

## Model Preferences

Servers can provide hints about which model to use via `model_preferences`:

```python
from mcp.types import ModelPreferences, ModelHint

result = await ctx.session.create_message(
    messages=[...],
    max_tokens=100,
    model_preferences=ModelPreferences(
        hints=[ModelHint(name="claude-sonnet-4-20250514")],
        cost_priority=0.5,
        speed_priority=0.8,
        intelligence_priority=0.7,
    ),
)
```

The client can use these hints to select an appropriate model, but is not required to follow them.
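
How a client acts on these hints is entirely up to its implementation. One possible (hypothetical) strategy, sketched with a made-up `AVAILABLE_MODELS` catalogue: treat each hint as a substring match against the models the client can serve, and fall back to a priority-weighted score when no hint matches:

```python
AVAILABLE_MODELS = {
    # Hypothetical client-side catalogue: model name -> (cost, speed,
    # intelligence) scores, each in 0.0-1.0, higher is better.
    "claude-sonnet-4-20250514": (0.6, 0.7, 0.9),
    "gpt-4o-mini": (0.9, 0.9, 0.6),
}


def pick_model(hints, cost_priority=0.0, speed_priority=0.0, intelligence_priority=0.0):
    """Pick a model for a sampling request.

    Hints are soft: the first hint that appears as a substring of an
    available model name wins; otherwise the priorities weight a
    per-model score and the best-scoring model is chosen.
    """
    for hint in hints:
        for name in AVAILABLE_MODELS:
            if hint in name:
                return name

    # No hint matched: maximize the weighted sum of the model's scores.
    def score(name):
        cost, speed, intel = AVAILABLE_MODELS[name]
        return cost * cost_priority + speed * speed_priority + intel * intelligence_priority

    return max(AVAILABLE_MODELS, key=score)


print(pick_model(["claude-sonnet-4"]))    # → claude-sonnet-4-20250514
print(pick_model([], cost_priority=1.0))  # → gpt-4o-mini
```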
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
# Simple Sampling Client

A simple MCP client that demonstrates how to handle **sampling** requests from an MCP server.

## Overview

When an MCP server needs LLM completions during tool execution, it sends a `sampling/createMessage` request to the client. This client provides a `sampling_callback` that handles these requests.

In a real application, the callback would forward the request to an LLM provider (OpenAI, Anthropic, Azure OpenAI, etc.). This example uses a simple demo response for illustration.

## Usage

First, make sure the sampling server is available (install it from `examples/servers/simple-sampling`).

Then run the client:

```bash
uv run mcp-simple-sampling-client
```

## How It Works

1. The client connects to the `mcp-simple-sampling` server via stdio transport.
2. It provides a `sampling_callback` function that handles `sampling/createMessage` requests.
3. When it calls a tool (e.g., `summarize`), the server sends a sampling request back to the client.
4. The client's callback processes the request and returns a response.
5. The server uses that response to complete the tool execution.
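
The demo response in step 4 can be sketched as plain Python; dicts stand in here for the SDK's request and result types (the real callback receives `types.CreateMessageRequestParams` and returns `types.CreateMessageResult`), and the echo behavior is a hypothetical stand-in for the example's canned reply:

```python
def demo_sampling_response(params):
    """Build a canned 'LLM' response for a sampling request.

    `params` mirrors CreateMessageRequestParams as a plain dict; instead
    of calling a real model, the demo echoes the last text message back
    with a prefix.
    """
    last_text = ""
    for msg in params.get("messages", []):
        content = msg.get("content", {})
        if content.get("type") == "text":
            last_text = content["text"]
    return {
        "role": "assistant",
        "content": {"type": "text", "text": f"[demo] You said: {last_text}"},
        "model": "demo-model",
        "stopReason": "endTurn",
    }


request = {"messages": [{"role": "user", "content": {"type": "text", "text": "hello"}}]}
print(demo_sampling_response(request)["content"]["text"])  # → [demo] You said: hello
```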

## Integrating a Real LLM

To use a real LLM instead of the demo response, replace the body of `handle_sampling` with your LLM call:

```python
from openai import AsyncOpenAI

from mcp import types
from mcp.client.context import ClientRequestContext

openai_client = AsyncOpenAI()


async def handle_sampling(
    context: ClientRequestContext,
    params: types.CreateMessageRequestParams,
) -> types.CreateMessageResult:
    messages = []
    if params.system_prompt:
        messages.append({"role": "system", "content": params.system_prompt})
    for msg in params.messages:
        if isinstance(msg.content, types.TextContent):
            messages.append({"role": msg.role, "content": msg.content.text})

    response = await openai_client.chat.completions.create(
        model="gpt-4o",
        messages=messages,
        max_tokens=params.max_tokens,
        temperature=params.temperature,
    )
    return types.CreateMessageResult(
        role="assistant",
        content=types.TextContent(
            # `message.content` can be None, so fall back to an empty string
            type="text", text=response.choices[0].message.content or ""
        ),
        model=response.model,
        stop_reason="endTurn",
    )
```

examples/clients/simple-sampling-client/mcp_simple_sampling_client/__init__.py

Whitespace-only changes.
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
from mcp_simple_sampling_client.main import main

main()
