Your MCP server works on your machine. It passes tests. The Inspector shows green. Now you need to deploy it where real users can use it, real load can hit it, and real things can go wrong at 3 AM.
This chapter covers production deployment patterns—from simple single-server setups to enterprise architectures with gateways, registries, and multi-tenant isolation.
The simplest production model: distribute your server as a package and let users run it locally.
```bash
# Users install and run with npx
npx -y @yourorg/mcp-server-whatever
```

Package your server properly in `package.json`:

```json
{
  "name": "@yourorg/mcp-server-whatever",
  "version": "1.0.0",
  "bin": {
    "mcp-server-whatever": "./dist/index.js"
  },
  "files": ["dist/"],
  "engines": {
    "node": ">=18"
  }
}
```

For Python servers, the equivalent is uvx:

```bash
# Users install and run with uvx
uvx mcp-server-whatever
```

Set up `pyproject.toml`:

```toml
[project]
name = "mcp-server-whatever"
version = "1.0.0"
requires-python = ">=3.10"
dependencies = ["mcp>=1.0.0"]

[project.scripts]
mcp-server-whatever = "mcp_server_whatever:main"
```

This model has real advantages:

- Zero infrastructure to manage
- Server runs with the user's permissions (appropriate for local tools)
- No authentication needed
- Updates via package manager

And real limitations:

- Can't share state between users
- Each user runs their own instance
- No centralized monitoring
- Hard to enforce version consistency
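For context, a local client such as Claude Desktop typically launches an npx-distributed server with a config entry along these lines (a sketch; the exact schema depends on the client, and the server name and env var are placeholders):

```json
{
  "mcpServers": {
    "whatever": {
      "command": "npx",
      "args": ["-y", "@yourorg/mcp-server-whatever"],
      "env": { "API_KEY": "..." }
    }
  }
}
```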
For shared, remote servers, deploy as an HTTP service.
An Express server using the Streamable HTTP transport:

```typescript
import express from "express";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const app = express();
app.use(express.json());

// Health check endpoint
app.get("/health", (req, res) => {
  res.json({ status: "healthy", version: "1.0.0" });
});

// MCP endpoint
const sessions = new Map();

app.post("/mcp", async (req, res) => {
  const sessionId = req.headers["mcp-session-id"];
  if (!sessionId || !sessions.has(sessionId)) {
    // New session
    const server = new McpServer({ name: "prod-server", version: "1.0.0" });
    // ... register tools ...
    const transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => crypto.randomUUID(),
    });
    await server.connect(transport);
    sessions.set(transport.sessionId, { server, transport });
    await transport.handleRequest(req, res, req.body);
  } else {
    // Existing session
    const { transport } = sessions.get(sessionId);
    await transport.handleRequest(req, res, req.body);
  }
});

app.listen(process.env.PORT || 3000);
```

Containerize it with a Dockerfile:

```dockerfile
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist/ ./dist/
EXPOSE 3000
# node:20-slim ships without curl; use Node's built-in fetch for the health check
HEALTHCHECK CMD node -e "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"
CMD ["node", "dist/index.js"]
```

Or run it with Docker Compose:

```yaml
# docker-compose.yml
services:
  mcp-server:
    build: .
    ports:
      - "3000:3000"
    environment:
      - API_KEY=${API_KEY}
      - DATABASE_URL=${DATABASE_URL}
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "node", "-e", "fetch('http://localhost:3000/health').then(r => process.exit(r.ok ? 0 : 1)).catch(() => process.exit(1))"]
      interval: 30s
      timeout: 10s
      retries: 3
```

For Kubernetes, a Deployment plus a Service:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 3  # in-memory sessions need sticky sessions or a shared store
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: yourorg/mcp-server:latest
          ports:
            - containerPort: 3000
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: mcp-secrets
                  key: api-key
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 15
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: mcp-server
spec:
  selector:
    app: mcp-server
  ports:
    - port: 80
      targetPort: 3000
  type: ClusterIP
```

MCP's Streamable HTTP transport is compatible with serverless platforms, especially as the protocol moves toward stateless operation.
On AWS Lambda:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

export async function handler(event) {
  const server = new McpServer({ name: "lambda-server", version: "1.0.0" });

  // Register tools
  server.tool("process_data", "Process data", { input: z.string() }, async ({ input }) => ({
    content: [{ type: "text", text: `Processed: ${input}` }],
  }));

  // Handle the request
  const body = JSON.parse(event.body);
  // ... process JSON-RPC message and return response
}
```

On Cloudflare Workers:

```typescript
export default {
  async fetch(request: Request): Promise<Response> {
    if (request.method === "POST") {
      const body = await request.json();
      // Handle MCP JSON-RPC request
      const response = await handleMcpRequest(body);
      return new Response(JSON.stringify(response), {
        headers: { "Content-Type": "application/json" },
      });
    }
    return new Response("MCP Server", { status: 200 });
  },
};
```

Keep the usual serverless caveats in mind:

- Cold starts — First request will be slower. Minimize initialization.
- Statelessness — Each invocation is independent. Don't rely on in-memory state.
- Session management — Use external storage (Redis, DynamoDB) for sessions if needed.
- Timeouts — Lambda has a 15-minute max. Most MCP tool calls should be much faster.
- Cost — Pay per invocation. Great for bursty workloads, expensive for constant load.
As MCP deployments grow, organizations need a way to manage, secure, and monitor multiple servers. Enter the MCP gateway.
```
Client ──→ ┌─────────────┐ ──→ Server A (GitHub)
           │   Gateway   │ ──→ Server B (Database)
Client ──→ │             │ ──→ Server C (Monitoring)
           │ • Auth      │
           │ • Routing   │
Client ──→ │ • Rate limit│ ──→ Server D (File Storage)
           │ • Logging   │
           │ • Caching   │
           └─────────────┘
```
A gateway sits between clients and servers, providing:
- Authentication — Verify client identity once, proxy to multiple servers
- Routing — Direct requests to the appropriate backend server
- Rate limiting — Prevent abuse and enforce quotas
- Logging — Centralized audit trail
- Caching — Cache resource reads and tool list responses
- Tool aggregation — Present tools from multiple servers as a single unified server
- Access control — Control which users can access which tools
Several companies offer MCP gateway products:
- Cloudflare has built MCP support into their Workers platform
- Kong and other API gateway vendors are adding MCP support
- Smithery and mcp.run offer hosted MCP server registries with gateway features
For many teams, a simple reverse proxy with authentication is sufficient. You don't need a dedicated MCP gateway until you have many servers, many users, or complex access control requirements.
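The core of that simple approach fits in a few lines: check credentials once, then route namespaced tool calls to the backend that owns them. A sketch (all names, keys, and URLs here are hypothetical):

```python
# Backend MCP servers, keyed by tool namespace (hypothetical URLs).
BACKENDS = {
    "github": "http://github-mcp.internal:3000/mcp",
    "database": "http://db-mcp.internal:3000/mcp",
}

# API key -> user mapping (in production, a real auth provider).
API_KEYS = {"key-abc": "alice"}


def route_tool_call(api_key: str, tool_name: str) -> tuple[str, str]:
    """Resolve a namespaced call like 'github/create_issue' to
    (backend_url, bare_tool_name), rejecting unknown credentials."""
    user = API_KEYS.get(api_key)
    if user is None:
        raise PermissionError("unknown API key")
    namespace, _, bare_name = tool_name.partition("/")
    backend = BACKENDS.get(namespace)
    if backend is None:
        raise LookupError(f"no backend registered for namespace '{namespace}'")
    return backend, bare_name


print(route_tool_call("key-abc", "github/create_issue"))
```

Rate limiting, logging, and caching can each be layered onto this same routing function before the request is forwarded.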
When multiple users share an MCP server, you need tenant isolation.
Identify the user on each request and scope operations:
```python
@mcp.tool()
async def list_documents(ctx: Context) -> str:
    """List the current user's documents."""
    user_id = ctx.request_context.get("user_id")  # From auth middleware
    docs = await db.query("SELECT * FROM documents WHERE owner_id = ?", user_id)
    return format_documents(docs)
```

Alternatively, create isolated server instances per session:

```typescript
app.post("/mcp", async (req, res) => {
  const userId = await authenticateRequest(req);
  const sessionKey = `${userId}:${req.headers["mcp-session-id"]}`;
  if (!sessions.has(sessionKey)) {
    // Create a new server instance scoped to this user
    const server = createServerForUser(userId);
    sessions.set(sessionKey, server);
  }
  await sessions.get(sessionKey).handleRequest(req, res);
});
```

For maximum isolation, give each tenant their own database:

```python
def get_db_for_user(user_id: str) -> Connection:
    return connect(f"postgres://host/{user_id}_db")
```

Once a server is live, track these operational metrics:

- Request rate — Tool calls per second, by tool name
- Latency — P50, P95, P99 for tool execution
- Error rate — Percentage of tool calls that return errors
- Active sessions — Number of connected clients
- Resource usage — CPU, memory, connections per server
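The latency percentiles above can be computed from raw duration samples; a minimal nearest-rank sketch (the sample values are illustrative):

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Tool-call durations in milliseconds; note the long tail.
latencies_ms = [12.0, 15.0, 11.0, 250.0, 14.0, 13.0, 16.0, 12.5, 900.0, 13.5]
print(percentile(latencies_ms, 50))  # P50
print(percentile(latencies_ms, 95))  # P95
print(percentile(latencies_ms, 99))  # P99
```

The gap between P50 and P99 is the signal to watch: a healthy median with a huge P99 usually points at a slow external dependency, not the server itself.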
A health check endpoint should report the status of each dependency, not just that the process is running:

```typescript
app.get("/health", async (req, res) => {
  const checks = {
    server: "healthy",
    database: await checkDatabase(),      // assumed to return "healthy" | "unhealthy"
    externalApi: await checkExternalApi(),
    uptime: process.uptime(),
    version: "1.0.0",
  };
  // Unhealthy only if some dependency explicitly reported "unhealthy"
  const isHealthy = Object.values(checks).every((v) => v !== "unhealthy");
  res.status(isHealthy ? 200 : 503).json(checks);
});
```

Use structured logging so every tool call can be traced and aggregated:

```python
import time

import structlog

logger = structlog.get_logger()


@mcp.tool()
async def query_data(sql: str, ctx: Context) -> str:
    logger.info(
        "tool_call",
        tool="query_data",
        sql_length=len(sql),
        session_id=ctx.session_id,
    )
    start = time.time()
    try:
        result = await execute_query(sql)
        duration = time.time() - start
        logger.info(
            "tool_success",
            tool="query_data",
            duration_ms=duration * 1000,
            row_count=len(result),
        )
        return format_result(result)
    except Exception as e:
        duration = time.time() - start
        logger.error(
            "tool_error",
            tool="query_data",
            error=str(e),
            duration_ms=duration * 1000,
        )
        raise
```

For stateless servers, scaling is straightforward—add more instances behind a load balancer. For stateful servers (with sessions), you need one of:
- Sticky sessions — Route requests from the same session to the same instance
- Shared session store — Store session state in Redis/Memcached
- Stateless design — Avoid server-side session state entirely
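The sticky-session option can be as simple as deterministic hashing of the session ID at the load balancer. A sketch (instance names are hypothetical; note that unlike true consistent hashing, resizing the pool with this scheme remaps most sessions):

```python
import hashlib

INSTANCES = ["mcp-0", "mcp-1", "mcp-2"]


def pick_instance(session_id: str) -> str:
    """Map a session to an instance: hash the ID, take it modulo the
    pool size. Every request in a session lands on the same instance."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]


print(pick_instance("sess-42"))
```

In practice you would configure this in the load balancer (e.g. hashing on the `Mcp-Session-Id` header) rather than in application code.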
Each stdio connection is a process. Each HTTP connection consumes memory. Plan capacity accordingly:
- stdio: Limit the number of concurrent server processes
- HTTP: Use connection pooling and set reasonable timeouts
- WebSocket/SSE: Monitor open connection counts
Cache aggressively:
- Tool lists change infrequently → cache with TTL
- Resource reads may be cacheable → check freshness with subscriptions
- Prompt templates rarely change → cache indefinitely
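A minimal TTL cache covering the first two cases might look like this (a sketch; a production server would likely add an LRU size bound and locking):

```python
import time


class TTLCache:
    """Tiny time-to-live cache for tool-list and resource-read responses."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[float, object]] = {}

    def get(self, key: str):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._entries[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key: str, value) -> None:
        self._entries[key] = (time.monotonic(), value)


tool_list_cache = TTLCache(ttl_seconds=60)
tool_list_cache.set("tools/list", ["query_data", "process_data"])
print(tool_list_cache.get("tools/list"))
```

If the server emits `listChanged` notifications, invalidate the corresponding entry on receipt rather than waiting for the TTL to lapse.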
When the server won't start:

- Check logs (stderr for stdio, application logs for HTTP)
- Verify dependencies are installed
- Check environment variables
- Try running manually from the command line
- Check permissions (file access, network, ports)
When latency is high:

- Profile tool execution (is the tool slow or the transport?)
- Check external dependencies (API calls, database queries)
- Look for N+1 query patterns
- Consider caching frequently-requested data
- Check resource contention (CPU, memory, connections)
When you suspect a memory leak:

- Monitor memory usage over time
- Check for unclosed connections or file handles
- Watch for growing collections (session maps, caches without TTL)
- Use profiling tools (Node.js: `--inspect`, Python: `tracemalloc`)
When external dependencies fail:
```python
@mcp.tool()
async def get_data(query: str) -> str:
    try:
        return await primary_source.query(query)
    except ConnectionError:
        try:
            return await cache.get(query)
        except CacheMiss:
            return "Error: Data source temporarily unavailable. Please try again in a few minutes."
```

Production MCP deployment ranges from simple package distribution to complex multi-tenant architectures. Key considerations:
- Local distribution (npm/PyPI) for single-user tools
- HTTP deployment (Docker/K8s/serverless) for shared servers
- Gateways for managing fleets of servers
- Multi-tenant isolation for shared infrastructure
- Monitoring and observability for operational health
- Scaling through horizontal replication and caching
The right architecture depends on your scale, security requirements, and operational maturity. Start simple (local distribution), grow as needed (hosted HTTP), and add complexity (gateways, multi-tenancy) only when you have the problems that justify it.
Next: a tour of the MCP ecosystem.