@amplitude/ai

Agent analytics for Amplitude. Track every LLM call, user message, tool call, and quality signal as events in your Amplitude project — then build funnels, cohorts, and retention charts across AI and product behavior.

npm install @amplitude/ai @amplitude/analytics-node

import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({ amplitude: ai, apiKey: process.env.OPENAI_API_KEY });
const agent = ai.agent('my-agent');

app.post('/chat', async (req, res) => {
  const session = agent.session({ userId: req.userId, sessionId: req.sessionId });

  const result = await session.run(async (s) => {
    s.trackUserMessage(req.body.message);
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });
    return response.choices[0].message.content;
  });

  await ai.flush();
  res.json({ response: result });
});
// Events: [Agent] User Message, [Agent] AI Response (with model, tokens, cost, latency),
//         [Agent] Session Start, [Agent] Session End — all tied to userId and sessionId

How to Get Started

Instrument with a coding agent (recommended)

npm install @amplitude/ai
npx amplitude-ai

The CLI prints a prompt to paste into any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.):

Instrument this app with @amplitude/ai. Follow node_modules/@amplitude/ai/amplitude-ai.md

The agent reads the guide, scans your project, discovers your agents and LLM call sites, and instruments everything — provider wrappers, session lifecycle, multi-agent delegation, tool tracking, scoring, and a verification test. You review and approve each step.

Manual setup

Whether you use a coding agent or set up manually, the goal is the same: full instrumentation — agents + sessions + provider wrappers. This gives you every event type, per-user analytics, and server-side enrichment.

Follow the code example above to get started. The pattern is:

1. Swap your LLM import: import { OpenAI } from '@amplitude/ai' (or Anthropic, Gemini, etc.)
2. Create an agent: ai.agent('my-agent') to name and track your AI component
3. Wrap in a session: agent.session({ userId, sessionId }).run(async (s) => { ... }) for per-user analytics, funnels, cohorts, and server-side enrichment
4. Track user messages: s.trackUserMessage(...) for conversation context
5. Score responses: s.score(...) for quality measurement

patch() exists for quick verification or legacy codebases where you can't modify call sites, but it only captures [Agent] AI Response without user identity — no funnels, no cohorts, no retention. Start with full instrumentation; fall back to patch() only if you can't modify call sites.

Property	Value
Name	@amplitude/ai
Version	0.3.10
Runtime	Node.js
Peer dependency	@amplitude/analytics-node >= 1.3.0
Optional peers	openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime, @pydantic/genai-prices (cost), tiktoken or js-tiktoken (token counting)

How to Get Started
- Instrument with a Coding Agent (recommended)
Installation
Quick Start
- Current Limitations
- Is this for me?
- Why this SDK?
- What you can build
What You Get at Each Level
Core Concepts
- User Identity
- Session
Configuration
Context Dict Conventions
Privacy & Content Control
Cache-Aware Cost Tracking
Semantic Cache Tracking
Model Tier Classification
Provider Wrappers
Streaming Tracking
Attachment Tracking
Implicit Feedback
tool() and observe() HOFs
Scoring Patterns
Enrichments
Debug and Dry-Run Modes
Patching
Auto-Instrumentation CLI
Integrations
Data Flow
Integration Approaches
Integration Patterns
Serverless Environments
Error Handling and Reliability
Testing
Troubleshooting
Context Propagation
Middleware
Bulk Conversation Import
Event Schema
Event Property Reference
Event JSON Examples
Sending Events Without the SDK
Register Event Schema in Your Data Catalog
Utilities and Type Exports
Constants
API Reference
For AI Coding Agents
For Python SDK Migrators
Need Help?
Contributing
License

Installation

npm install @amplitude/ai @amplitude/analytics-node

Install provider SDKs based on what you use (for example: openai, @anthropic-ai/sdk, @google/generative-ai, @mistralai/mistralai, @aws-sdk/client-bedrock-runtime).

Quick Start

5-minute quick start

1. Install: npm install @amplitude/ai @amplitude/analytics-node
2. Get your API key: In Amplitude, go to Settings > Projects and copy the API key.
3. Instrument: Run npx amplitude-ai and paste the printed prompt into your AI coding agent. Or follow the manual setup steps; the goal is the same: agents + sessions + provider wrappers.
4. Configure: Set your API key in the generated .env file and replace the placeholder userId/sessionId.
5. Run your app. You should see [Agent] User Message, [Agent] AI Response, and [Agent] Session End within 30 seconds.

To verify locally before checking Amplitude, add debug: true:

import { AIConfig, AmplitudeAI } from '@amplitude/ai';

const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({ debug: true }),
});
// Prints: [amplitude-ai] [Agent] AI Response | model=gpt-4o | tokens=847 | cost=$0.0042 | latency=1,203ms

Tip: Call enableLivePriceUpdates() at startup so cost tracking stays accurate when new models are released. See Cache-Aware Cost Tracking.

Current Limitations

Area	Status
Runtime	Node.js only (no browser). Python SDK available separately (amplitude-ai on PyPI).
Zero-code patching	OpenAI, Anthropic, Azure OpenAI, Gemini, Mistral, Bedrock (Converse/ConverseStream only).
CrewAI	Python-only; the Node.js export throws `ProviderError` by design. Use LangChain or OpenTelemetry integrations instead.
OTEL scope filtering	Not yet supported (Python SDK has `allowed_scopes`/`blocked_scopes`).
Streaming cost tracking	Automatic for OpenAI and Anthropic. Manual token counts required for other providers' streamed responses.

Is this for me?

Yes, if you're building an AI-powered feature (chatbot, copilot, agent, RAG pipeline) and you want to measure how it impacts real user behavior. AI events land in the same Amplitude project as your product events, so you can build funnels from "user asks a question" to "user converts," create cohorts of users with low AI quality scores, and measure retention without stitching data across tools.

Already using an LLM observability tool? Keep it. The OTEL bridge adds Amplitude as a second destination in one line. Your existing traces stay, and you get product analytics on top.

Why this SDK?

Most AI observability tools give you traces. This SDK gives you per-turn events that live in your product analytics so you can:

Build funnels from "user opens chat" through "AI responds" to "user converts"
Create cohorts of users with low AI quality scores and measure their 7-day retention
Answer "is this AI feature helping or hurting?" without moving data between tools

The structural difference is the event model. Trace-centric tools typically produce spans per LLM call. This SDK produces one event per conversation turn with 40+ properties: model, tokens, cost, latency, reasoning, implicit feedback signals (regeneration, copy, abandonment), cache breakdowns, agent hierarchy, and experiment context. Each event is independently queryable in Amplitude's charts, cohorts, funnels, and retention analysis.

Every AI event carries your product user_id. No separate identity system, no data joining required. Build a funnel from "user opens chat" to "AI responds" to "user upgrades" directly in Amplitude.

Server-side enrichment does the evals for you. When content is available (contentMode: 'full'), Amplitude's enrichment pipeline runs automatically on every session after it closes. You get topic classifications, quality rubrics, behavioral flags, and session outcomes without writing or maintaining any eval code. Define your own topics and scoring rubrics; the pipeline applies them to every session automatically. Results appear as [Agent] Score events with rubric scores, [Agent] Topic Classification events with category labels, and [Agent] Session Evaluation summaries, all queryable in charts, cohorts, and funnels alongside your product events.

Quality signals from every source in one event type. User thumbs up/down (source: 'user'), automated rubric scores from the enrichment pipeline (source: 'ai'), and reviewer assessments (source: 'reviewer') all produce [Agent] Score events differentiated by [Agent] Evaluation Source. One chart shows all three side by side. Filter by source or view them together. Filter by [Agent] Agent ID for per-agent quality attribution.

Three content-control tiers. full sends content and Amplitude runs enrichments for you. metadata_only sends zero content (you still get cost, latency, tokens, session grouping). customer_enriched sends zero content but lets you provide your own structured labels via trackSessionEnrichment().

Cache-aware cost tracking. Pass cacheReadTokens and cacheCreationTokens for accurate blended costs. Without this breakdown, naive cost calculation can overestimate by 2-5x for cache-heavy workloads.

What you can build

Once AI events are in Amplitude alongside your product events:

Cohorts. "Users who had 3+ task failures in the last 30 days." "Users with low task completion scores." Target them with Guides, measure churn impact.
Funnels. "AI session about charts -> Chart Created." "Sign Up -> First AI Session -> Conversion." Measure whether AI drives feature adoption and onboarding.
Retention. Do users with successful AI sessions retain better than those with failures? Segment retention curves by [Agent] Overall Outcome or task completion score.
Agent analytics. Compare quality, cost, and failure rate across agents in one chart. Identify which agent in a multi-agent chain introduced a failure.

How quality measurement works

The SDK captures quality signals at three layers, from most direct to most comprehensive:

1. Explicit user feedback — Instrument thumbs up/down, star ratings, or CSAT scores via trackScore(). Each call produces an [Agent] Score event with source: 'user':

ai.trackScore({
  userId: 'u1', name: 'user-feedback', value: 1,
  targetId: aiMessageId, targetType: 'message', source: 'user',
});

2. Implicit behavioral signals — The SDK auto-tracks behavioral proxies for quality on every turn, with zero additional instrumentation:

Signal	Property	Event	Interpretation
Copy	`[Agent] Was Copied`	`[Agent] AI Response`	User copied the output — positive
Regeneration	`[Agent] Is Regeneration`	`[Agent] User Message`	User asked for a redo — negative
Edit	`[Agent] Is Edit`	`[Agent] User Message`	User refined their prompt — friction
Abandonment	`[Agent] Abandonment Turn`	`[Agent] Session End`	User left after N turns — potential failure

3. Automated server-side evaluation — When contentMode: 'full', Amplitude's enrichment pipeline runs LLM-as-judge evaluators on every session after it closes. No eval code to write or maintain:

Rubric	What it measures	Scale
`task_completion`	Did the agent accomplish what the user asked?	0–2
`response_quality`	Was the response clear, accurate, and helpful?	0–2
`user_satisfaction`	Did the user seem satisfied based on conversation signals?	0–2
`agent_confusion`	Did the agent misunderstand or go off track?	0–2

Plus boolean detectors: negative_feedback (frustration phrases), task_failure (agent failed to deliver), data_quality_issues, and behavioral_patterns (clarification loops, topic drift). All results are emitted as [Agent] Score events with source: 'ai'.

All three layers use the same [Agent] Score event type, differentiated by [Agent] Evaluation Source ('user', 'ai', or 'reviewer'). One chart shows user feedback alongside automated evals. No joins, no separate tables.

What You Set vs What You Get

You set	Where it comes from	What you unlock
API key	Amplitude project settings	Events reach Amplitude
userId	Your auth layer (JWT, session cookie, API token)	Per-user analytics, cohorts, retention
agentId	Your choice (e.g. `'chat-handler'`)	Per-agent cost, latency, quality dashboards
sessionId	Your conversation/thread/ticket ID	Multi-turn analysis, session enrichment, quality scores
description	Your choice (e.g. `'Handles support queries via GPT-4o'`)	Human-readable agent registry from event streams
contentMode + redactPii	Config (defaults work)	Server enrichment (automatic), PII scrubbing
model, tokens, cost	Auto-captured by provider wrappers	Cost analytics, latency monitoring
parentAgentId	Auto via `child()`/`runAs()`	Multi-agent hierarchy
env, agentVersion, context	Your deploy pipeline	Segmentation, regression detection

The contentMode + redactPii, model/tokens/cost, and parentAgentId rows require zero developer effort: they're automatic or have sensible defaults.

The minimum viable setup is 4 fields: API key, userId, agentId, sessionId. Everything else is either automatic or a progressive enhancement.

What You Get at Each Level

The coding agent workflow defaults to full instrumentation — the top row below. Lower levels exist as fallbacks, not as recommended starting points.

Level	Events you get	What it unlocks in Amplitude
Full (agents + sessions + wrappers)	User Message, AI Response, Tool Call, Session Start/End, Score, Enrichments	Per-user funnels, cohorts, retention, session replay linking, quality scoring
Wrappers only (no sessions)	AI Response (with cost, tokens, latency)	Aggregate cost monitoring, model comparison
`patch()` only (no wrappers, no sessions)	AI Response (basic)	Aggregate call counts — useful for verification only

Support matrix

Fully supported in Node.js: OpenAI chat completions, OpenAI Responses API, Azure OpenAI chat completions, Anthropic messages, Gemini, Mistral, Bedrock, LangChain, OpenTelemetry, LlamaIndex.
Partial support: zero-code patch() is best-effort by installed SDK and provider surface; OpenAI Agents tracing depends on incoming span payload shape from the host SDK.
Not currently supported in Node.js:
- AmplitudeCrewAIHooks is Python-only and throws in Node.js.

Parity and runtime limitations

This section is the source of truth for behavior that is intentionally different from Python due to runtime constraints:

AmplitudeCrewAIHooks is unsupported in Node.js (CrewAI is Python-only).
tool() does not auto-generate JSON Schema from runtime type hints; pass inputSchema explicitly.
Tool timeout behavior is async Promise.race based and cannot preempt synchronous CPU-bound code.
Auto-instrument bootstrap differs by runtime (node --import in Node vs sitecustomize in Python).
Request middleware differs by runtime (Express-compatible in Node vs ASGI middleware in Python).
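
The tool timeout limitation listed above can be illustrated with a minimal sketch. The helper names here (withTimeout, slowAsync, busyLoop, demo) are hypothetical and are not part of @amplitude/ai; the sketch only shows the general behavior of a Promise.race-based timeout, not the SDK's internal implementation.

```typescript
// Hypothetical helper (not an SDK export): rejects if `work` has not settled
// within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  // Whichever promise settles first wins; clear the timer either way.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Asynchronous work yields to the event loop, so the timeout timer can fire.
const slowAsync = (): Promise<string> =>
  new Promise((resolve) => setTimeout(() => resolve('done'), 200));

// Synchronous CPU-bound work blocks the event loop until it returns, so no
// timer callback can run in the meantime and the work is never interrupted.
function busyLoop(ms: number): string {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    /* spin */
  }
  return 'finished';
}

async function demo(): Promise<{ asyncResult: string; syncResult: string }> {
  // The async work takes 200ms, so the 20ms timeout wins the race.
  const asyncResult = await withTimeout(slowAsync(), 20).catch(
    (err: Error) => err.message,
  );
  // busyLoop(50) runs to completion before withTimeout is even entered,
  // so the 20ms limit has no effect on it.
  const syncResult = await withTimeout(Promise.resolve(busyLoop(50)), 20);
  return { asyncResult, syncResult };
}
```

If a tool does heavy synchronous computation, a timeout of this style only takes effect once the function yields; moving the work to a worker thread or splitting it into awaited chunks are the usual workarounds.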

Zero-code (for verification or legacy codebases)

patch() monkey-patches provider SDKs so existing LLM calls are tracked without code changes. This is useful for verifying the SDK works or for legacy codebases where you can't modify call sites. It only captures [Agent] AI Response without user identity — for the full event model, use agents + sessions (see Quick Start).

import { AmplitudeAI, patch } from '@amplitude/ai';
// OpenAI/Azure OpenAI chat completions (+ parse), OpenAI Responses, Anthropic, Gemini, Mistral,
// and Bedrock Converse calls are tracked when patching succeeds.
// No changes to your existing code needed.
import OpenAI from 'openai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
patch({ amplitudeAI: ai });

const openai = new OpenAI();

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
});
// ^ automatically tracked as [Agent] AI Response

Warning: Patched calls that fire outside an active session context are silently dropped — no event is emitted and no error is thrown. If you instrument with patch() but see no events, this is the most likely cause. Wrap your LLM calls in session.run(), use the Express middleware, or pass context explicitly. See Session and Middleware.

Or use the CLI to auto-patch at process start without touching application code:

AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js

Wrap (recommended for production)

Replace the provider constructor with the Amplitude-instrumented version for automatic tracking with full control over options per call:

import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
  amplitude: ai,
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();

await session.run(async () => {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
  });
  // AI response tracked automatically via wrapper

  const responseV2 = await openai.responses.create({
    model: 'gpt-4.1',
    instructions: 'You are concise.',
    input: [{ role: 'user', content: 'Summarize this in one sentence.' }],
  });
  // OpenAI Responses API is also tracked automatically
});

Or wrap an existing client instance (supports OpenAI, Azure OpenAI, and Anthropic):

import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);

All provider constructors and wrap() accept either an AmplitudeAI instance or a raw Amplitude client — both work:

new OpenAI({ amplitude: ai }); // AmplitudeAI instance
new OpenAI({ amplitude: ai.amplitude }); // raw Amplitude client
wrap(client, ai); // AmplitudeAI instance
wrap(client, ai.amplitude); // raw Amplitude client

Note: wrap() only supports OpenAI, Azure OpenAI, and Anthropic clients. For Gemini, Mistral, and Bedrock, use the SDK's provider classes directly (e.g., new Gemini({ amplitude: ai })).

Full control

Call tracking methods directly for maximum flexibility. Works with any LLM provider, including custom or self-hosted models:

import { AmplitudeAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session({ userId: 'user-123' });

await session.run(async (s) => {
  s.trackUserMessage('Summarize this document');

  const start = performance.now();
  const response = await myCustomLLM.generate('Summarize this document');
  const latencyMs = performance.now() - start;

  s.trackAiMessage(response.text, 'my-model-v2', 'custom', latencyMs, {
    inputTokens: response.usage.input,
    outputTokens: response.usage.output,
  });
});

Core Concepts

AmplitudeAI

Main client that wraps Amplitude analytics-node. Create it with an API key or an existing Amplitude instance:

const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY' });
// Or with existing client:
const ai = new AmplitudeAI({ amplitude: existingAmplitudeClient });

BoundAgent

Agent with pre-bound defaults (agentId, description, userId, env, etc.). Use agent() to create:

const agent = ai.agent('support-bot', {
  description: 'Handles customer support queries via OpenAI GPT-4o',
  userId: 'user-123',
  env: 'production',
  customerOrgId: 'org-456',
});

Child agents inherit context from their parent and automatically set parentAgentId (note: description is agent-specific and is not inherited — pass it explicitly if needed):

const orchestrator = ai.agent('orchestrator', {
  description: 'Routes queries to specialized child agents',
  userId: 'user-123',
});
const researcher = orchestrator.child('researcher');
const writer = orchestrator.child('writer', {
  description: 'Drafts responses using retrieved context',
});
// researcher.parentAgentId === 'orchestrator'
// researcher has no description unless you pass one (description is not inherited); writer has its own

TenantHandle

Multi-tenant helper that pre-binds customerOrgId for all agents created from it:

const tenant = ai.tenant('org-456', { env: 'production' });
const agent = tenant.agent('support-bot', { userId: 'user-123' });

User Identity

Supply user identity through the session, per call, or via middleware rather than binding it at agent creation or patch time. This keeps the agent reusable across users.

Via sessions (recommended): pass userId when opening a session:

const agent = ai.agent('support-bot', { env: 'production' });
const session = agent.session({ userId: 'user-42' });

await session.run(async (s) => {
  s.trackUserMessage('Hello');
  // userId inherited from session context
});

Per-call: pass userId on each tracking call (useful with the zero-code tier):

agent.trackUserMessage('Hello', {
  userId: 'user-42',
  sessionId: 'sess-1',
});

Via middleware: createAmplitudeAIMiddleware extracts user identity from the request (see Middleware):

app.use(
  createAmplitudeAIMiddleware({
    amplitudeAI: ai,
    userIdResolver: (req) => req.headers['x-user-id'] ?? null,
  }),
);
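
Node's header typings allow a header value to be a string, a string array, or undefined, so a resolver may want to normalize the value before returning it. The following is a minimal sketch under that assumption; RequestLike and resolveUserId are hypothetical names, and the exact return type that userIdResolver expects is not stated in this document.

```typescript
// Minimal request shape for this sketch, loosely modeled on Node/Express requests.
interface RequestLike {
  headers: Record<string, string | string[] | undefined>;
}

// Hypothetical resolver: returns the first non-empty x-user-id value, or null.
function resolveUserId(req: RequestLike): string | null {
  const raw = req.headers['x-user-id'];
  const value = Array.isArray(raw) ? raw[0] : raw;
  const trimmed = value?.trim();
  return trimmed ? trimmed : null;
}
```

A function like this could be passed as userIdResolver in place of the inline arrow function. In production, deriving the ID from your auth layer (as the "What You Set vs What You Get" table suggests) is generally safer than trusting a client-supplied header.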

Session

Async context manager using AsyncLocalStorage. Use session.run() to execute a callback within session context; session end is tracked automatically on exit:

const session = agent.session({ userId: 'user-123' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  s.trackAiMessage(response.content, 'gpt-4', 'openai', latencyMs);
});
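
To make the AsyncLocalStorage mechanism concrete, here is a minimal sketch of how a value bound by a run() call stays visible to nested and awaited code. It uses only Node's built-in node:async_hooks module; sessionStore, runInSession, describeCaller, and handleTurn are hypothetical stand-ins, not the SDK's actual internals.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface SessionContext {
  userId: string;
  sessionId: string;
}

// Hypothetical stand-in for the storage a session might use internally.
const sessionStore = new AsyncLocalStorage<SessionContext>();

// Runs `fn` with `ctx` as the active context for everything it calls.
function runInSession<T>(ctx: SessionContext, fn: () => T): T {
  return sessionStore.run(ctx, fn);
}

// Nested code can read the active context without it being passed down.
function describeCaller(): string {
  const ctx = sessionStore.getStore();
  return ctx ? `${ctx.userId}/${ctx.sessionId}` : 'no active session';
}

async function handleTurn(): Promise<string> {
  await Promise.resolve(); // the context survives across await boundaries
  return describeCaller();
}

// Outside any run() call there is no store.
const outside = describeCaller();

// Inside run(), the same function sees the bound context.
const inside = runInSession({ userId: 'user-123', sessionId: 'sess-1' }, handleTurn);
```

This is also why code that executes outside session.run() has no session to attach to: getStore() returns undefined there.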

Start a new trace within an ongoing session to group related operations:

await session.run(async (s) => {
  const traceId = s.newTrace();
  s.trackUserMessage('Follow-up question');
  s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs);
});

For sessions where gaps between messages may exceed 30 minutes (e.g., coding assistants, support agents waiting on customer replies), pass idleTimeoutMinutes so Amplitude knows the session is still active:

const session = agent.session({
  userId: 'user-123',
  idleTimeoutMinutes: 240, // expect up to 4-hour gaps
});

Without this, sessions with long idle periods may be closed and enrichment may run earlier than expected. The default is 30 minutes.

Session lifecycle and enrichment. You do not need to call trackSessionEnd() for sessions to work. Amplitude's server automatically closes sessions after 30 minutes of inactivity and queues them for enrichment (topic classification, quality scoring, session evaluation) at that point. The only reason to call trackSessionEnd() is to trigger enrichment sooner — for example, if you know the conversation is over and want evaluation results immediately rather than waiting for the idle timeout.

"Closed" is a server-side concept meaning "queued for enrichment" — it does not prevent new events from flowing into the same session. If the user resumes a conversation after session end, new messages with the same sessionId are still associated with that session.

If you use session.run(), session end is tracked automatically when the callback completes. For long-lived conversations (chatbots, support agents), you can skip explicit session end entirely and let the server handle it.

Link to Session Replay: If your frontend uses Amplitude's Session Replay, pass the browser's deviceId and browserSessionId to link AI sessions to browser recordings:

const session = agent.session({
  userId: 'user-123',
  deviceId: req.headers['x-amp-device-id'],
  browserSessionId: req.headers['x-amp-session-id'],
});

await session.run(async (s) => {
  s.trackUserMessage('What is retention?');
  // All events now carry [Amplitude] Session Replay ID = deviceId/browserSessionId
});

tool()

Higher-order function that wraps a function so each call is tracked automatically as an [Agent] Tool Call event:

import { tool } from '@amplitude/ai';

const searchDb = tool(
  async (query: { q: string }) => {
    return await db.search(query.q);
  },
  {
    name: 'search_db',
    inputSchema: { type: 'object', properties: { q: { type: 'string' } } },
  },
);

Note on inputSchema: Unlike the Python SDK which accepts a Pydantic model class and extracts the JSON Schema automatically, the TypeScript SDK accepts a raw JSON Schema object. For type-safe schema generation, consider using Zod with zod-to-json-schema:

import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const QuerySchema = z.object({ q: z.string(), limit: z.number().optional() });
const searchDb = tool(mySearchFn, {
  name: 'search_db',
  inputSchema: zodToJsonSchema(QuerySchema),
});

observe()

Higher-order function that wraps a function so each call is tracked automatically as an [Agent] Span event:

import { observe } from '@amplitude/ai';

const processRequest = observe(
  async (input: Request) => {
    return await handleRequest(input);
  },
  { name: 'process_request' },
);

Configuration

import { AIConfig, AmplitudeAI, ContentMode } from '@amplitude/ai';

const config = new AIConfig({
  contentMode: ContentMode.FULL, // FULL | METADATA_ONLY | CUSTOMER_ENRICHED — both ContentMode.FULL and 'full' work
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
  debug: false,
  dryRun: false,
});

const ai = new AmplitudeAI({ apiKey: 'YOUR_API_KEY', config });

Option	Description
`contentMode`	`'full'` (default), `'metadata_only'`, or `'customer_enriched'`. Both `ContentMode.FULL` and `'full'` work.
`redactPii`	Redact email, phone, SSN, and credit-card patterns from tracked content. Defaults to `true` — set to `false` to opt out.
`customRedactionPatterns`	Additional regex patterns for redaction
`debug`	Log events to stderr
`dryRun`	Log without sending to Amplitude
`validate`	Enable strict validation of required fields
`onEventCallback`	Callback invoked after every tracked event `(event, statusCode, message) => void`
`propagateContext`	Enable cross-service context propagation

Context Dict Conventions

The context parameter on ai.agent() accepts an arbitrary Record<string, unknown> that is JSON-serialized and attached to every event as [Agent] Context. This is the recommended way to add segmentation dimensions without requiring new global properties.

Recommended keys:

Key	Example Values	Use Case
`agent_type`	`"planner"`, `"executor"`, `"retriever"`, `"router"`	Filter/group analytics by agent role in multi-agent systems.
`experiment_variant`	`"control"`, `"treatment-v2"`, `"prompt-rewrite-a"`	Segment AI sessions by A/B test variant. Compare quality scores, abandonment rates, or cost across experiment arms.
`feature_flag`	`"new-rag-pipeline"`, `"reasoning-model-enabled"`	Track which feature flags were active during the session.
`surface`	`"chat"`, `"search"`, `"copilot"`, `"email-draft"`	Identify which UI surface or product area triggered the AI interaction.
`prompt_revision`	`"v7"`, `"abc123"`, `"2026-02-15"`	Track which prompt version was used. Detect prompt regression when combined with `agentVersion`.
`deployment_region`	`"us-east-1"`, `"eu-west-1"`	Segment by deployment region for latency analysis or compliance tracking.
`canary_group`	`"canary"`, `"stable"`	Identify canary vs. stable deployments for progressive rollout monitoring.

Example:

const agent = ai.agent('support-bot', {
  userId: 'u1',
  description: 'Handles customer support queries via OpenAI GPT-4o',
  agentVersion: '4.2.0',
  context: {
    agent_type: 'executor',
    experiment_variant: 'reasoning-enabled',
    surface: 'chat',
    feature_flag: 'new-rag-pipeline',
    prompt_revision: 'v7',
  },
});

// All events from this agent (and its sessions, child agents, and provider
// wrappers) will include [Agent] Context with these keys.

Context merging in child agents:

const parent = ai.agent('orchestrator', {
  context: { experiment_variant: 'treatment', surface: 'chat' },
});
const child = parent.child('researcher', {
  context: { agent_type: 'retriever' },
});
// child context = { experiment_variant: 'treatment', surface: 'chat', agent_type: 'retriever' }
// Child keys override parent keys; parent keys absent from the child are preserved.
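
The merge rule above behaves like a shallow object merge. The sketch below assumes exactly that; mergeContext is a hypothetical helper, and the SDK may treat nested objects differently.

```typescript
type AgentContext = Record<string, unknown>;

// Hypothetical helper illustrating the documented rule: child keys win,
// parent-only keys are preserved. Assumes a shallow (top-level) merge.
function mergeContext(parent: AgentContext, child: AgentContext): AgentContext {
  return { ...parent, ...child };
}

const parentContext: AgentContext = { experiment_variant: 'treatment', surface: 'chat' };
const childContext: AgentContext = { agent_type: 'retriever', surface: 'search' };

// surface is overridden by the child; experiment_variant is inherited.
const merged = mergeContext(parentContext, childContext);

// [Agent] Context is attached as a JSON string, so this approximates the
// value you would parse on the query side.
const serialized = JSON.stringify(merged);
```

Because the merge is assumed to be shallow, a nested object under a shared key would be replaced wholesale rather than combined, which is one reason to keep context keys flat.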

Querying in Amplitude: The [Agent] Context property is a JSON string. Use Amplitude's JSON property parsing to extract individual keys for charts, cohorts, and funnels. For example, group by [Agent] Context.agent_type to see metrics by agent role.

Note on experiment_variant and server-generated events: Context keys appear on all SDK-emitted events ([Agent] User Message, [Agent] AI Response, etc.). Server-generated events ([Agent] Session Evaluation, [Agent] Score with source="ai") do not yet inherit context keys. To segment server-generated quality scores by experiment arm, use Amplitude Derived Properties to extract from [Agent] Context on SDK events.

Privacy & Content Control

Three content modes control what data is sent to Amplitude:

Mode	Message Content	Token/Cost/Latency	Session Grouping	Server Enrichments
`FULL`	Sent (with PII redaction)	Yes	Yes	Yes (auto)
`METADATA_ONLY`	Not sent	Yes	Yes	No
`CUSTOMER_ENRICHED`	Not sent	Yes	Yes	Yes (you provide)

FULL mode (default)

Message content is captured and sent to Amplitude. PII redaction is on by default — built-in patterns scrub emails, phone numbers, SSNs, credit card numbers, and base64 image data before the event leaves your process. Set redactPii: false to opt out:

const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true, // default; pass false to disable
});

With the default redactPii: true, a message like "Contact me at john@example.com or 555-123-4567" is sanitized to "Contact me at [email] or [phone]" before being sent.

Built-in phone and SSN detection are currently tuned for common US formats. If you need broader international coverage, add explicit customRedactionPatterns for your locales.

Add custom redaction patterns for domain-specific PII:

const config = new AIConfig({
  contentMode: ContentMode.FULL,
  redactPii: true,
  customRedactionPatterns: ['ACCT-\\d{6,}', 'internal-key-[a-f0-9]+'],
});

Custom redaction patterns are your responsibility: avoid expensive or catastrophic regexes in performance-sensitive paths.
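
To show how string-based patterns like these are typically applied, here is a minimal sketch of regex redaction. The EMAIL and US_PHONE expressions, the redact function, and the '[redacted]' placeholder for custom matches are all assumptions of this example; the SDK's built-in patterns and placeholders for custom patterns are not shown in this document.

```typescript
// Illustrative patterns only; not the SDK's built-in expressions.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const US_PHONE = /\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b/g;

function redact(text: string, customPatterns: string[] = []): string {
  let out = text.replace(EMAIL, '[email]').replace(US_PHONE, '[phone]');
  for (const source of customPatterns) {
    // Patterns arrive as regex source strings, as in customRedactionPatterns.
    // The '[redacted]' placeholder is an assumption of this sketch.
    out = out.replace(new RegExp(source, 'g'), '[redacted]');
  }
  return out;
}

const sample = redact(
  'Contact me at john@example.com or 555-123-4567 about ACCT-1234567',
  ['ACCT-\\d{6,}'],
);
// sample === 'Contact me at [email] or [phone] about [redacted]'
```

Note the doubled backslash in 'ACCT-\\d{6,}': the pattern is a string literal, so the backslash must be escaped. Running a helper like this over representative messages in a unit test is a cheap way to check a custom pattern for both correctness and performance before shipping it.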

Message content is stored at full length with no truncation or size limits. The $llm_message property is whitelisted server-side, and the Node SDK does not apply per-property string truncation.

METADATA_ONLY mode

No message content is sent. You still get token counts, cost, latency, model name, and session grouping — everything needed for cost analytics and performance monitoring:

const config = new AIConfig({
  contentMode: ContentMode.METADATA_ONLY,
});

Use this when you cannot send user content to a third-party analytics service (e.g., regulated industries, sensitive data).

CUSTOMER_ENRICHED mode

Like METADATA_ONLY (no content sent), but designed for workflows where you enrich sessions with your own classifications, quality scores, and topic labels via the SessionEnrichments API:

const config = new AIConfig({
  contentMode: ContentMode.CUSTOMER_ENRICHED,
});

// Later, after running your own classification pipeline:
const enrichments = new SessionEnrichments({
  qualityScore: 0.85,
  overallOutcome: 'resolved',
});
session.setEnrichments(enrichments);

PrivacyConfig (advanced)

PrivacyConfig is derived from AIConfig via config.toPrivacyConfig(). For advanced use, create directly:

import { PrivacyConfig } from '@amplitude/ai';

const privacy = new PrivacyConfig({
  privacyMode: true,
  redactPii: true,
  customRedactionPatterns: ['sensitive-\\d+'],
});

When to use which mode

FULL: You want to see actual conversation content in Amplitude, debug individual sessions, and leverage server-side enrichment pipelines. Best for development, internal tools, and applications where data sharing agreements permit it.
METADATA_ONLY: You want cost/performance analytics without exposing any message content. Best for regulated environments (healthcare, finance) or when content contains proprietary data.
CUSTOMER_ENRICHED: You want the privacy of METADATA_ONLY but also want structured analytics (topic classification, quality scores) that you compute on your own infrastructure before sending to Amplitude.

Cache-Aware Cost Tracking

When using provider prompt caching (Anthropic's cache, OpenAI's cached completions, etc.), pass cache token breakdowns for accurate cost calculation:

s.trackAiMessage(
  response.content,
  'claude-3.5-sonnet',
  'anthropic',
  latencyMs,
  {
    inputTokens: response.usage.input_tokens,
    outputTokens: response.usage.output_tokens,
    cacheReadTokens: response.usage.cache_read_input_tokens,
    cacheCreationTokens: response.usage.cache_creation_input_tokens,
  },
);

Without cache breakdowns, cost calculation prices all input tokens at the standard rate. With caching enabled, cache-read tokens are typically about 10x cheaper than standard input tokens, and cache-creation tokens are about 25% more expensive. A naive calculation without this breakdown can overestimate costs by 2-5x for cache-heavy workloads.
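A minimal sketch of the arithmetic behind that claim follows. The rates are illustrative assumptions (3 USD per million input tokens, 15 USD per million output tokens, cache reads at 0.1x, cache writes at 1.25x), not values taken from the SDK or from any provider's price list, and the function names are hypothetical.

```typescript
// Illustrative rates only (assumptions, not real pricing data).
interface Rates {
  inputPerMTok: number; // USD per million standard input tokens
  outputPerMTok: number; // USD per million output tokens
  cacheReadMultiplier: number; // 0.1 means cache reads cost 10% of standard input
  cacheWriteMultiplier: number; // 1.25 means cache writes cost 25% more
}

interface Usage {
  inputTokens: number; // standard (non-cached) input tokens
  outputTokens: number;
  cacheReadTokens: number;
  cacheCreationTokens: number;
}

// Prices every input-side token at the standard rate (no cache breakdown).
function naiveCostUsd(u: Usage, r: Rates): number {
  const allInput = u.inputTokens + u.cacheReadTokens + u.cacheCreationTokens;
  return (allInput * r.inputPerMTok + u.outputTokens * r.outputPerMTok) / 1_000_000;
}

// Prices each token category at its own rate.
function cacheAwareCostUsd(u: Usage, r: Rates): number {
  const input =
    u.inputTokens * r.inputPerMTok +
    u.cacheReadTokens * r.inputPerMTok * r.cacheReadMultiplier +
    u.cacheCreationTokens * r.inputPerMTok * r.cacheWriteMultiplier;
  return (input + u.outputTokens * r.outputPerMTok) / 1_000_000;
}

const rates: Rates = {
  inputPerMTok: 3,
  outputPerMTok: 15,
  cacheReadMultiplier: 0.1,
  cacheWriteMultiplier: 1.25,
};
const usage: Usage = {
  inputTokens: 2_000,
  outputTokens: 1_000,
  cacheReadTokens: 20_000,
  cacheCreationTokens: 0,
};

const naive = naiveCostUsd(usage, rates); // about 0.081 USD
const aware = cacheAwareCostUsd(usage, rates); // about 0.027 USD
```

With these assumed numbers the naive estimate is roughly three times the cache-aware one, which is why passing cacheReadTokens and cacheCreationTokens matters for cache-heavy workloads.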

The SDK tracks four token categories:

[Agent] Input Tokens — standard (non-cached) input tokens
[Agent] Output Tokens — generated output tokens
[Agent] Cache Read Tokens: tokens read from the provider cache (billed below the standard input rate)
[Agent] Cache Creation Tokens: tokens written to the provider cache (billed slightly above the standard input rate)

Cost is auto-calculated when token counts are provided and the @pydantic/genai-prices package is installed. When the package is not available, calculateCost() returns 0 (never null). You can also pass totalCostUsd directly if you compute cost yourself:

s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
  totalCostUsd: 0.0034,
});

Note: pricing data freshness. Cost calculation relies on pricing data bundled in the installed @pydantic/genai-prices package. Newly released models may return $0 until the package is updated. To get the latest pricing between package releases, opt in to live updates at startup:

import { enableLivePriceUpdates } from '@amplitude/ai';
enableLivePriceUpdates(); // fetches latest prices from genai-prices GitHub repo hourly

This makes periodic HTTPS requests to raw.githubusercontent.com (~26 KB each). Only enable in environments where outbound network access is permitted.

Semantic Cache Tracking

Track full-response semantic cache hits (distinct from token-level prompt caching above):

s.trackAiMessage(cachedResponse.content, 'gpt-4o', 'openai', latencyMs, {
  wasCached: true, // served from Redis/semantic cache
});

This maps to the [Agent] Was Cached property and enables "cache hit rate" charts and cost optimization analysis. The property is only emitted when true; it is omitted (not set to false) when the response was not cached.

Model Tier Classification

Models are automatically classified into tiers for cost/performance analysis:

Tier	Examples	When to Use
`fast`	gpt-4o-mini, claude-3-haiku, gemini-flash, gpt-3.5-turbo	High-volume, latency-sensitive
`standard`	gpt-4o, claude-3.5-sonnet, gemini-pro, llama, command	General purpose
`reasoning`	o1, o3-mini, deepseek-r1, claude with extended thinking	Complex reasoning tasks

The tier is inferred automatically from the model name and attached as [Agent] Model Tier on every [Agent] AI Response event:

import {
  inferModelTier,
  TIER_FAST,
  TIER_REASONING,
  TIER_STANDARD,
} from '@amplitude/ai';

inferModelTier('gpt-4o-mini'); // 'fast'
inferModelTier('claude-3.5-sonnet'); // 'standard'
inferModelTier('o1-preview'); // 'reasoning'
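For intuition, name-based tier inference can be sketched as ordered pattern matching. The pattern lists below are hypothetical and chosen only to reproduce the examples in the tier table; the SDK's real rules may differ, and sketchInferTier is not part of @amplitude/ai.

```typescript
type ModelTier = 'fast' | 'standard' | 'reasoning';

// Hypothetical patterns, picked to match the examples in the tier table.
const REASONING_PATTERNS: RegExp[] = [/^o\d/, /deepseek-r1/];
const FAST_PATTERNS: RegExp[] = [/mini/, /haiku/, /flash/, /gpt-3\.5/];

function sketchInferTier(modelName: string): ModelTier {
  const name = modelName.toLowerCase();
  // Check reasoning first so "o3-mini" is not caught by the "mini" rule.
  if (REASONING_PATTERNS.some((p) => p.test(name))) return 'reasoning';
  if (FAST_PATTERNS.some((p) => p.test(name))) return 'fast';
  return 'standard'; // default when nothing else matches
}

const tiers = ['gpt-4o-mini', 'claude-3.5-sonnet', 'o1-preview', 'o3-mini'].map(sketchInferTier);
// tiers is ['fast', 'standard', 'reasoning', 'reasoning']
```

The ordering matters: a model such as o3-mini contains "mini" but belongs to the reasoning tier, so the more specific rule has to run first.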

Override the auto-inferred tier for custom or fine-tuned models:

s.trackAiMessage(
  response.content,
  'ft:gpt-4o:my-org:custom',
  'openai',
  latencyMs,
  {
    modelTier: 'standard',
    inputTokens: response.usage.prompt_tokens,
    outputTokens: response.usage.completion_tokens,
  },
);

Provider Wrappers

Use instrumented provider wrappers for automatic tracking:

Provider	Class	Package
OpenAI	`OpenAI`	openai
Anthropic	`Anthropic`	@anthropic-ai/sdk
Gemini	`Gemini`	@google/generative-ai
AzureOpenAI	`AzureOpenAI`	openai
Bedrock	`Bedrock`	@aws-sdk/client-bedrock-runtime
Mistral	`Mistral`	@mistralai/mistralai

Feature coverage by provider:

Feature	OpenAI	Anthropic	Gemini	AzureOpenAI	Bedrock	Mistral
Streaming	Yes	Yes	Yes	Yes	Yes	Yes
Tool call tracking	Yes	Yes	No	Yes	Yes	No
TTFB measurement	Yes	Yes	No	Yes	No	No
Cache token stats	Yes	Yes	No	No	No	No
Responses API	Yes	-	-	-	-	-
Reasoning content	Yes	Yes	No	Yes	No	No
System prompt capture	Yes	Yes	Yes	Yes	Yes	Yes
Cost estimation	Yes	Yes	Yes	Yes	Yes	Yes

Provider wrappers use injected TrackFn callbacks instead of class hierarchy casts, enabling easier composition and custom tracking logic.

Bedrock model IDs like us.anthropic.claude-3-5-sonnet are automatically normalized for price lookup (e.g., to claude-3-5-sonnet).
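A rough sketch of what such normalization might look like is shown below. It is an assumption about the general approach, not the SDK's code: normalizeBedrockModelId is a hypothetical helper, and the set of region prefixes (us, eu, apac) and the "-v1:0" style version suffix are assumed shapes of Bedrock identifiers.

```typescript
// Hypothetical helper: strips an optional region prefix, the vendor prefix,
// and a trailing version suffix from a Bedrock model ID.
function normalizeBedrockModelId(modelId: string): string {
  return modelId
    .replace(/^(us|eu|apac)\./, '') // assumed cross-region prefixes
    .replace(/^[a-z0-9]+\./, '') // vendor prefix, e.g. "anthropic."
    .replace(/-v\d+(:\d+)?$/, ''); // version suffix, e.g. "-v1:0"
}

const normalized = normalizeBedrockModelId('us.anthropic.claude-3-5-sonnet');
// normalized === 'claude-3-5-sonnet'
```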

OpenAI example:

import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({
  amplitude: ai,
  apiKey: process.env.OPENAI_API_KEY,
});

const agent = ai.agent('my-agent', { userId: 'user-123' });
const session = agent.session();

await session.run(async (s) => {
  const resp = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello' }],
  });
  // AI response tracked automatically via wrapper
});

Or wrap an existing client:

import { wrap } from '@amplitude/ai';
import OpenAI from 'openai';

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const instrumented = wrap(client, ai);

Streaming Tracking

Automatic streaming (provider wrappers)

Provider wrappers (OpenAI, AzureOpenAI, Anthropic, Gemini, Mistral, Bedrock) automatically detect supported streaming responses and track them transparently. The wrapper intercepts the AsyncIterable, accumulates chunks, measures TTFB, and emits an [Agent] AI Response event after the stream is fully consumed:

const openai = new OpenAI({ amplitude: ai, apiKey: '...' });

// Streaming is handled automatically — just iterate the result
const stream = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
// ^ AI Response event emitted automatically after loop ends
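The interception described above can be pictured as an async generator that passes chunks through while recording timings. The sketch below illustrates the general technique under stated assumptions and is not the wrapper's actual implementation: trackStream, StreamSummary, and the injectable now clock are hypothetical names.

```typescript
interface StreamSummary {
  content: string;
  ttfbMs: number | null; // null when the stream produced no content
  latencyMs: number;
}

// Wraps any AsyncIterable of text deltas. Chunks pass through unchanged;
// onDone fires once, after the consumer has drained the whole stream.
async function* trackStream(
  source: AsyncIterable<string>,
  onDone: (summary: StreamSummary) => void,
  now: () => number = Date.now,
): AsyncGenerator<string> {
  const start = now();
  let ttfbMs: number | null = null;
  let content = '';
  for await (const delta of source) {
    // Record time-to-first-byte on the first non-empty delta only.
    if (ttfbMs === null && delta.length > 0) ttfbMs = now() - start;
    content += delta;
    yield delta;
  }
  onDone({ content, ttfbMs, latencyMs: now() - start });
}
```

Because onDone only runs after the loop finishes, a consumer that stops iterating early would not trigger it in this sketch, which mirrors the "after the stream is fully consumed" behavior described above.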

Manual streaming

Track streaming responses manually with time-to-first-byte (TTFB) for latency analysis:

s.trackAiMessage(fullContent, 'gpt-4o', 'openai', totalMs, {
  isStreaming: true,
  ttfbMs: timeToFirstByte,
  inputTokens: usage.prompt_tokens,
  outputTokens: usage.completion_tokens,
});

The SDK tracks two timing properties for streaming:

[Agent] Latency Ms — total wall-clock time from request to final chunk
[Agent] TTFB Ms — time-to-first-byte, the delay before the first token arrives

StreamingAccumulator

For manual streaming, use StreamingAccumulator to collect chunks and automatically measure TTFB:

import { StreamingAccumulator } from '@amplitude/ai';

const accumulator = new StreamingAccumulator();

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    accumulator.addContent(content);
  }
}

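// finalUsage is the usage object captured from the final stream chunk.
// With OpenAI, request it via stream_options: { include_usage: true }.
accumulator.setUsage({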
  inputTokens: finalUsage.prompt_tokens,
  outputTokens: finalUsage.completion_tokens,
});

s.trackAiMessage(
  accumulator.content,
  'gpt-4o',
  'openai',
  accumulator.elapsedMs,
  {
    isStreaming: true,
    ttfbMs: accumulator.ttfbMs,
    inputTokens: accumulator.inputTokens,
    outputTokens: accumulator.outputTokens,
    finishReason: accumulator.finishReason,
  },
);

The accumulator automatically records TTFB when addContent() is called for the first time, and tracks total elapsed time via elapsedMs. For streaming errors, call setError(message) to set isError and errorMessage, which are included on the tracked AI Response event.

Attachment Tracking

Track files sent with user messages (images, PDFs, URLs):

s.trackUserMessage('Analyze this document', {
  attachments: [
    { type: 'image', name: 'chart.png', size_bytes: 102400 },
    { type: 'pdf', name: 'report.pdf', size_bytes: 2048576 },
  ],
});

The SDK automatically derives aggregate properties from the attachment array:

[Agent] Has Attachments — boolean, true when attachments are present
[Agent] Attachment Count — number of attachments
[Agent] Attachment Types — deduplicated list of attachment types (e.g., ["image", "pdf"])
[Agent] Total Attachment Size Bytes — sum of all size_bytes values
[Agent] Attachments — serialized JSON of the full attachment metadata
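The derivation of those aggregate properties can be sketched as follows. This is an illustration only: deriveAttachmentProperties is a hypothetical helper, and treating a missing size_bytes as 0 is an assumption.

```typescript
interface Attachment {
  type: string;
  name: string;
  size_bytes?: number;
}

// Hypothetical helper that mirrors the aggregate properties listed above.
function deriveAttachmentProperties(attachments: Attachment[]): Record<string, unknown> {
  return {
    '[Agent] Has Attachments': attachments.length > 0,
    '[Agent] Attachment Count': attachments.length,
    // Deduplicate types while keeping first-seen order.
    '[Agent] Attachment Types': Array.from(new Set(attachments.map((a) => a.type))),
    // Assumption: attachments without size_bytes contribute 0.
    '[Agent] Total Attachment Size Bytes': attachments.reduce(
      (sum, a) => sum + (a.size_bytes ?? 0),
      0,
    ),
    '[Agent] Attachments': JSON.stringify(attachments),
  };
}

const attachmentProps = deriveAttachmentProperties([
  { type: 'image', name: 'chart.png', size_bytes: 102400 },
  { type: 'pdf', name: 'report.pdf', size_bytes: 2048576 },
]);
// Count is 2, types are ['image', 'pdf'], total size is 2150976 bytes.
```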

Attachments can also be tracked on AI responses (e.g., when the model generates images or files):

s.trackAiMessage(response.content, 'gpt-4o', 'openai', latencyMs, {
  attachments: [{ type: 'image', name: 'generated.png', size_bytes: 204800 }],
});

Implicit Feedback

Track behavioral signals that indicate whether a response met the user's need, without requiring explicit ratings:

// User asks a question
s.trackUserMessage('How do I create a funnel?');

// AI responds — user copies the answer (positive signal)
s.trackAiMessage('To create a funnel, go to...', 'gpt-4o', 'openai', latencyMs, {
  wasCopied: true,
});

// User regenerates (negative signal — first response wasn't good enough)
s.trackUserMessage('How do I create a funnel?', {
  isRegeneration: true,
});

// User edits their question (refining intent)
s.trackUserMessage('How do I create a conversion funnel for signups?', {
  isEdit: true,
  editedMessageId: originalMsgId, // links the edit to the original
});

Track abandonment at session end — a low abandonmentTurn (e.g., 1) strongly signals first-response dissatisfaction:

agent.trackSessionEnd({
  sessionId: 'sess-1',
  abandonmentTurn: 1, // user left after first AI response
});

These signals map to [Agent] Was Copied, [Agent] Is Regeneration, [Agent] Is Edit, [Agent] Edited Message ID, and [Agent] Abandonment Turn. Use them in Amplitude to build quality dashboards without requiring user surveys.

tool() and observe() HOFs

tool()

Wraps an async function so that each invocation is tracked as an [Agent] Tool Call event:

import { tool, ToolCallTracker } from '@amplitude/ai';

ToolCallTracker.setAmplitude(ai.amplitude, 'user-123', {
  sessionId: 'sess-1',
  traceId: 'trace-1',
  agentId: 'my-agent',
  privacyConfig: ai.config.toPrivacyConfig(),
});

const fetchWeather = tool(
  async (args: { city: string }) => {
    return await weatherApi.get(args.city);
  },
  {
    name: 'fetch_weather',
    inputSchema: { type: 'object', properties: { city: { type: 'string' } } },
    timeoutMs: 5000,
    onError: (err, name) => console.error(`Tool ${name} failed:`, err),
  },
);
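Conceptually, a tracking higher-order function records latency and success around the wrapped call. The sketch below shows that idea only; trackedTool, ToolCallRecord, and the record callback are hypothetical stand-ins, timeout handling is left out, and rethrowing the error is an assumption rather than documented behavior of tool().

```typescript
interface ToolCallRecord {
  name: string;
  success: boolean;
  latencyMs: number;
  errorMessage?: string;
}

// Hypothetical stand-in for tool(): `record` represents whatever emits the event.
function trackedTool<A, R>(
  fn: (args: A) => Promise<R>,
  name: string,
  record: (call: ToolCallRecord) => void,
  onError?: (err: unknown, name: string) => void,
): (args: A) => Promise<R> {
  return async (args: A): Promise<R> => {
    const start = Date.now();
    try {
      const result = await fn(args);
      record({ name, success: true, latencyMs: Date.now() - start });
      return result;
    } catch (err) {
      record({
        name,
        success: false,
        latencyMs: Date.now() - start,
        errorMessage: String(err),
      });
      onError?.(err, name);
      throw err; // assumption: the caller still sees the failure
    }
  };
}
```

The wrapped function keeps the same signature as the original, so call sites do not need to change.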

observe()

Wraps a function so that each invocation is tracked as an [Agent] Span event:

import { observe } from '@amplitude/ai';

const enrichData = observe(async (data: unknown) => transform(data), {
  name: 'enrich_data',
  agentId: 'enricher',
});

Scoring Patterns

Track quality feedback from multiple sources using the score() method. Scores are emitted as [Agent] Score events.

User Feedback (thumbs up/down)

s.score('thumbs-up', 1, messageId, { source: 'user' });
s.score('thumbs-down', 0, messageId, { source: 'user' });

Numeric Rating

s.score('rating', 4, messageId, {
  source: 'user',
  comment: 'Very helpful but slightly verbose',
});

LLM-as-Judge

s.score('quality', 0.85, messageId, {
  source: 'ai',
  comment: 'Clear and accurate response with proper citations',
});

Session-Level Scoring

Score an entire session rather than a single message by setting targetType to 'session':

s.score('session-quality', 0.9, session.sessionId, {
  targetType: 'session',
  source: 'ai',
});

Score Properties

Each [Agent] Score event includes:

[Agent] Score Name — the name you provide (e.g., "thumbs-up", "quality")
[Agent] Score Value — numeric value
[Agent] Target ID — the message ID or session ID being scored
[Agent] Target Type — "message" (default) or "session"
[Agent] Evaluation Source — "user" (default) or "ai"
[Agent] Comment — optional free-text comment (respects content mode)
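As a sketch of how the defaults above combine, the helper below builds the property map for one score. buildScoreProperties is hypothetical and not part of the SDK, and it ignores content-mode handling of the comment.

```typescript
type TargetType = 'message' | 'session';
type EvaluationSource = 'user' | 'ai';

interface ScoreOptions {
  targetType?: TargetType;
  source?: EvaluationSource;
  comment?: string;
}

// Hypothetical helper illustrating the documented defaults.
function buildScoreProperties(
  name: string,
  value: number,
  targetId: string,
  opts: ScoreOptions = {},
): Record<string, string | number> {
  const props: Record<string, string | number> = {
    '[Agent] Score Name': name,
    '[Agent] Score Value': value,
    '[Agent] Target ID': targetId,
    '[Agent] Target Type': opts.targetType ?? 'message', // default per the list above
    '[Agent] Evaluation Source': opts.source ?? 'user', // default per the list above
  };
  // The comment is optional, so it is only set when provided.
  if (opts.comment !== undefined) props['[Agent] Comment'] = opts.comment;
  return props;
}

const thumbsUp = buildScoreProperties('thumbs-up', 1, 'msg-1');
const judged = buildScoreProperties('session-quality', 0.9, 'sess-1', {
  targetType: 'session',
  source: 'ai',
});
```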

Enrichments

Session Enrichments

Attach structured metadata to sessions for analytics. Enrichments are included when the session auto-ends:

import {
  RubricScore,
  SessionEnrichments,
  TopicClassification,
} from '@amplitude/ai';

const enrichments = new SessionEnrichments({
  qualityScore: 0.85,
  sentimentScore: 0.7,
  overallOutcome: 'resolved',
  topicClassifications: {
    intent: new TopicClassification({
      l1: 'billing',
      primary: 'billing',
      values: ['billing', 'refund'],
      subcategories: ['REFUND_REQUEST', 'PRICING_QUESTION'],
    }),
  },
  rubrics: [
    new RubricScore({
      name: 'helpfulness',
      score: 4,
      rationale: 'Provided clear step-by-step instructions',
    }),
    new RubricScore({
      name: 'accuracy',
      score: 5,
      rationale: 'All information was factually correct',
    }),
  ],
  agentChain: ['orchestrator', 'researcher', 'writer'],
  rootAgentName: 'orchestrator',
  requestComplexity: 'medium',
});

session.setEnrichments(enrichments);
// Enrichments are included automatically when session.run() completes

Track Enrichments Separately

Send enrichments as a standalone event without ending the session:

agent.trackSessionEnrichment(enrichments, {
  sessionId: 'sess-abc123',
});

End-to-End Example: `customer_enriched` Mode

This mode is for teams that run their own evaluation pipeline (or can't send message content to Amplitude) but still want rich session-level analytics. Here's a complete workflow:

import {
  AIConfig,
  AmplitudeAI,
  ContentMode,
  MessageLabel,
  RubricScore,
  SessionEnrichments,
  TopicClassification,
} from '@amplitude/ai';

// 1. Configure: no content sent to Amplitude
const ai = new AmplitudeAI({
  apiKey: process.env.AMPLITUDE_AI_API_KEY!,
  config: new AIConfig({
    contentMode: ContentMode.CUSTOMER_ENRICHED,
  }),
});

const agent = ai.agent('support-bot', {
  description: 'Handles support conversations in metadata-only mode',
  agentVersion: '2.1.0',
});

// 2. Run the conversation — content is NOT sent (metadata only)
const session = agent.session({ userId: 'user-42' });
const { sessionId, messageIds } = await session.run(async (s) => {
  const msgIds: string[] = [];
  msgIds.push(s.trackUserMessage('Why was I charged twice?'));
  msgIds.push(
    s.trackAiMessage(
      aiResponse.content,
      'gpt-4o',
      'openai',
      latencyMs,
    ),
  );
  return { sessionId: s.sessionId, messageIds: msgIds };
});

// 3. Run your eval pipeline on the raw messages (e.g., your own LLM judge)
const evalResults = await myEvalPipeline(conversationHistory);

// 4. Ship enrichments back to Amplitude
const enrichments = new SessionEnrichments({
  qualityScore: evalResults.quality,
  sentimentScore: evalResults.sentiment,
  overallOutcome: evalResults.outcome,
  topicClassifications: {
    'billing': new TopicClassification({
      topic: 'billing-dispute',
      confidence: 0.92,
    }),
  },
  rubrics: [
    new RubricScore({ name: 'accuracy', score: 4, maxScore: 5 }),
    new RubricScore({ name: 'helpfulness', score: 5, maxScore: 5 }),
  ],
  messageLabels: {
    [messageIds[0]]: [
      new MessageLabel({ key: 'intent', value: 'billing-dispute', confidence: 0.94 }),
    ],
  },
  customMetadata: { eval_model: 'gpt-4o-judge-v2' },
});

agent.trackSessionEnrichment(enrichments, { sessionId });

This produces the same Amplitude event properties as Amplitude's built-in server-side enrichment (topics, rubrics, outcomes, message labels), but sourced from your pipeline. Use it when compliance requires zero-content transmission, or when you need custom evaluation logic beyond what the built-in enrichment provides.

Available Enrichment Fields

Quality & Sentiment: qualityScore, sentimentScore
Outcome: overallOutcome, hasTaskFailure, taskFailureType, taskFailureReason
Topics: topicClassifications — a map of taxonomy name to TopicClassification
Rubrics: rubrics — array of RubricScore with name, score, rationale, and evidence
Failure Signals: hasNegativeFeedback, hasDataQualityIssues, hasTechnicalFailure
Error Analysis: errorCategories, technicalErrorCount
Behavioral: behavioralPatterns, negativeFeedbackPhrases, dataQualityIssues
Agent Topology: agentChain, rootAgentName
Complexity: requestComplexity
Labels: messageLabels — per-message labels keyed by message ID
Custom: customMetadata — arbitrary key/value data for your own analytics

Message Labels

Attach classification labels to individual messages within a session. Labels are flexible key-value pairs for filtering and segmentation in Amplitude.

Common use cases: routing tags (flow, surface), classifier output (intent, sentiment, toxicity), business context (tier, plan).

Inline labels (at tracking time):

import { MessageLabel } from '@amplitude/ai';

s.trackUserMessage('I want to cancel my subscription', {
  labels: [
    new MessageLabel({
      key: 'intent',
      value: 'cancellation',
      confidence: 0.95,
    }),
    new MessageLabel({
      key: 'sentiment',
      value: 'frustrated',
      confidence: 0.8,
    }),
  ],
});

Retrospective labels (after the session, from a background pipeline):

When classifier results arrive after the session ends, attach them via SessionEnrichments.messageLabels, keyed by the messageId returned from tracking calls:

import { MessageLabel, SessionEnrichments } from '@amplitude/ai';

const enrichments = new SessionEnrichments({
  messageLabels: {
    [userMsgId]: [
      new MessageLabel({ key: 'intent', value: 'cancellation', confidence: 0.94 }),
    ],
    [aiMsgId]: [
      new MessageLabel({ key: 'quality', value: 'good', confidence: 0.91 }),
    ],
  },
});

agent.trackSessionEnrichment(enrichments, { sessionId: 'sess-abc123' });

Labels are emitted as [Agent] Message Labels on the event. In Amplitude, filter or group by label key/value to build charts like "messages by intent" or "sessions where flow=onboarding".

Debug and Dry-Run Modes

Debug Mode

Prints a colored (ANSI) summary of every tracked event to stderr. All 8 event types (User Message, AI Response, Tool Call, Embedding, Span, Session End, Session Enrichment, Score) are formatted. Events are still sent to Amplitude:

const ai = new AmplitudeAI({
  apiKey: 'xxx',
  config: new AIConfig({ debug: true }),
});

// stderr output for each event:
// [amplitude-ai] [Agent] AI Response | user=user-123 session=sess-abc agent=my-agent model=gpt-4o latency=1203ms tokens=150→847 cost=$0.0042
// [amplitude-ai] [Agent] Tool Call | user=user-123 session=sess-abc agent=my-agent tool=search_db success=true latency=340ms
// [amplitude-ai] [Agent] User Message | user=user-123 session=sess-abc agent=my-agent

Dry-Run Mode

Logs the full event JSON to stderr without sending anything to Amplitude:

const ai = new AmplitudeAI({
  apiKey: 'xxx',
  config: new AIConfig({ dryRun: true }),
});

// stderr: full JSON of each event
// Useful for local development, CI pipelines, and validating event shape

Environment Variable Configuration

Both modes can be enabled via environment variables when using auto-instrumentation:

AMPLITUDE_AI_DEBUG=true amplitude-ai-instrument node app.js

Patching

Monkey-patch provider SDKs to auto-track without changing call sites. This is useful for quick verification that the SDK is connected, or for legacy codebases where modifying call sites is impractical. For the full event model (user messages, sessions, scoring, enrichments), use agents + sessions as shown in Quick Start.

import {
  AmplitudeAI,
  patch,
  patchOpenAI,
  unpatch,
  unpatchOpenAI,
} from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });

// Patch installed/available providers (OpenAI, Anthropic, Gemini, Mistral, Bedrock)
patch({ amplitudeAI: ai });

// Or patch specific provider
patchOpenAI({ amplitudeAI: ai });

// Unpatch
unpatch();
unpatchOpenAI();

Available patch functions: patchOpenAI, patchAnthropic, patchAzureOpenAI, patchGemini, patchMistral, patchBedrock. Corresponding unpatch for each: unpatchOpenAI, unpatchAnthropic, unpatchAzureOpenAI, unpatchGemini, unpatchMistral, unpatchBedrock.

patch() returns a string[] of providers where at least one supported surface was successfully patched (e.g., ['openai', 'anthropic']), matching the Python SDK's return signature.

Automatic tool call extraction: patch() automatically extracts [Agent] Tool Call events from LLM message arrays — no manual trackToolCall() needed for basic tool tracking. For OpenAI Chat Completions, it scans role: "assistant" messages with tool_calls arrays and correlates with role: "tool" result messages. For OpenAI Responses API, it extracts type: "function_call" and type: "function_call_output" entries. For Anthropic Messages, it scans type: "tool_use" content blocks in assistant messages and correlates with type: "tool_result" blocks in subsequent user messages.
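For the Chat Completions case, the correlation can be sketched as a two-pass scan over the message array. The message shapes follow the OpenAI Chat Completions format; extractToolCalls and ExtractedToolCall are hypothetical names, and the sketch is not the SDK's implementation.

```typescript
interface ChatToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: 'system' | 'user' | 'assistant' | 'tool';
  content?: string | null;
  tool_calls?: ChatToolCall[];
  tool_call_id?: string;
}

interface ExtractedToolCall {
  id: string;
  name: string;
  input: string;
  output: string | null; // null when no matching tool result was found
}

function extractToolCalls(messages: ChatMessage[]): ExtractedToolCall[] {
  // Pass 1: index tool results by the call id they answer.
  const results = new Map<string, string>();
  for (const m of messages) {
    if (m.role === 'tool' && m.tool_call_id !== undefined) {
      results.set(m.tool_call_id, m.content ?? '');
    }
  }
  // Pass 2: collect assistant tool calls and attach their results.
  const calls: ExtractedToolCall[] = [];
  for (const m of messages) {
    if (m.role !== 'assistant' || !m.tool_calls) continue;
    for (const tc of m.tool_calls) {
      calls.push({
        id: tc.id,
        name: tc.function.name,
        input: tc.function.arguments,
        output: results.get(tc.id) ?? null,
      });
    }
  }
  return calls;
}
```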

Real tool-call latency: when tool execution happens between two patched LLM calls in the same session, the SDK measures latency from the moment the assistant emitted the tool_use / tool_call block to the moment its result is sent back on the next turn. Tool uses whose result never appears in a subsequent turn (or appears after the 10-minute TTL) fall back to latencyMs: 0.
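One way to picture this bookkeeping is a map from tool-use ID to emission timestamp, consulted when the result arrives. The class below is a sketch under that assumption; PendingToolUses and its method names are hypothetical, and only the 10-minute TTL and the fallback to 0 come from the description above.

```typescript
const TOOL_USE_TTL_MS = 10 * 60 * 1000; // the 10-minute TTL mentioned above

class PendingToolUses {
  private readonly emittedAt = new Map<string, number>();

  // Call when the assistant emits a tool_use / tool_call block.
  markEmitted(toolUseId: string, nowMs: number): void {
    this.emittedAt.set(toolUseId, nowMs);
  }

  // Call when the matching result shows up on a later turn.
  // Returns 0 when the tool use is unknown or its result arrived after the TTL.
  resolveLatencyMs(toolUseId: string, nowMs: number): number {
    const start = this.emittedAt.get(toolUseId);
    this.emittedAt.delete(toolUseId);
    if (start === undefined) return 0;
    const elapsed = nowMs - start;
    return elapsed > TOOL_USE_TTL_MS ? 0 : elapsed;
  }
}

const pending = new PendingToolUses();
pending.markEmitted('toolu_1', 1_000);
const latencyMs = pending.resolveLatencyMs('toolu_1', 1_340); // 340
```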

Declaring expected providers (optional)

If you already know which providers your app uses (for example from static config or feature flags), you can pass them to patch() as an optional sanity check. The SDK logs a one-time warning if the runtime-patched set drifts from what you declared — extra or missing providers — and continues patching either way:

patch({
  amplitudeAI: ai,
  expectedProviders: ['openai', 'anthropic'],
  appKey: 'my-app', // optional; used to deduplicate warnings per app
});

This is purely a guardrail against accidental drift (e.g. a dependency quietly switching providers). It never blocks patching and is safe to omit.
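The drift check amounts to a set difference in both directions. The following sketch shows that comparison in isolation; diffProviders and ProviderDrift are hypothetical names, and the patched list stands in for the string[] that patch() returns.

```typescript
interface ProviderDrift {
  missing: string[]; // declared but not patched at runtime
  extra: string[]; // patched at runtime but not declared
}

function diffProviders(expected: string[], patched: string[]): ProviderDrift {
  const expectedSet = new Set(expected);
  const patchedSet = new Set(patched);
  return {
    missing: expected.filter((p) => !patchedSet.has(p)),
    extra: patched.filter((p) => !expectedSet.has(p)),
  };
}

const drift = diffProviders(['openai', 'anthropic'], ['openai', 'gemini']);
// drift.missing is ['anthropic'] and drift.extra is ['gemini']
```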

Patch surface notes:

OpenAI/Azure OpenAI: chat.completions.create, chat.completions.parse, and Responses APIs are instrumented (including streaming shapes where exposed by the SDK).
Bedrock: only ConverseCommand and ConverseStreamCommand are instrumented when patching client.send.

Auto-Instrumentation CLI

Preload the register module to auto-patch providers at process start:

AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true amplitude-ai-instrument node app.js

Or directly with Node's ESM preload flag:

AMPLITUDE_AI_API_KEY=xxx AMPLITUDE_AI_AUTO_PATCH=true node --import @amplitude/ai/register app.js

Environment variables:

Variable	Description
`AMPLITUDE_AI_API_KEY`	Required for auto-patch
`AMPLITUDE_AI_AUTO_PATCH`	Must be `"true"` to enable
`AMPLITUDE_AI_CONTENT_MODE`	`full`, `metadata_only`, or `customer_enriched`
`AMPLITUDE_AI_DEBUG`	`"true"` for debug output to stderr

Doctor CLI

Validate your setup (environment variables, provider dependencies, mock event capture, and the mock flush path):

amplitude-ai doctor

Useful flags:

amplitude-ai doctor --no-mock-check

Status

Show the installed SDK version, detected provider packages, and environment variable configuration at a glance:

amplitude-ai status

Shell Completions

Enable tab-completion for all CLI commands and flags:

# bash
eval "$(amplitude-ai-completions bash)"

# zsh
eval "$(amplitude-ai-completions zsh)"

MCP Server

Run the SDK-local MCP server over stdio:

amplitude-ai mcp

MCP surface:

Tool	Description
`scan_project`	Scan project structure, detect providers, frameworks, and multi-agent patterns
`validate_file`	Analyze a source file to detect uninstrumented LLM call sites
`instrument_file`	Apply instrumentation transforms to a source file
`generate_verify_test`	Generate a dry-run verification test using MockAmplitudeAI
`get_event_schema`	Return the full event schema and property definitions
`get_integration_pattern`	Return canonical instrumentation code patterns
`validate_setup`	Check env vars and dependency presence
`suggest_instrumentation`	Context-aware next steps based on your framework and provider
`search_docs`	Full-text search across SDK documentation (README, llms-full.txt)

Resources: amplitude-ai://event-schema, amplitude-ai://integration-patterns, amplitude-ai://instrument-guide

Prompt: instrument_app — guided walkthrough for instrumenting an application

Examples and AI Coding Agent Guide

amplitude-ai.md — self-contained instrumentation guide for any AI coding agent (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.). Run npx amplitude-ai to see the prompt that points your agent to this file.
Mock-based examples demonstrating the event model (also used as CI smoke tests):
- examples/zero-code.ts
- examples/wrap-openai.ts
- examples/multi-agent.ts
- examples/framework-integration.ts
Real provider examples (require API keys):
- examples/real-openai.ts — end-to-end OpenAI integration with session tracking and flush
- examples/real-anthropic.ts — end-to-end Anthropic integration with session tracking and flush

Integrations

LangChain

import { AmplitudeAI, AmplitudeCallbackHandler } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const handler = new AmplitudeCallbackHandler({
  amplitudeAI: ai,
  userId: 'user-123',
  sessionId: 'sess-1',
});

// Pass handler to LangChain callbacks

OpenTelemetry

Two exporters add Amplitude as a destination alongside your existing trace backend (Datadog, Honeycomb, Jaeger, etc.):

import {
  AmplitudeAgentExporter,
  AmplitudeGenAIExporter,
} from '@amplitude/ai';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import {
  BatchSpanProcessor,
  SimpleSpanProcessor,
} from '@opentelemetry/sdk-trace-base';

const provider = new NodeTracerProvider();

// GenAI exporter — converts gen_ai.* spans into Amplitude AI events
provider.addSpanProcessor(
  new BatchSpanProcessor(
    new AmplitudeGenAIExporter({
      apiKey: process.env.AMPLITUDE_AI_API_KEY!,
    }),
  ),
);

// Agent exporter — converts agent.* spans into Amplitude session events
provider.addSpanProcessor(
  new SimpleSpanProcessor(
    new AmplitudeAgentExporter({
      apiKey: process.env.AMPLITUDE_AI_API_KEY!,
    }),
  ),
);

provider.register();

Only spans with gen_ai.provider.name or gen_ai.system attributes are processed; all other spans are silently ignored. This means it's safe to add the exporter to a pipeline that produces mixed (GenAI + HTTP + DB) spans.

Attribute mapping reference:

OTEL Span Attribute	Amplitude Event Property	Notes
`gen_ai.response.model` / `gen_ai.request.model`	`[Agent] Model`	Response model preferred
`gen_ai.system` / `gen_ai.provider.name`	`[Agent] Provider`
`gen_ai.usage.input_tokens`	`[Agent] Input Tokens`
`gen_ai.usage.output_tokens`	`[Agent] Output Tokens`
`gen_ai.usage.total_tokens`	`[Agent] Total Tokens`	Derived if not present
`gen_ai.usage.cache_read.input_tokens`	`[Agent] Cache Read Tokens`
`gen_ai.usage.cache_creation.input_tokens`	`[Agent] Cache Creation Tokens`
`gen_ai.request.temperature`	`[Agent] Temperature`
`gen_ai.request.top_p`	`[Agent] Top P`
`gen_ai.request.max_output_tokens`	`[Agent] Max Output Tokens`
`gen_ai.response.finish_reasons`	`[Agent] Finish Reason`
`gen_ai.input.messages`	`[Agent] LLM Message`	Only if content mode allows
Span duration	`[Agent] Latency Ms`
Span status ERROR	`[Agent] Is Error`, `[Agent] Error Message`
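A few rows of this mapping can be sketched as a plain function over span attributes. This is an illustration, not the exporter's code: mapGenAiAttributes is a hypothetical helper, and preferring gen_ai.provider.name over gen_ai.system is an assumption about precedence.

```typescript
type SpanAttributes = Record<string, string | number | undefined>;

// Returns null for spans that carry no GenAI provider attribute.
function mapGenAiAttributes(attrs: SpanAttributes): Record<string, string | number> | null {
  // Assumption: gen_ai.provider.name wins when both attributes are present.
  const provider = attrs['gen_ai.provider.name'] ?? attrs['gen_ai.system'];
  if (provider === undefined) return null; // not a GenAI span, so it is ignored

  const props: Record<string, string | number> = { '[Agent] Provider': provider };

  // The response model is preferred over the request model.
  const model = attrs['gen_ai.response.model'] ?? attrs['gen_ai.request.model'];
  if (model !== undefined) props['[Agent] Model'] = model;

  const input = attrs['gen_ai.usage.input_tokens'];
  const output = attrs['gen_ai.usage.output_tokens'];
  if (input !== undefined) props['[Agent] Input Tokens'] = input;
  if (output !== undefined) props['[Agent] Output Tokens'] = output;

  const total = attrs['gen_ai.usage.total_tokens'];
  if (total !== undefined) {
    props['[Agent] Total Tokens'] = total;
  } else if (typeof input === 'number' && typeof output === 'number') {
    props['[Agent] Total Tokens'] = input + output; // derived when not present
  }
  return props;
}
```

Returning null for non-GenAI spans reflects the filtering rule described earlier: HTTP or database spans in the same pipeline produce no Amplitude events.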

Not available via OTEL (use native wrappers): reasoning content/tokens, TTFB, streaming detection, implicit feedback, file attachments, event graph linking (parent_message_id).

When to use OTEL vs. native wrappers: If you already have @opentelemetry/instrumentation-openai or similar producing GenAI spans, the OTEL bridge gives you Amplitude analytics with zero code changes. For richer tracking (implicit feedback, streaming metrics, attachments), use the native wrapOpenAI()/wrapAnthropic() wrappers alongside OTEL.

LlamaIndex

import {
  AmplitudeLlamaIndexHandler,
  createAmplitudeLlamaIndexHandler,
} from '@amplitude/ai';

OpenAI Agents SDK

import { AmplitudeTracingProcessor } from '@amplitude/ai';

Anthropic Tool Use

import { AmplitudeToolLoop } from '@amplitude/ai';

Managed Agents (Anthropic)

For managed / hosted agent architectures where LLM calls happen server-side and you only receive results via API:

import { ManagedAgentTracker } from '@amplitude/ai/integrations/anthropic-managed';

const tracker = new ManagedAgentTracker(session, { provider: 'anthropic' });
tracker.trackTurn(sessionEventsFromAPI);

See examples/anthropic-managed-agents-example.ts and the coding agent guide (amplitude-ai.md, Step 3f) for full usage.

Claude Agent SDK

Track tool calls (with execution latency) and AI messages from the Claude Agent SDK.

Essential fields: agentId (on ai.agent()) identifies which AI feature produced the events — it maps to the LLM Usage Application Registry. userId + sessionId (on agent.session()) tie all events into a single user conversation, powering funnels, retention, and conversation views. The session automatically emits [Agent] Session Start and [Agent] Session End.

import { query } from '@anthropic-ai/claude-agent-sdk';
import { AmplitudeAI } from '@amplitude/ai';
import { ClaudeAgentSDKTracker } from '@amplitude/ai/integrations/claude-agent-sdk';

const ai = new AmplitudeAI({ apiKey: 'YOUR_KEY' });
const agent = ai.agent('code-reviewer'); // agentId is the first argument, as in the other examples
const tracker = new ClaudeAgentSDKTracker();

await agent.session({ userId: 'u1', sessionId: 'sess-abc' }).run(async (s) => {
  for await (const message of query({
    prompt: 'Analyze this codebase',
    options: { hooks: tracker.hooks(s) },
  })) {
    tracker.process(s, message);
  }
});

hooks(session) returns PreToolUse/PostToolUse hooks for ClaudeAgentOptions that track tool execution with precise latency. process(session, message) processes messages from the query() stream to track AI responses and user messages.

CrewAI (Python-only)

import { AmplitudeCrewAIHooks } from '@amplitude/ai';

In Node.js, AmplitudeCrewAIHooks throws a ProviderError by design. Use LangChain or OpenTelemetry integrations instead.

Data Flow

How events flow from your application to Amplitude charts:

Your Application
├── wrapOpenAI() / wrapAnthropic()     ─── auto-emits ──┐
├── session.trackUserMessage()         ─── manual ──────┤
├── session.trackAiMessage()           ─── manual ──────┤
├── agent.trackToolCall()              ─── manual ──────┤
├── agent.trackSessionEnrichment()     ─── manual ──────┤
└── OTEL exporter (AmplitudeGenAI...)  ─── bridge ──────┤
                                                        │
                              AmplitudeAI client ◄──────┘
                                   │
                                   ├── validate (if enabled)
                                   ├── apply middleware chain
                                   ├── batch events
                                   │
                                   ▼
                           Amplitude HTTP API
                                   │
                     ┌─────────────┴──────────────┐
                     │                            │
            Amplitude Charts               LLM Enrichment
            (immediate querying)           Pipeline (async)
                                                  │
                                                  ▼
                                        [Agent] Session Evaluation
                                        [Agent] Score events
                                        (topic, rubric, outcome)

Key points:

All paths converge at the AmplitudeAI client, which batches and sends events.
Events are available for charting within seconds of ingestion.
The LLM Enrichment Pipeline runs asynchronously after session close (only when contentMode: 'full'). It produces server-side events like [Agent] Session Evaluation and [Agent] Score.
With contentMode: 'customer_enriched', the enrichment pipeline is skipped — you provide your own enrichments via trackSessionEnrichment().

Integration Approaches

Start with full instrumentation. Use agents + sessions + provider wrappers. This is the recommended approach for both coding agent and manual workflows — it gives you every event type, per-user analytics, and server-side enrichment.

Approach	When to use	What you get
Full control (recommended)	Any project, new or existing	`BoundAgent` + `session.run()` + provider wrappers — all event types, per-user funnels, cohorts, retention, quality scoring, enrichments
Express/Fastify middleware	Web app, auto-session per request	Same as full control with automatic session lifecycle via `createAmplitudeAIMiddleware`
Swap import	Existing codebase, incremental adoption	`new OpenAI({ amplitude: ai })` — auto-tracking per call, add sessions when ready
Wrap	You've already created a client	`wrap(client, ai)` — instruments an existing client instance
Managed / hosted agents	Anthropic Managed Agents, OpenAI Assistants, agent-as-a-service	Manual `trackUserMessage` + `trackAiMessage` + `trackToolCall` with tokens/cost from the API response, or `ManagedAgentTracker` adapter
Zero-code / `patch()`	Verification or legacy codebases only	`patch({ amplitudeAI: ai })` — `[Agent] AI Response` only, no user identity, no funnels
Claude Agent SDK hooks	Apps using Claude Agent SDK `query()`	`ClaudeAgentSDKTracker` — real tool latency via PreToolUse/PostToolUse hooks, plus AI response and user message tracking
OTEL Bridge	Third-party framework exports OTEL spans	Add exporter to existing OTEL pipeline — limited to OTEL attributes

The first four approaches all support the full event model. Choose based on how you want to integrate — the analytics capabilities are the same. patch() is the exception: it only captures aggregate [Agent] AI Response events without user identity, useful only for verifying the SDK works or for codebases where you can't modify call sites.

User text, turn-level events, and gateways

These rules match the Python amplitude-ai agent guide and affect how Agent Analytics labels sessions and computes costs:

trackUserMessage(content, opts?) — The content string becomes $llm_message.text. Use a short, human-readable line for the real user intent (or a headless summary). Put large JSON, RAG packs, or pipeline state in opts.context or opts.eventProperties, not as the only content, or session titles and segmentation will show raw JSON.
Turn-level vs spans — [Agent] User Message and [Agent] AI Response (with session + turn ids) drive turn counts and conversation views. observe() / trackSpan() add trace detail but do not replace those turn events; keep a user + AI pair for each user-visible cycle unless you intentionally document otherwise.
Gateways / custom baseURL — If you use stock openai (or another client) against a proxy, the SDK may not auto-wrap that path. Call trackAiMessage with usage token fields from the response (or stream end), pass the actual routed model id as the model argument, and set totalCostUsd if genai-prices cannot resolve the model string. Install @pydantic/genai-prices for automatic USD estimates when model + tokens are known.
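
For the gateway case, a small mapping helper keeps the manual trackAiMessage call consistent. The sketch below is hypothetical (`toTrackAiArgs`, `ChatCompletionLike`, and `TrackAiArgs` are not SDK exports); it assumes an OpenAI-style chat completions payload and the `trackAiMessage(content, model, provider, latencyMs, options)` call shape used elsewhere in this README.

```typescript
// Hypothetical helper: maps an OpenAI-style chat completion payload to the
// values passed to trackAiMessage. Not part of @amplitude/ai.
type ChatCompletionLike = {
  model: string; // the model id the gateway actually routed to
  choices: { message: { content: string | null } }[];
  usage?: { prompt_tokens: number; completion_tokens: number };
};

type TrackAiArgs = {
  content: string;
  model: string;
  options: {
    inputTokens: number | undefined;
    outputTokens: number | undefined;
    totalCostUsd?: number;
  };
};

function toTrackAiArgs(response: ChatCompletionLike, fallbackCostUsd?: number): TrackAiArgs {
  const options: TrackAiArgs['options'] = {
    inputTokens: response.usage?.prompt_tokens,
    outputTokens: response.usage?.completion_tokens,
  };
  // Only set an explicit cost when the caller supplies one, for example when
  // the pricing lookup cannot resolve the routed model string.
  if (fallbackCostUsd !== undefined) options.totalCostUsd = fallbackCostUsd;
  const first = response.choices[0];
  return {
    content: first?.message.content ?? '',
    model: response.model,
    options,
  };
}
```

Inside a session callback this would be used as `const a = toTrackAiArgs(response); s.trackAiMessage(a.content, a.model, 'openai', latencyMs, a.options);`.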

Integration Patterns

Pattern A: Single-Request API Endpoint

For serverless functions or API endpoints that handle one request at a time. The key requirement is flushing events before the handler returns:

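import { AmplitudeAI } from '@amplitude/ai';
import OpenAI from 'openai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
// Stock OpenAI client: the response is tracked manually with trackAiMessage below
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });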

app.post('/chat', async (req, res) => {
  const agent = ai.agent('api-handler', { userId: req.userId });
  const session = agent.session({ sessionId: req.sessionId });

  const result = await session.run(async (s) => {
    s.trackUserMessage(req.body.message);

    const start = performance.now();
    const response = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: req.body.messages,
    });
    const latencyMs = performance.now() - start;

    s.trackAiMessage(
      response.choices[0].message.content ?? '',
      'gpt-4o',
      'openai',
      latencyMs,
      {
        inputTokens: response.usage?.prompt_tokens,
        outputTokens: response.usage?.completion_tokens,
      },
    );

    return response.choices[0].message.content;
  });

  await ai.flush();
  res.json({ response: result });
});

Pattern B: Long-Lived Session (Chatbot)

For multi-turn conversations where the session spans many request/response cycles. Create the session once and reuse it across turns:

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const agent = ai.agent('chatbot', { userId: 'user-123', env: 'production' });

// Session persists across multiple turns.
// `conversationId` and `llm` stand in for your own conversation key and LLM client.
const session = agent.session({ sessionId: conversationId });

await session.run(async (s) => {
  // Turn 1
  s.trackUserMessage('What is Amplitude?');
  const resp1 = await llm.chat('What is Amplitude?');
  s.trackAiMessage(resp1.content, 'gpt-4o', 'openai', resp1.latencyMs, {
    inputTokens: resp1.usage.input,
    outputTokens: resp1.usage.output,
  });

  // Turn 2
  s.trackUserMessage('How does it track events?');
  const resp2 = await llm.chat('How does it track events?');
  s.trackAiMessage(resp2.content, 'gpt-4o', 'openai', resp2.latencyMs, {
    inputTokens: resp2.usage.input,
    outputTokens: resp2.usage.output,
  });

  // Score the conversation
  s.score('helpfulness', 0.9, session.sessionId, {
    targetType: 'session',
    source: 'ai',
  });
});
// Session auto-ends here with all enrichments

Pattern C: Multi-Agent Orchestration

For architectures where a parent agent delegates to specialized child agents. Use session.runAs() to automatically propagate the child agent's identity to both manual tracking calls and provider wrappers:

import { AmplitudeAI, OpenAI } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY!, amplitude: ai });

const orchestrator = ai.agent('orchestrator', {
  userId: 'user-123',
  env: 'production',
});
const researcher = orchestrator.child('researcher');
const writer = orchestrator.child('writer');

const session = orchestrator.session({ userId: 'user-123' });

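await session.run(async (s) => {
  s.trackUserMessage('Write a blog post about TypeScript generics');

  // Accumulate latency and token usage across both phases for the final response
  const start = performance.now();
  let totalInput = 0;
  let totalOutput = 0;

  // Research phase: provider calls automatically tagged with agentId='researcher'
  const researchResult = await s.runAs(researcher, async (rs) => {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: 'Research TypeScript generics' }],
    });
    totalInput += completion.usage?.prompt_tokens ?? 0;
    totalOutput += completion.usage?.completion_tokens ?? 0;
    return completion.choices[0].message.content;
  });

  // Writing phase: provider calls automatically tagged with agentId='writer'
  const draft = await s.runAs(writer, async (ws) => {
    const completion = await openai.chat.completions.create({
      model: 'gpt-4o',
      messages: [{ role: 'user', content: `Write a post using: ${researchResult}` }],
    });
    totalInput += completion.usage?.prompt_tokens ?? 0;
    totalOutput += completion.usage?.completion_tokens ?? 0;
    return completion.choices[0].message.content;
  });

  // Orchestrator-level response summarizing the whole delegation
  const totalLatencyMs = performance.now() - start;
  s.trackAiMessage(draft ?? '', 'gpt-4o', 'openai', totalLatencyMs, {
    inputTokens: totalInput,
    outputTokens: totalOutput,
  });
});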

// Events emitted:
//   [Agent] User Message     → agentId='orchestrator'
//   [Agent] AI Response       → agentId='researcher',  parentAgentId='orchestrator'
//   [Agent] AI Response       → agentId='writer',      parentAgentId='orchestrator'
//   [Agent] AI Response       → agentId='orchestrator'
//   [Agent] Session End       → agentId='orchestrator'  (one session end, not per-child)

How runAs works:

Shares the parent session's sessionId, traceId, and turn counter
Overrides agentId and parentAgentId in AsyncLocalStorage for the callback's duration
Provider wrappers automatically read the child's identity — no amplitudeOverrides needed
Does not emit [Agent] Session End (the child operates within the parent session)
Restores the parent context when the callback completes, even on errors
Supports nesting: s.runAs(child, (cs) => cs.runAs(grandchild, ...))
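
The override-and-restore behavior in the list above can be sketched with Node's `AsyncLocalStorage` directly. This is an illustration under assumptions, not the SDK's code: `AgentContextSketch`, `runAsSketch`, and `currentAgentId` are hypothetical names, and the real context carries more fields.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical context shape for illustration only.
type AgentContextSketch = {
  sessionId: string;
  agentId: string;
  parentAgentId: string | null;
};

const storage = new AsyncLocalStorage<AgentContextSketch>();

// Runs `fn` with the child's identity layered over the current context.
// The child store is visible only inside `fn` and the async work it starts;
// code after the call sees the parent store again, even if `fn` throws.
function runAsSketch<T>(childAgentId: string, fn: () => T): T {
  const parent = storage.getStore();
  if (!parent) throw new Error('runAsSketch must be called inside a session');
  const child: AgentContextSketch = {
    sessionId: parent.sessionId, // shared with the parent session
    agentId: childAgentId,
    parentAgentId: parent.agentId,
  };
  return storage.run(child, fn);
}

// What a provider wrapper might do: read identity from the active context.
function currentAgentId(): string | undefined {
  return storage.getStore()?.agentId;
}
```

Because `fn` may return a promise, the same helper covers async callbacks: `await runAsSketch('researcher', async () => { ... })`.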

Serverless Environments

The SDK auto-detects serverless environments (Vercel, AWS Lambda, Netlify, Google Cloud Functions, Azure Functions, Cloudflare Pages). When detected, session.run() automatically flushes all pending events before the promise resolves — no explicit ai.flush() needed. You can also control this explicitly via the autoFlush option on session():

// Auto-detected: flushes automatically in serverless, skips in long-running servers
agent.session({ userId, sessionId });

// Explicit control:
agent.session({ userId, sessionId, autoFlush: true });   // always flush
agent.session({ userId, sessionId, autoFlush: false });  // never flush
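
One plausible way to combine auto-detection with the explicit option is shown below. This is a hedged sketch: `looksServerless`, `resolveAutoFlush`, and the marker list are hypothetical, and the SDK's real detection checks may differ. The two default markers are environment variables that AWS Lambda and Vercel are known to set.

```typescript
// Hypothetical sketch of environment-based serverless detection.
// The marker list is an assumption; the SDK's real checks may differ.
type EnvSketch = Record<string, string | undefined>;

const DEFAULT_MARKERS: string[] = [
  'AWS_LAMBDA_FUNCTION_NAME', // set by the AWS Lambda runtime
  'VERCEL', // set on Vercel deployments
];

function looksServerless(env: EnvSketch, markers: string[] = DEFAULT_MARKERS): boolean {
  return markers.some((name) => Boolean(env[name]));
}

// An explicit autoFlush value always wins; otherwise fall back to detection.
function resolveAutoFlush(explicit: boolean | undefined, env: EnvSketch): boolean {
  return explicit ?? looksServerless(env);
}
```

In a real process the second argument would be `process.env`.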

If you track events outside of session.run(), or set autoFlush: false, you still need await ai.flush() before your handler returns. The handler below keeps an explicit flush as a safeguard:

export async function handler(event: APIGatewayEvent) {
  const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });
  const agent = ai.agent('api-handler', {
    userId: event.requestContext.authorizer?.userId,
  });

  const session = agent.session();
  // Parse the request body once and reuse the message
  const { message = '' } = JSON.parse(event.body ?? '{}');

  const result = await session.run(async (s) => {
    s.trackUserMessage(message);

    const start = performance.now();
    const response = await callLLM(message);
    const latencyMs = performance.now() - start;

    s.trackAiMessage(response.content, response.model, 'openai', latencyMs, {
      inputTokens: response.usage.prompt_tokens,
      outputTokens: response.usage.completion_tokens,
    });

    return response.content;
  });

  await ai.flush(); // Needed when auto-flush is off or events are tracked outside session.run()
  return { statusCode: 200, body: JSON.stringify({ response: result }) };
}

Flush vs Shutdown

ai.flush() — sends all buffered events and returns a promise. Use in serverless handlers and API endpoints where you need to ensure delivery before responding.
ai.shutdown() — flushes and closes the underlying Amplitude client. Only needed if you created the client via apiKey (not when passing your own instance). Call on process exit (e.g., SIGTERM handler).

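process.on('SIGTERM', async () => {
  // Wait for the final flush before exiting; exiting immediately can drop buffered events
  await ai.shutdown();
  process.exit(0);
});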

Error Handling and Reliability

Non-throwing: All track* methods catch and log errors internally, so tracking failures never interrupt your application code. The one exception is validate: true (see Validation below), which throws by design.
Buffering: Events are buffered and sent in batches by the underlying @amplitude/analytics-node SDK.
Retry: Failed sends are automatically retried by the transport layer.
Validation: Enable validate: true in AIConfig to get early validation errors for missing required fields (userId, sessionId, etc.). Validation errors throw ValidationError so you can catch them during development.
Graceful degradation: If the Amplitude service is unreachable, events are silently dropped after retries are exhausted. Your LLM application continues operating normally.

import { AIConfig, AmplitudeAI, ValidationError } from '@amplitude/ai';

const ai = new AmplitudeAI({
  apiKey: 'xxx',
  config: new AIConfig({ validate: true }),
});

try {
  ai.trackUserMessage({ userId: '', content: 'Hello', sessionId: 'sess-1' });
} catch (e) {
  if (e instanceof ValidationError) {
    console.error('Invalid tracking call:', e.message);
    // "userId must be a non-empty string, got "
  }
}

Testing

Use MockAmplitudeAI for unit tests:

import { MockAmplitudeAI } from '@amplitude/ai';

const mock = new MockAmplitudeAI();

const agent = mock.agent('test-agent', { userId: 'user-1' });
const session = agent.session({ sessionId: 'sess-1', userId: 'user-1' });

await session.run(async (s) => {
  s.trackUserMessage('Hello');
  s.trackAiMessage('Hi!', 'gpt-4', 'openai', 100);
});

mock.assertEventTracked('[Agent] User Message', { userId: 'user-1' });
mock.assertEventTracked('[Agent] AI Response', { userId: 'user-1' });
mock.assertSessionClosed('sess-1');

mock.reset();

Troubleshooting

Symptom	Cause	Fix
No events in Amplitude	API key not set or incorrect	Run `amplitude-ai doctor` — it checks `AMPLITUDE_AI_API_KEY` and reports a fix command
Events tracked but `[Agent] Cost USD` is $0	Model not in the pricing database, or `totalCostUsd` not passed	Pass `totalCostUsd` explicitly, or check that `@pydantic/genai-prices` / `genai-prices` is installed
`patch()` doesn't instrument calls	`patch()` called after the provider client was created	Call `patch()` before importing or instantiating provider clients
Session context missing on events	LLM calls made outside `session.run()`	Wrap your LLM calls inside `session.run(async () => { ... })`
`flush()` hangs or times out in serverless	Process exits before flush completes	Use `await ai.flush()` before returning from your Lambda/Cloud Function handler
`wrap()` TypeScript type errors	Passing an unsupported client type	`wrap()` only supports OpenAI, AzureOpenAI, and Anthropic clients; use provider classes for others
`MockAmplitudeAI` events are empty	Tracking calls not inside a session context	Use `mock.agent(...).session(...).run(...)` to wrap tracked calls
`Cannot find module 'openai'` in Turbopack/Webpack	Bundler rewrites `import.meta.url`, breaking dynamic `require()`	Pass the provider module directly: `new OpenAI({ amplitude: ai, apiKey, openaiModule: OpenAISDK })`. Same pattern for `Anthropic`, `Gemini`, etc. See each provider's `<name>Module` option.

Run amplitude-ai doctor for automated environment diagnostics with fix suggestions.

Context Propagation

For distributed tracing, inject context into outgoing request headers and extract on the receiving side:

import { randomUUID } from 'node:crypto';
import {
  extractContext,
  injectContext,
  runWithContextAsync,
  SessionContext,
} from '@amplitude/ai';

// Outgoing request (`url` is the downstream service endpoint)
const headers = injectContext();
await fetch(url, { headers });

// Receiving side
const extracted = extractContext(req.headers);
const ctx = new SessionContext({
  sessionId: extracted.sessionId ?? randomUUID(),
  traceId: extracted.traceId ?? null,
  userId: extracted.userId ?? null,
});

await runWithContextAsync(ctx, async () => {
  // Context available via getActiveContext()
});
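
To make the inject/extract round trip concrete, here is a self-contained sketch of the idea. The header names and the functions `injectSketch` and `extractSketch` are invented for illustration; the SDK's actual wire format is not documented in this section and may differ.

```typescript
// Hypothetical round trip of session context through HTTP headers.
type TraceContextSketch = {
  sessionId: string;
  traceId: string | null;
  userId: string | null;
};

// Header names invented for this sketch.
const HEADER_SESSION = 'x-example-session-id';
const HEADER_TRACE = 'x-example-trace-id';
const HEADER_USER = 'x-example-user-id';

function injectSketch(ctx: TraceContextSketch): Record<string, string> {
  const headers: Record<string, string> = { [HEADER_SESSION]: ctx.sessionId };
  if (ctx.traceId) headers[HEADER_TRACE] = ctx.traceId;
  if (ctx.userId) headers[HEADER_USER] = ctx.userId;
  return headers;
}

// Accepts Node-style headers, where values may be strings or string arrays
// and names may arrive in any letter case.
function extractSketch(
  headers: Record<string, string | string[] | undefined>,
): Partial<TraceContextSketch> {
  const lower = new Map<string, string>();
  for (const key of Object.keys(headers)) {
    const raw = headers[key];
    const value = Array.isArray(raw) ? raw[0] : raw;
    if (value !== undefined) lower.set(key.toLowerCase(), value);
  }
  const out: Partial<TraceContextSketch> = {};
  const sessionId = lower.get(HEADER_SESSION);
  const traceId = lower.get(HEADER_TRACE);
  const userId = lower.get(HEADER_USER);
  if (sessionId) out.sessionId = sessionId;
  if (traceId) out.traceId = traceId;
  if (userId) out.userId = userId;
  return out;
}
```

Fields that were never injected come back as absent, which is why the receiving side in the example above falls back to `randomUUID()` and `null`.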

Middleware

Express-compatible middleware for automatic session tracking:

import { randomUUID } from 'node:crypto';
import { AmplitudeAI, createAmplitudeAIMiddleware } from '@amplitude/ai';
import express from 'express';

const ai = new AmplitudeAI({ apiKey: process.env.AMPLITUDE_AI_API_KEY! });

const app = express();
app.use(
  createAmplitudeAIMiddleware({
    amplitudeAI: ai,
    userIdResolver: (req) =>
      (req as { headers: { 'x-user-id'?: string } }).headers['x-user-id'] ??
      null,
    sessionIdResolver: (req) =>
      (req as { headers: { 'x-session-id'?: string } }).headers[
        'x-session-id'
      ] ?? randomUUID(),
    agentId: 'api-server',
    env: process.env.NODE_ENV ?? 'development',
  }),
);

app.post('/chat', async (req, res) => {
  // Session context available; trackUserMessage/trackAiMessage inherit sessionId, traceId
});

Bulk Conversation Import

Use trackConversation() to import an entire conversation history in one call. Each message in the array is tracked as either an [Agent] User Message or an [Agent] AI Response event, with turn IDs auto-incremented:

import { trackConversation } from '@amplitude/ai';
import * as amplitude from '@amplitude/analytics-node';

trackConversation({
  amplitude,
  userId: 'user-123',
  sessionId: 'sess-abc',
  agentId: 'support-bot',
  messages: [
    { role: 'user', content: 'How do I reset my password?' },
    {
      role: 'assistant',
      content: 'Go to Settings > Security > Reset Password.',
      model: 'gpt-4o',
      provider: 'openai',
      latency_ms: 1200,
      input_tokens: 15,
      output_tokens: 42,
      total_cost_usd: 0.002,
    },
    { role: 'user', content: 'Thanks, that worked!' },
    {
      role: 'assistant',
      content: 'Glad I could help!',
      model: 'gpt-4o',
      provider: 'openai',
      latency_ms: 800,
      input_tokens: 10,
      output_tokens: 8,
    },
  ],
});

This is useful for backfilling historical conversations or importing data from external systems. The function accepts all the same context fields (agentId, env, customerOrgId, etc.) as the individual tracking methods.
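
The role-to-event mapping described above can be sketched as a pure function. `planConversationEvents` and its types are hypothetical, and starting the turn counter at 1 is an assumption; the section only states that turn IDs are auto-incremented.

```typescript
// Hypothetical sketch of how a message array maps to event types and turn IDs.
type ConversationMessageSketch = {
  role: 'user' | 'assistant';
  content: string;
  model?: string;
};

type PlannedEvent = {
  eventType: '[Agent] User Message' | '[Agent] AI Response';
  turnId: number;
  content: string;
};

function planConversationEvents(messages: ConversationMessageSketch[]): PlannedEvent[] {
  return messages.map(
    (message, index): PlannedEvent => ({
      eventType: message.role === 'user' ? '[Agent] User Message' : '[Agent] AI Response',
      turnId: index + 1, // assumption: counter starts at 1 and increments per message
      content: message.content,
    }),
  );
}
```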

Event Schema

Event Type	Source	Description
`[Agent] User Message`	SDK	User sent a message
`[Agent] AI Response`	SDK	AI model returned a response
`[Agent] Tool Call`	SDK	Tool/function was invoked
`[Agent] Embedding`	SDK	Embedding was generated
`[Agent] Span`	SDK	Span (e.g. RAG step, transform)
`[Agent] Session End`	SDK	Session ended
`[Agent] Session Enrichment`	SDK	Session-level enrichment data
`[Agent] Score`	Both	Evaluation score (quality, sentiment, etc.)
`[Agent] Session Evaluation`	Server	Session-level summary: outcome, turn count, flags, cost. Emitted automatically.
`[Agent] Topic Classification`	Server	One event per topic model per session. Emitted automatically.

Event Property Reference

All event properties are prefixed with [Agent] (except [Amplitude] Session Replay ID). This reference is auto-generated and matches what gets registered in Amplitude's data catalog via the amplitude-ai-register-catalog CLI.

Common Properties (present on all SDK events)

Property	Type	Required	Description
`[Agent] Session ID`	string	Yes	Unique session identifier. All events in one conversation share the same session ID.
`[Agent] Trace ID`	string	No	Identifies one user-message-to-AI-response cycle within a session.
`[Agent] Turn ID`	number	No	Monotonically increasing counter for event ordering within a session.
`[Agent] Agent ID`	string	No	Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston').
`[Agent] Parent Agent ID`	string	No	For multi-agent orchestration: the agent that delegated to this agent.
`[Agent] Customer Org ID`	string	No	Organization ID for multi-tenant platforms. Enables account-level group analytics.
`[Agent] Agent Version`	string	No	Agent code version (e.g., 'v4.2'). Enables version-over-version quality comparison.
`[Agent] Agent Description`	string	No	Human-readable description of the agent's purpose (e.g., 'Handles user chat requests via OpenAI GPT-4o'). Enables observability-driven agent registry from event streams.
`[Agent] Context`	string	No	Serialized JSON dict of arbitrary segmentation dimensions (experiment_variant, surface, feature_flag, prompt_revision, etc.).
`[Agent] Env`	string	No	Deployment environment: 'production', 'staging', or 'dev'.
`[Agent] SDK Version`	string	Yes	Version of the amplitude-ai SDK that produced this event.
`[Agent] Runtime`	string	Yes	SDK runtime: 'python' or 'node'.

User Message Properties

Event-specific properties for [Agent] User Message (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Message ID`	string	Yes	Unique identifier for this message event (UUID). Used to link scores and tool calls back to specific messages.
`[Agent] Component Type`	string	Yes	Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'.
`[Agent] Locale`	string	No	User locale (e.g., 'en-US').
`[Amplitude] Session Replay ID`	string	No	Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay.
`[Agent] Is Regeneration`	boolean	No	Whether the user requested the AI regenerate a previous response.
`[Agent] Is Edit`	boolean	No	Whether the user edited a previous message and resubmitted.
`[Agent] Edited Message ID`	string	No	The message_id of the original message that was edited (links the edit to the original).
`[Agent] Has Attachments`	boolean	No	Whether this message includes file attachments (uploads, images, etc.).
`[Agent] Attachment Types`	string[]	No	Distinct attachment types (e.g., 'pdf', 'image', 'csv'). Serialized JSON array.
`[Agent] Attachment Count`	number	No	Number of file attachments included with this message.
`[Agent] Total Attachment Size Bytes`	number	No	Total size of all attachments in bytes.
`[Agent] Attachments`	string	No	Serialized JSON array of attachment metadata (type, name, size_bytes, mime_type). Only metadata, never file content.
`[Agent] Message Labels`	string	No	Serialized JSON array of MessageLabel objects (key-value pairs with optional confidence). Used for routing tags, classifier output, business context.
`[Agent] Message Source`	string	No	Origin of the user message: 'user' for real end-user input, 'agent' for inter-agent delegation (parent agent sending instructions to a child agent). Automatically set by provider wrappers based on parent_agent_id context.
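
The attachment properties in the table are all derivable from one metadata array. The sketch below shows that derivation; `summarizeAttachments` and `AttachmentMetaSketch` are hypothetical names, and the metadata fields follow the ones listed for `[Agent] Attachments` (type, name, size_bytes, mime_type).

```typescript
// Hypothetical derivation of the attachment properties from metadata only.
type AttachmentMetaSketch = {
  type: string;
  name: string;
  size_bytes: number;
  mime_type?: string;
};

function summarizeAttachments(attachments: AttachmentMetaSketch[]) {
  // Distinct types, in first-seen order
  const types = Array.from(new Set(attachments.map((a) => a.type)));
  return {
    '[Agent] Has Attachments': attachments.length > 0,
    '[Agent] Attachment Types': types,
    '[Agent] Attachment Count': attachments.length,
    '[Agent] Total Attachment Size Bytes': attachments.reduce((sum, a) => sum + a.size_bytes, 0),
    // Metadata only: file content is never part of this payload
    '[Agent] Attachments': JSON.stringify(attachments),
  };
}
```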

AI Response Properties

Event-specific properties for [Agent] AI Response (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Message ID`	string	Yes	Unique identifier for this message event (UUID). Used to link scores and tool calls back to specific messages.
`[Agent] Component Type`	string	Yes	Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'.
`[Agent] Model Name`	string	Yes	LLM model identifier (e.g., 'gpt-4o', 'claude-sonnet-4-20250514').
`[Agent] Provider`	string	Yes	LLM provider name (e.g., 'openai', 'anthropic', 'google', 'mistral', 'bedrock').
`[Agent] Latency Ms`	number	Yes	Total wall-clock latency in milliseconds for this operation.
`[Agent] Is Error`	boolean	Yes	Whether this event represents an error condition.
`[Agent] Error Message`	string	No	Error message text when Is Error is true.
`[Agent] Locale`	string	No	User locale (e.g., 'en-US').
`[Agent] Span Kind`	string	No	Classification of the span type for OTEL bridge compatibility.
`[Amplitude] Session Replay ID`	string	No	Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay.
`[Agent] TTFB Ms`	number	No	Time to first byte/token in milliseconds. Measures perceived responsiveness for streaming.
`[Agent] Input Tokens`	number	No	Number of input/prompt tokens consumed by this LLM call.
`[Agent] Output Tokens`	number	No	Number of output/completion tokens generated by this LLM call.
`[Agent] Total Tokens`	number	No	Total tokens consumed (input + output).
`[Agent] Reasoning Tokens`	number	No	Tokens consumed by reasoning/thinking (o1, o3, extended thinking models).
`[Agent] Cache Read Tokens`	number	No	Input tokens served from the provider's prompt cache (cheaper rate). Used for cache-aware cost calculation.
`[Agent] Cache Creation Tokens`	number	No	Input tokens that created new prompt cache entries.
`[Agent] Cost USD`	number	No	Estimated cost in USD for this LLM call. Cache-aware when cache token counts are provided.
`[Agent] Finish Reason`	string	No	Why the model stopped generating: 'stop', 'end_turn', 'tool_use', 'length', 'content_filter', etc.
`[Agent] Tool Calls`	string	No	Serialized JSON array of tool call requests made by the AI in this response.
`[Agent] Has Reasoning`	boolean	No	Whether the AI response included reasoning/thinking content.
`[Agent] Reasoning Content`	string	No	The AI's reasoning/thinking content (when available and content_mode permits).
`[Agent] System Prompt`	string	No	The system prompt used for this LLM call (when content_mode permits). Chunked for long prompts.
`[Agent] System Prompt Length`	number	No	Character length of the system prompt.
`[Agent] Tool Definitions`	string	No	Normalized JSON array of tool definitions sent to the LLM (when content_mode permits). Each entry contains name, description, and parameters schema.
`[Agent] Tool Definitions Count`	number	No	Number of tool definitions in the LLM request.
`[Agent] Tool Definitions Hash`	string	No	Stable SHA-256 hash of the normalized tool definitions. Always present regardless of content_mode; enables toolset change detection without exposing schemas.
`[Agent] Temperature`	number	No	Temperature parameter used for this LLM call.
`[Agent] Max Output Tokens`	number	No	Maximum output tokens configured for this LLM call.
`[Agent] Top P`	number	No	Top-p (nucleus sampling) parameter used for this LLM call.
`[Agent] Is Streaming`	boolean	No	Whether this response was generated via streaming.
`[Agent] Prompt ID`	string	No	Identifier for the prompt template or version used.
`[Agent] Was Copied`	boolean	No	Whether the user copied this AI response content. An implicit positive quality signal.
`[Agent] Was Cached`	boolean	No	Whether this response was served from a semantic/full-response cache (distinct from token-level prompt caching).
`[Agent] Model Tier`	string	No	Model tier classification: 'fast' (GPT-4o-mini, Haiku, Flash), 'standard' (GPT-4o, Sonnet, Pro), or 'reasoning' (o1, o3, DeepSeek-R1). Auto-inferred from model name.
`[Agent] Has Attachments`	boolean	No	Whether this AI response includes generated attachments (images, charts, files).
`[Agent] Attachment Types`	string[]	No	Distinct attachment types in this AI response. Serialized JSON array.
`[Agent] Attachment Count`	number	No	Number of attachments generated by the AI in this response.
`[Agent] Total Attachment Size Bytes`	number	No	Total size of all AI-generated attachments in bytes.
`[Agent] Attachments`	string	No	Serialized JSON array of AI-generated attachment metadata.
`[Agent] Message Labels`	string	No	Serialized JSON array of MessageLabel objects attached to this AI response.
`[Agent] Message Label Map`	string	No	Serialized JSON map of label key to value for quick lookup.
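
`[Agent] Tool Definitions Hash` is described as a stable SHA-256 hash of the normalized tool definitions. One way such a hash could be made stable is sketched below; the normalization chosen here (sort tools by name, sort object keys) is an assumption, so hashes from this sketch should not be expected to match what the SDK emits. `canonicalize` and `toolDefinitionsHashSketch` are hypothetical names.

```typescript
import { createHash } from 'node:crypto';

type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Serializes with object keys sorted so that key order does not affect the output.
function canonicalize(value: Json): string {
  if (value === null || typeof value !== 'object') return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map((item) => canonicalize(item)).join(',')}]`;
  const keys = Object.keys(value).sort();
  const entries = keys.map((key) => `${JSON.stringify(key)}:${canonicalize(value[key] ?? null)}`);
  return `{${entries.join(',')}}`;
}

type ToolDefinitionSketch = { name: string; description: string; parameters: Json };

// Assumed normalization: tools sorted by name, keys sorted inside each tool.
function toolDefinitionsHashSketch(tools: ToolDefinitionSketch[]): string {
  const sorted = tools
    .slice()
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return createHash('sha256').update(canonicalize(sorted)).digest('hex');
}
```

With this normalization, reordering tools or object keys leaves the hash unchanged, while any change to a name, description, or parameter schema produces a different hash.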

Tool Call Properties

Event-specific properties for [Agent] Tool Call (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Component Type`	string	Yes	Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'.
`[Agent] Latency Ms`	number	Yes	Total wall-clock latency in milliseconds for this operation.
`[Agent] Is Error`	boolean	Yes	Whether this event represents an error condition.
`[Agent] Error Message`	string	No	Error message text when Is Error is true.
`[Agent] Locale`	string	No	User locale (e.g., 'en-US').
`[Agent] Span Kind`	string	No	Classification of the span type for OTEL bridge compatibility.
`[Amplitude] Session Replay ID`	string	No	Links to Amplitude Session Replay (format: device_id/session_id). Enables one-click navigation from AI session to browser replay.
`[Agent] Invocation ID`	string	Yes	Unique identifier for this tool invocation (UUID). Used to link tool calls to parent messages.
`[Agent] Tool Name`	string	Yes	Name of the tool/function that was invoked (e.g., 'search_docs', 'web_search').
`[Agent] Tool Success`	boolean	Yes	Whether the tool call completed successfully.
`[Agent] Tool Input`	string	No	Serialized JSON of the tool's input arguments. Only sent when content_mode='full'.
`[Agent] Tool Output`	string	No	Serialized JSON of the tool's output/return value. Only sent when content_mode='full'.
`[Agent] Parent Message ID`	string	No	The message_id of the user message that triggered this tool call. Links the tool call into the event graph.

Embedding Properties

Event-specific properties for [Agent] Embedding (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Component Type`	string	Yes	Type of component that produced this event: 'user_input', 'llm', 'tool', 'embedding'.
`[Agent] Model Name`	string	Yes	LLM model identifier (e.g., 'gpt-4o', 'claude-sonnet-4-20250514').
`[Agent] Provider`	string	Yes	LLM provider name (e.g., 'openai', 'anthropic', 'google', 'mistral', 'bedrock').
`[Agent] Latency Ms`	number	Yes	Total wall-clock latency in milliseconds for this operation.
`[Agent] Span ID`	string	Yes	Unique identifier for this embedding operation (UUID).
`[Agent] Input Tokens`	number	No	Number of input tokens processed by the embedding model.
`[Agent] Embedding Dimensions`	number	No	Dimensionality of the output embedding vector.
`[Agent] Cost USD`	number	No	Estimated cost in USD for this embedding operation.

Span Properties

Event-specific properties for [Agent] Span (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Latency Ms`	number	Yes	Total wall-clock latency in milliseconds for this operation.
`[Agent] Is Error`	boolean	Yes	Whether this event represents an error condition.
`[Agent] Error Message`	string	No	Error message text when Is Error is true.
`[Agent] Span ID`	string	Yes	Unique identifier for this span (UUID).
`[Agent] Span Name`	string	Yes	Name of the operation (e.g., 'rag_pipeline', 'vector_search', 'rerank').
`[Agent] Parent Span ID`	string	No	Span ID of the parent span for nested pipeline steps.
`[Agent] Input State`	string	No	Serialized JSON of the span's input state. Only sent when content_mode='full'.
`[Agent] Output State`	string	No	Serialized JSON of the span's output state. Only sent when content_mode='full'.

Session End Properties

Event-specific properties for [Agent] Session End (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Enrichments`	string	No	Serialized JSON of SessionEnrichments (topic classifications, rubric scores, outcome, flags). Attached when enrichments are provided at session close.
`[Agent] Abandonment Turn`	number	No	Turn ID of the last user message that received an AI response before the user left. Low values (e.g., 1) strongly signal first-response dissatisfaction.
`[Agent] Session Idle Timeout Minutes`	number	No	Custom idle timeout for this session (default 30 min). Tells the server how long to wait before auto-closing.

Session Enrichment Properties

Event-specific properties for [Agent] Session Enrichment (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Enrichments`	string	Yes	Serialized JSON of SessionEnrichments: topic_classifications, rubrics, overall_outcome, quality_score, sentiment_score, boolean flags, agent chain metadata, and message labels.

Score Properties

Event-specific properties for [Agent] Score (in addition to common properties above).

Property	Type	Required	Description
`[Agent] Score Name`	string	Yes	Name of the score (e.g., 'user-feedback', 'task_completion', 'accuracy', 'groundedness').
`[Agent] Score Value`	number	Yes	Numeric score value. Binary (0/1), continuous (0.0-1.0), or rating scale (1-5).
`[Agent] Target ID`	string	Yes	The message_id or session_id being scored.
`[Agent] Target Type`	string	Yes	What is being scored: 'message' or 'session'.
`[Agent] Evaluation Source`	string	Yes	Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert).
`[Agent] Comment`	string	No	Optional text explanation for the score (respects content_mode).
`[Agent] Taxonomy Version`	string	No	Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id).
`[Agent] Evaluated At`	number	No	Epoch milliseconds when this enrichment/evaluation was computed.
`[Agent] Score Label`	string	No	Direction-neutral magnitude label derived from score value. Default 5-tier: very_high (>=0.8), high (>=0.6), moderate (>=0.4), low (>=0.2), very_low (>=0.0). Server-side only.
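
The default 5-tier thresholds for `[Agent] Score Label` translate directly into code. The function below is a sketch for reasoning about the tiers (the label itself is computed server-side); `scoreLabelSketch` is a hypothetical name, and mapping values below 0.0 to `very_low` is an assumption, since the table only covers the 0.0 to 1.0 range.

```typescript
type ScoreLabelSketch = 'very_high' | 'high' | 'moderate' | 'low' | 'very_low';

// Default 5-tier mapping from the table: each tier is a lower bound.
function scoreLabelSketch(value: number): ScoreLabelSketch {
  if (value >= 0.8) return 'very_high';
  if (value >= 0.6) return 'high';
  if (value >= 0.4) return 'moderate';
  if (value >= 0.2) return 'low';
  return 'very_low'; // assumption: anything below 0.2, including out-of-range values
}
```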

Server-Side: Session Evaluation Properties

[Agent] Session Evaluation is emitted automatically by the server-side enrichment pipeline — do not send this event from your code.

Property	Type	Required	Description
`[Agent] Session ID`	string	Yes	Unique session identifier. All events in one conversation share the same session ID.
`[Agent] Agent ID`	string	Yes	Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston').
`[Agent] Customer Org ID`	string	Yes	Organization ID for multi-tenant platforms. Enables account-level group analytics.
`[Agent] Evaluation Source`	string	Yes	Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert).
`[Agent] Taxonomy Version`	string	Yes	Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id).
`[Agent] Evaluated At`	number	Yes	Epoch milliseconds when this enrichment/evaluation was computed.
`[Agent] Overall Outcome`	string	Yes	Session outcome classification: 'success', 'partial_success', 'failure', 'abandoned', 'response_provided', etc.
`[Agent] Turn Count`	number	Yes	Number of conversation turns in this session.
`[Agent] Session Total Tokens`	number	No	Total LLM tokens consumed across all turns in this session.
`[Agent] Session Avg Latency Ms`	number	No	Average AI response latency in milliseconds across the session.
`[Agent] Request Complexity`	string	No	Complexity classification of the user's request: 'simple', 'moderate', 'complex', or 'ambiguous'.
`[Agent] Has Task Failure`	boolean	Yes	Whether the agent failed to complete the user's request.
`[Agent] Has Negative Feedback`	boolean	Yes	Whether the user expressed dissatisfaction during the session.
`[Agent] Has Technical Failure`	boolean	Yes	Whether technical errors occurred (tool timeouts, API failures, etc.).
`[Agent] Has Data Quality Issues`	boolean	Yes	Whether the AI output had data quality problems (wrong data, hallucinations, etc.).
`[Agent] Models Used`	string[]	No	LLM models used in this session. JSON array of strings.
`[Agent] Root Agent Name`	string	No	Entry-point agent in multi-agent flows.
`[Agent] Agent Chain Depth`	number	No	Number of agents in the delegation chain.
`[Agent] Task Failure Type`	string	No	Specific failure type when `[Agent] Has Task Failure` is true (e.g., 'wrong_answer', 'unable_to_complete').
`[Agent] Technical Error Count`	number	No	Count of technical errors that occurred during the session.
`[Agent] Error Categories`	string[]	No	Categorized error types (e.g., 'chart_not_found', 'timeout'). JSON array of strings.
`[Agent] Behavioral Patterns`	string[]	No	Detected behavioral anti-patterns (e.g., 'retry_storm', 'clarification_loop', 'early_abandonment'). JSON array of strings.
`[Agent] Session Cost USD`	number	No	Total LLM cost in USD for this AI session (aggregated from per-message costs).
`[Agent] Enrichment Cost USD`	number	No	Cost in USD of running the enrichment pipeline's LLM inference for this session. Distinct from the session's own LLM cost.
`[Agent] Quality Score`	number	No	Overall quality score (0.0-1.0) computed by the enrichment pipeline for this session.
`[Agent] Sentiment Score`	number	No	User sentiment score (0.0-1.0) inferred from the conversation by the enrichment pipeline.
`[Agent] Task Failure Reason`	string	No	Explanation of why the task failed when `[Agent] Has Task Failure` is true (e.g., 'chart data source unavailable').
`[Agent] Agent Chain`	string[]	No	Serialized JSON array of agent IDs representing the delegation chain in multi-agent flows.
`[Agent] Project ID`	string	No	Amplitude project ID that owns the AI session being evaluated.
`[Agent] Has User Feedback`	boolean	Yes	Whether the session received explicit user feedback (thumbs up/down, rating).
`[Agent] User Score`	number	No	Aggregate user feedback score for the session (0.0-1.0). Present only when `[Agent] Has User Feedback` is true.
`[Agent] Agent Version`	string	No	Agent code version (e.g., 'v4.2'). Enables version-over-version quality comparison.
`[Agent] Agent Description`	string	No	Human-readable description of the agent's purpose (e.g., 'Handles user chat requests via OpenAI GPT-4o'). Enables observability-driven agent registry from event streams.
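
Several properties in the table are typed `string[]` and described as JSON arrays of strings. If you read these events back from an export, a defensive parser can help. The sketch below assumes a value may arrive either as a real array or as a serialized JSON string; that assumption, and the helper name `toStringArray`, are illustrative and not something the SDK documents.

```typescript
// Hypothetical helper: normalizes a property that may arrive either as a real
// array or as a serialized JSON array of strings. Non-string items are dropped.
function toStringArray(value: unknown): string[] {
  if (Array.isArray(value)) {
    return value.filter((v): v is string => typeof v === 'string');
  }
  if (typeof value === 'string') {
    try {
      const parsed: unknown = JSON.parse(value);
      return Array.isArray(parsed)
        ? parsed.filter((v): v is string => typeof v === 'string')
        : [];
    } catch {
      return []; // not valid JSON, so treat it as "no values"
    }
  }
  return [];
}

// Example event properties (values invented for illustration).
const props: Record<string, unknown> = {
  '[Agent] Agent Chain': '["router","support-bot"]',
  '[Agent] Models Used': ['gpt-4o'],
};
const chain = toStringArray(props['[Agent] Agent Chain']);
const models = toStringArray(props['[Agent] Models Used']);
```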

Server-Side: Topic Classification Properties

[Agent] Topic Classification is emitted automatically by the server-side enrichment pipeline — do not send this event from your code.

Property	Type	Required	Description
`[Agent] Session ID`	string	Yes	Unique session identifier. All events in one conversation share the same session ID.
`[Agent] Agent ID`	string	Yes	Identifies which AI agent handled the interaction (e.g., 'support-bot', 'houston').
`[Agent] Customer Org ID`	string	Yes	Organization ID for multi-tenant platforms. Enables account-level group analytics.
`[Agent] Evaluation Source`	string	Yes	Source of the evaluation: 'user' (end-user feedback), 'ai' (automated/server pipeline), or 'reviewer' (human expert).
`[Agent] Taxonomy Version`	string	Yes	Which taxonomy config version produced this enrichment (from ai_category_config.config_version_id).
`[Agent] Evaluated At`	number	Yes	Epoch milliseconds when this enrichment/evaluation was computed.
`[Agent] Topic`	string	Yes	Which topic model this classification is for (e.g., 'product_area', 'query_intent', 'error_domain').
`[Agent] Selection Mode`	string	Yes	Whether this topic model uses 'single' (MECE) or 'multiple' (multi-label) selection.
`[Agent] Primary`	string	No	Primary classification value (e.g., 'charts', 'billing_issues').
`[Agent] Secondary`	string[]	No	Secondary classifications for multi-label topics. JSON array of strings.
`[Agent] Subcategories`	string[]	No	Subcategories for finer classification within the primary topic (e.g., 'TREND_ANALYSIS', 'WRONG_EVENT'). JSON array of strings.

Event JSON Examples

[Agent] AI Response

A realistic example of what gets sent to Amplitude for an AI response:

{
  "event_type": "[Agent] AI Response",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Trace ID": "trace-def456",
    "[Agent] Turn ID": 2,
    "[Agent] Message ID": "msg-789xyz",
    "[Agent] Model Name": "gpt-4o",
    "[Agent] Provider": "openai",
    "[Agent] Model Tier": "standard",
    "[Agent] Latency Ms": 1203,
    "[Agent] Input Tokens": 150,
    "[Agent] Output Tokens": 847,
    "[Agent] Total Tokens": 997,
    "[Agent] Cost USD": 0.0042,
    "[Agent] Is Error": false,
    "[Agent] Finish Reason": "stop",
    "[Agent] Is Streaming": false,
    "[Agent] Component Type": "llm",
    "[Agent] Agent ID": "support-bot",
    "[Agent] Env": "production",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node"
  }
}

[Agent] User Message

{
  "event_type": "[Agent] User Message",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Turn ID": 1,
    "[Agent] Message ID": "msg-123abc",
    "[Agent] Component Type": "user_input",
    "[Agent] Agent ID": "support-bot",
    "[Agent] Env": "production",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node",
    "$llm_message": {
      "text": "How do I reset my password?"
    }
  }
}

[Agent] Tool Call

{
  "event_type": "[Agent] Tool Call",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Turn ID": 3,
    "[Agent] Invocation ID": "inv-456def",
    "[Agent] Tool Name": "search_knowledge_base",
    "[Agent] Tool Success": true,
    "[Agent] Is Error": false,
    "[Agent] Latency Ms": 340,
    "[Agent] Component Type": "tool",
    "[Agent] Agent ID": "support-bot",
    "[Agent] Tool Input": "{\"query\":\"password reset instructions\"}",
    "[Agent] Tool Output": "{\"results\":[{\"title\":\"Password Reset Guide\"}]}",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node"
  }
}

[Agent] Score

{
  "event_type": "[Agent] Score",
  "user_id": "user-42",
  "event_properties": {
    "[Agent] Score Name": "thumbs-up",
    "[Agent] Score Value": 1,
    "[Agent] Target ID": "msg-789xyz",
    "[Agent] Target Type": "message",
    "[Agent] Evaluation Source": "user",
    "[Agent] Session ID": "sess-abc123",
    "[Agent] Agent ID": "support-bot",
    "[Agent] SDK Version": "0.1.0",
    "[Agent] Runtime": "node"
  }
}

Sending Events Without the SDK

The [Agent] event schema is not tied to this SDK. If no Amplitude AI SDK exists for your language or runtime, you can send the same events directly via Amplitude's ingestion APIs.

What the SDK handles for you

When you use this SDK, the following are managed automatically. If you send events directly, you are responsible for these:

Concern	SDK behavior	DIY equivalent
Session ID	Generated once per `session()` and propagated to every event	Generate a UUID per conversation and include it as `[Agent] Session ID` on every event
Deduplication	Automatic `insert_id` on each event	Set a unique `insert_id` per event to prevent duplicates on retry
Property prefixing	All properties are prefixed with `[Agent]`	You must include the `[Agent]` prefix in every property name
Cost / token calculation	Auto-computed from model and token counts	Compute and send `[Agent] Cost USD`, `[Agent] Input Tokens`, etc. yourself
Server-side enrichment	`[Agent] Session Evaluation`, `[Agent] Topic Classification`, and `[Agent] Score` events are emitted automatically by the enrichment pipeline after `[Agent] Session End`	These fire automatically — you do not need to send them. Just send the SDK-level events and close the session with `[Agent] Session End`.
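
The first three rows of the table (session ID, deduplication, prefixing) can be sketched as a small event factory. This is a minimal illustration for a DIY sender, not how the SDK is implemented. `createSession` and `prefixProps` are hypothetical names. Keys starting with `$` are left unprefixed because the `$llm_message` property in the User Message example above carries no `[Agent]` prefix.

```typescript
import { randomUUID } from 'node:crypto';

interface AgentEvent {
  event_type: string;
  user_id: string;
  insert_id: string;
  event_properties: Record<string, unknown>;
}

// Hypothetical helper: adds the `[Agent] ` prefix to bare property names.
// Keys that are already prefixed, or that start with `$`, are left untouched.
function prefixProps(props: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(props)) {
    const keep = key.startsWith('[Agent] ') || key.startsWith('$');
    out[keep ? key : `[Agent] ${key}`] = value;
  }
  return out;
}

// Hypothetical factory: one session ID per conversation, one insert_id per event.
function createSession(userId: string, agentId: string) {
  const sessionId = randomUUID();
  return {
    sessionId,
    event(name: string, props: Record<string, unknown> = {}): AgentEvent {
      return {
        event_type: `[Agent] ${name}`,
        user_id: userId,
        insert_id: randomUUID(), // reuse this same value if you retry the send
        event_properties: prefixProps({
          'Session ID': sessionId,
          'Agent ID': agentId,
          ...props,
        }),
      };
    },
  };
}

const session = createSession('user-42', 'support-bot');
const userMessage = session.event('User Message', { 'Turn ID': 1 });
const sessionEnd = session.event('Session End');
```

Cost and token fields are not computed here; in a DIY setup you would pass them in as properties yourself.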

Ingestion methods

Method	Best for	Docs
HTTP V2 API	Real-time, low-to-medium volume	HTTP V2 API docs
Batch Event Upload API	High volume, backfills	Batch API docs
Amazon S3 Import	Bulk historical import, warehouse-first workflows	S3 Import docs

Minimal HTTP API example

curl -X POST https://api2.amplitude.com/2/httpapi \
  -H 'Content-Type: application/json' \
  -d '{
    "api_key": "YOUR_API_KEY",
    "events": [
      {
        "event_type": "[Agent] User Message",
        "user_id": "user-42",
        "insert_id": "evt-unique-id-1",
        "event_properties": {
          "[Agent] Session ID": "sess-abc123",
          "[Agent] Trace ID": "trace-def456",
          "[Agent] Turn ID": 1,
          "[Agent] Agent ID": "support-bot",
          "[Agent] Message ID": "msg-001"
        }
      },
      {
        "event_type": "[Agent] AI Response",
        "user_id": "user-42",
        "insert_id": "evt-unique-id-2",
        "event_properties": {
          "[Agent] Session ID": "sess-abc123",
          "[Agent] Trace ID": "trace-def456",
          "[Agent] Turn ID": 1,
          "[Agent] Message ID": "msg-002",
          "[Agent] Agent ID": "support-bot",
          "[Agent] Model Name": "gpt-4o",
          "[Agent] Provider": "openai",
          "[Agent] Latency Ms": 1203,
          "[Agent] Input Tokens": 150,
          "[Agent] Output Tokens": 420,
          "[Agent] Cost USD": 0.0042
        }
      }
    ]
  }'

Refer to the Event Schema and Event Property Reference tables above for required and optional properties per event type.
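
The same request can be sent from TypeScript. The sketch below posts the payload shape from the curl example and retries with an identical body, so each event keeps its `insert_id` across attempts. `sendAgentEvents` and `FetchLike` are hypothetical names. The fetch implementation is injected (pass the global `fetch` on Node 18+) so the function can be exercised without network access. A real sender would likely add backoff and avoid retrying client errors.

```typescript
// Minimal structural type for the parts of fetch this sketch uses.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

const HTTP_V2_URL = 'https://api2.amplitude.com/2/httpapi';

// Hypothetical helper: posts events to the HTTP V2 endpoint and returns the
// attempt number that succeeded. Network errors thrown by fetchImpl propagate.
async function sendAgentEvents(
  apiKey: string,
  events: object[],
  fetchImpl: FetchLike,
  maxAttempts = 3,
): Promise<number> {
  // Serialize once so every retry carries the same insert_id values.
  const body = JSON.stringify({ api_key: apiKey, events });
  let lastStatus = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const response = await fetchImpl(HTTP_V2_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body,
    });
    if (response.ok) return attempt;
    lastStatus = response.status;
  }
  throw new Error(
    `Amplitude ingestion failed after ${maxAttempts} attempts (last HTTP status ${lastStatus})`,
  );
}

// Usage on Node 18+ (not executed here):
//   await sendAgentEvents(process.env.AMPLITUDE_API_KEY!, events, fetch);
```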

Register Event Schema in Your Data Catalog

Amplitude's Data Catalog documents events and properties with descriptions, types, and required flags. The @amplitude/ai package includes a tool to generate all the Taxonomy API calls for you.

Prerequisites

Amplitude Enterprise plan (Taxonomy API access)
Project API key and Secret key from Settings > Projects in your Amplitude org

Option A: Generate curl commands (JS-native, no dependencies)

The bundled CLI reads data/agent_event_catalog.json and prints executable curl commands — it makes no network requests itself.

# Preview the curl commands (uses placeholder keys)
npx amplitude-ai-register-catalog

# Generate with your real keys
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET

# Pipe to bash to execute immediately
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET | bash

# EU data residency
npx amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET --eu | bash

Option B: Python CLI (direct execution)

If you have Python available, the amplitude-ai package on PyPI provides a CLI that calls the Taxonomy API directly with retry logic and a progress summary:

pip install amplitude-ai
amplitude-ai-register-catalog --api-key YOUR_KEY --secret-key YOUR_SECRET

What gets registered

All 10 [Agent] event types and their properties (see Event Property Reference above), organized under the "Agent Analytics" category. The commands are idempotent — safe to re-run. They create missing events/properties and update existing ones.

Utilities and Type Exports

Token and cost utilities

calculateCost() — Returns cost in USD when @pydantic/genai-prices is installed; otherwise returns 0 (never null).
countTokens(text, model?) — Uses tiktoken when available. For unknown models, tries o200k_base encoding before falling back to cl100k_base (matching the Python SDK).
estimateTokens(text) — Heuristic fallback: ceil(chars/3.5 + words*0.1) (matching the Python SDK).
stripProviderPrefix(modelName) — Splits on : (e.g., openai:gpt-4o → gpt-4o). Use for normalizing model IDs before cost lookup. Import from @amplitude/ai/internals.
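
The `estimateTokens` formula and the prefix split can be sketched as follows. These are standalone re-implementations of what the descriptions above state, not the SDK's source. They assume "words" means whitespace-separated runs and that only the text before the first `:` is treated as the provider. The `Sketch` suffix marks both functions as hypothetical.

```typescript
// Sketch of the documented heuristic: ceil(chars / 3.5 + words * 0.1).
// Assumption: "words" are whitespace-separated runs; the SDK may count differently.
function estimateTokensSketch(text: string): number {
  const trimmed = text.trim();
  const chars = text.length;
  const words = trimmed === '' ? 0 : trimmed.split(/\s+/).length;
  return Math.ceil(chars / 3.5 + words * 0.1);
}

// Sketch of stripping a provider prefix such as `openai:` from a model ID.
// Assumption: only the part before the first ':' is the provider.
function stripProviderPrefixSketch(modelName: string): string {
  const index = modelName.indexOf(':');
  return index === -1 ? modelName : modelName.slice(index + 1);
}

// 27 characters and 6 words: ceil(27 / 3.5 + 0.6) = ceil(8.31...) = 9
const estimate = estimateTokensSketch('How do I reset my password?');
const normalized = stripProviderPrefixSketch('openai:gpt-4o'); // 'gpt-4o'
```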

Shared types

The package exports structural interfaces for provider shapes from @amplitude/ai and @amplitude/ai/types: ChatCompletionParams, ChatCompletionResponse, AnthropicParams, AnthropicResponse, BedrockConverseParams, BedrockConverseResponse, MistralChatParams, MistralChatResponse, TrackFn, TrackCallOptions, and related types. Use these for typing provider integrations without depending on the underlying SDK types.

Constants

All PROP_* and EVENT_* constants are exported for advanced use:

import {
  EVENT_AI_RESPONSE,
  EVENT_EMBEDDING,
  EVENT_SCORE,
  EVENT_SESSION_END,
  EVENT_SESSION_ENRICHMENT,
  EVENT_SPAN,
  EVENT_TOOL_CALL,
  EVENT_USER_MESSAGE,
  PROP_MODEL_NAME,
  PROP_SESSION_ID,
  PROP_TRACE_ID,
  // ... etc
} from '@amplitude/ai';

See src/core/tracking.ts and src/core/constants.ts for the full list.

API Reference

For AI Coding Agents

This SDK is designed to be discovered and used by any AI coding agent — Cursor, Claude Code, Windsurf, Copilot, Codex, Cline, or any agent that can read files.

The fastest path:

npm install @amplitude/ai
npx amplitude-ai

The CLI prints a prompt to paste into your agent:

Instrument this app with @amplitude/ai. Follow node_modules/@amplitude/ai/amplitude-ai.md

The agent reads the guide, scans your project, and instruments everything in 4 phases: Detect, Discover, Instrument, Verify.

Files shipped with the package:

File	Purpose
`amplitude-ai.md`	Primary guide — self-contained 4-phase instrumentation workflow and full API reference
`AGENTS.md`	Concise index with canonical patterns, MCP surface, gotchas, and CLI reference
`llms.txt`	Compact discovery file listing tools, resources, and event names
`llms-full.txt`	Extended reference with full API signatures, provider coverage matrix, and common error resolutions
`mcp.schema.json`	Structured JSON describing the MCP server's tools, resources, and prompt

Optional: MCP server for advanced tooling. Run amplitude-ai mcp to start the MCP server (standard stdio protocol). MCP-compatible agents can call tools like scan_project, instrument_file, validate_file, and generate_verify_test for deeper analysis. The MCP server is not required for the core instrumentation workflow — amplitude-ai.md is self-contained.

For Python SDK Migrators

If you're moving from amplitude_ai (Python) to @amplitude/ai (TypeScript/Node), the core event model is the same, but ergonomics differ to match the runtime:

Area	Python (`amplitude_ai`)	TypeScript (`@amplitude/ai`)
Session scope	`with session as s:`	`await session.run(async (s) => { ... })`
Tool/observe wrappers	`@tool`, `@observe` decorators	`tool()`, `observe()` HOFs
Context propagation	`contextvars`	`AsyncLocalStorage`
Tool input schema	Optional auto-schema from Python type hints	Explicit `inputSchema` object (recommended: define with Zod, pass JSON Schema)
Sync behavior	Native sync + async wrappers	Wrappers return async (`Promise<T>`)
Middleware	Starlette/FastAPI middleware	Express-compatible middleware
Bootstrap/preload	`sitecustomize.py` + `PYTHONPATH` patterns	`NODE_OPTIONS=--import` preload patterns
Provider patching model	Python class replacement	Prototype patching + Proxy fallback for lazy getters
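
For readers coming from `contextvars`, the sketch below shows how Node's built-in `AsyncLocalStorage` (from `node:async_hooks`) carries a value across `await` boundaries without passing it as an argument. It illustrates the general mechanism only. How `@amplitude/ai` uses it internally is not shown here, and `SessionContext`, `runInSession`, `currentSessionId`, and `handleTurn` are hypothetical names.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

interface SessionContext {
  sessionId: string;
  userId: string;
}

// One store per process; each run() call gets its own isolated context.
const sessionStore = new AsyncLocalStorage<SessionContext>();

// Reads the ambient context, similar to ContextVar.get() in Python.
function currentSessionId(): string | undefined {
  return sessionStore.getStore()?.sessionId;
}

// Any code called inside fn, sync or async, sees the same context.
function runInSession<T>(ctx: SessionContext, fn: () => T): T {
  return sessionStore.run(ctx, fn);
}

// Stand-in for work that awaits an LLM call before reading the context.
async function handleTurn(): Promise<string | undefined> {
  await Promise.resolve();
  return currentSessionId();
}

void runInSession({ sessionId: 'sess-abc123', userId: 'user-42' }, async () => {
  const id = await handleTurn();
  console.log(id); // prints "sess-abc123"
});
```

Outside of `run()`, `getStore()` returns `undefined`, which is the Node counterpart of reading an unset context variable.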

Features that do not map 1:1 because of platform/runtime constraints:

Auto-generated tool schemas from runtime type introspection
Python-style per-call keyword overrides (for example amplitude_user_id=...)
Interrupting synchronous tool execution with Python threading primitives
CrewAI integration (Python-only; TS package throws a clear error)

Python → TypeScript cheat sheet

# Python
from amplitude_ai import AmplitudeAI, tool

ai = AmplitudeAI(api_key="xxx")
agent = ai.agent("my-agent", user_id="u1")

with agent.session(user_id="u1") as s:
    s.track_user_message("Hello")
    s.track_ai_message("Hi!", model="gpt-4", provider="openai", latency_ms=100)

@tool(name="search")
def search(query: str) -> str:
    return db.search(query)

// TypeScript
import { AmplitudeAI, tool } from '@amplitude/ai';

const ai = new AmplitudeAI({ apiKey: 'xxx' });
const agent = ai.agent('my-agent', { userId: 'u1' });

const session = agent.session({ userId: 'u1' });
await session.run(async (s) => {
  s.trackUserMessage('Hello');
  s.trackAiMessage('Hi!', 'gpt-4', 'openai', 100);
});

const search = tool(async (args: { query: string }) => db.search(args.query), {
  name: 'search',
});

Need Help?

Bug reports and feature requests: Open an issue
General questions: Amplitude Support
Python SDK: Looking for the Python version? See amplitude-ai on PyPI

Contributing

Contributions are welcome! Please open an issue first to discuss what you'd like to change, then submit a pull request.

Fork the repository
Create your branch (git checkout -b my-feature)
Install dependencies (pnpm install)
Make your changes and add tests
Ensure all tests pass (pnpm run test:coverage) and TypeScript compiles (pnpm run test:typescript)
Submit a pull request

License

MIT
