Summary
Make Session Replay data in sentry-cli highly actionable for coding agents by adding first-class replay segment fetching, local caching, normalized event extraction, and inspection commands.
The CLI should not try to be the agent or own an ask command as the primary interface. Instead, it should become the replay data plane: a reliable way for agents to fetch replay segments, cache them, inspect DOM/rrweb/custom events, search timelines, and pull evidence windows around user actions. Agents can then compose those tools, plus a bundled skill that explains the replay data model, to answer questions such as:
- "When the user clicked X, what happened next?"
- "Where did the user spend the most time?"
- "What hangups caused the user to struggle?"
- "Were there failed requests, console errors, dead clicks, rage clicks, or DOM changes around this action?"
Current State
The CLI currently has basic replay support:
- sentry replay list queries replay metadata.
- sentry replay view fetches replay detail, related issues/traces, and a very small activity preview from recording segments.
- sentry explore --dataset replays exposes replay index fields.
Relevant CLI files:
- src/commands/replay/list.ts
- src/commands/replay/view.ts
- src/lib/api/replays.ts
- src/lib/formatters/replay.ts
- src/lib/replay-search.ts
- src/types/replay.ts
The main limitation is that replay segments are treated as display garnish. replay view currently extracts only a handful of activity events from raw segments, capped to a tiny preview. There is no replay-specific local cache, no normalized event stream, no segment index, and no way for agents to inspect all DOM/rrweb/custom events.
There is also a likely correctness gap: the Sentry recording-segments endpoint is paginated, while the CLI currently downloads it as a single request. The frontend fetches segment pages with per_page=100 until count_segments is exhausted. The CLI should mirror that behavior so long replays are fully available.
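As a rough illustration of the paging described above, the following sketch plans the page requests needed to cover count_segments at per_page=100. The helper name planSegmentPages and the offset-style cursor format "0:<offset>:0" are assumptions made for this example; the actual cursor handling should be checked against useReplayData.tsx before anything like this ships.

```typescript
// Hypothetical helper: list the page requests needed to download every
// recording segment, given count_segments from the replay detail response.
interface SegmentPageRequest {
  cursor: string; // assumed offset-style cursor, "0:<offset>:0"
  perPage: number;
}

function planSegmentPages(countSegments: number, perPage = 100): SegmentPageRequest[] {
  if (!Number.isInteger(countSegments) || countSegments < 0) {
    throw new RangeError(`invalid count_segments: ${countSegments}`);
  }
  if (!Number.isInteger(perPage) || perPage <= 0) {
    throw new RangeError(`invalid per_page: ${perPage}`);
  }
  const pages: SegmentPageRequest[] = [];
  // One request per block of perPage segments until count_segments is covered.
  for (let offset = 0; offset < countSegments; offset += perPage) {
    pages.push({ cursor: `0:${offset}:0`, perPage });
  }
  return pages;
}
```

A replay with 250 segments would need three requests under this plan. When count_segments is missing, a fallback could keep requesting pages until one comes back short.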
Relevant Sentry/rrweb Model
Sentry replay data has several useful layers:
- Replay metadata from org/project replay endpoints: duration, urls, counts, errors, traces, clicks, user/browser/os/sdk/device, tags, etc.
- Recording segments from the project-scoped recording-segments endpoint. These are compressed/packed storage blobs returned as rrweb/custom JSON when downloaded.
- rrweb events: full snapshots, incremental DOM mutations, mouse interactions, inputs, scrolls, viewport changes, media interactions, console logs, and more.
- Sentry custom replay frames: breadcrumbs, performance spans, options, video, web vitals, network breadcrumbs, console breadcrumbs, mobile events, etc.
- Related data: error events, feedback, trace ids, logs, click selector endpoints, and existing Seer replay summary APIs.
Useful Sentry references:
- getsentry/sentry/src/sentry/replays/endpoints/project_replay_recording_segment_index.py
- getsentry/sentry/src/sentry/replays/endpoints/project_replay_recording_segment_details.py
- getsentry/sentry/src/sentry/replays/usecases/reader.py
- getsentry/sentry/src/sentry/replays/usecases/pack.py
- getsentry/sentry/src/sentry/replays/post_process.py
- getsentry/sentry/src/sentry/replays/usecases/ingest/event_parser.py
- getsentry/sentry/static/app/utils/replays/hooks/useReplayData.tsx
- getsentry/sentry/static/app/utils/replays/hydrateFrames.tsx
- getsentry/sentry/static/app/utils/replays/replayReader.tsx
- @sentry-internal/rrweb-types, especially EventType, IncrementalSource, MouseInteractions, and the mutation/input/scroll/viewport payloads.
Proposal
Add a replay evidence system to the CLI with three parts:
- A local replay bundle cache.
- A normalized replay event model.
- Agent-friendly inspection commands and generated/bundled skill docs.
1. Replay Bundle Cache
Add a replay-specific cache under something like:
~/.sentry/cache/replays/{identity}/{org}/{project}/{replayId}/
Suggested contents:
metadata.json
segments/{segmentId}.json.gz
index/events.jsonl
index/navigation.json
index/interactions.json
index/network.json
index/problems.json
index/dom-summary.json
The raw segment payloads should live on disk, not in SQLite. SQLite can track manifests/cache lookup if useful, but segment blobs can be large and should be stored as private files.
Security/privacy requirements:
- The cache directory should be mode 0700; files should be 0600.
- Provide a way to bypass or clear replay cache.
- Clear replay cache on auth/logout flows if appropriate.
- Treat replay data as sensitive; do not attempt to unmask data that rrweb/Sentry masked.
- Be explicit in outputs when text/DOM data is unavailable because it was masked or not captured.
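The permission requirements above could look roughly like this in Node. This is a minimal sketch: replayCacheDir, writePrivateFile, and permissionBits are hypothetical names, the demo writes to a throwaway temp directory rather than ~/.sentry/cache/replays, and POSIX mode bits are assumed (they are not enforced the same way on Windows).

```typescript
import { chmodSync, mkdirSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { dirname, join } from "node:path";

interface ReplayCacheKey {
  identity: string;
  org: string;
  project: string;
  replayId: string;
}

// Build {root}/{identity}/{org}/{project}/{replayId}, rejecting components
// that could escape the cache root.
function replayCacheDir(root: string, key: ReplayCacheKey): string {
  const parts = [key.identity, key.org, key.project, key.replayId];
  for (const part of parts) {
    if (part === "" || part === "." || part === ".." || /[\\/]/.test(part)) {
      throw new Error(`unsafe cache path component: ${JSON.stringify(part)}`);
    }
  }
  return join(root, ...parts);
}

// Write a file as 0600 inside a 0700 parent directory.
function writePrivateFile(dir: string, relativePath: string, data: string): string {
  const target = join(dir, relativePath);
  const parent = dirname(target);
  mkdirSync(parent, { recursive: true, mode: 0o700 });
  chmodSync(parent, 0o700); // mkdir's mode is not applied to a directory that already exists
  writeFileSync(target, data, { mode: 0o600 });
  chmodSync(target, 0o600); // writeFile's mode only applies when the file is created
  return target;
}

// Report the permission bits of a path, e.g. 0o600.
function permissionBits(path: string): number {
  return statSync(path).mode & 0o777;
}

// Usage sketch against a throwaway root.
const demoRoot = mkdtempSync(join(tmpdir(), "replay-cache-"));
const demoDir = replayCacheDir(demoRoot, {
  identity: "id-1",
  org: "org",
  project: "project",
  replayId: "abc123",
});
const demoFile = writePrivateFile(demoDir, "metadata.json", "{}");
```

The explicit chmod calls cover the case where the directory or file already exists with looser permissions. Only the immediate parent directory is tightened here; a real implementation would probably also verify the ancestors under the cache root.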
Caching behavior:
- Finished replays can generally be treated as immutable, subject to retention/privacy constraints.
- Live or recently active replays should be refreshable because segment count may grow.
- Avoid duplicating huge segment payloads in the generic HTTP response cache once replay-specific caching exists.
2. Normalized Replay Event Model
Introduce a normalized event schema that agents can rely on, regardless of whether the source was rrweb, a Sentry custom frame, a breadcrumb, a perf span, or related event data.
Example event:
{
  "replayId": "abc123",
  "segmentId": 12,
  "frameIndex": 184,
  "offsetMs": 83421,
  "timestamp": "2026-05-03T18:42:11.421Z",
  "kind": "click",
  "category": "interaction",
  "label": "button.checkout",
  "url": "/checkout",
  "selector": "button[data-test-id=checkout]",
  "nodeId": 982,
  "rawType": "IncrementalSnapshot",
  "rawSource": "MouseInteraction"
}
Initial event kinds should include:
- navigation
- click
- tap
- input
- focus
- blur
- scroll
- viewport
- mutation
- dom-snapshot
- breadcrumb
- network
- console
- error
- span
- web-vital
- memory
- video
- mobile
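To make the rrweb side of this mapping concrete, here is a small sketch that classifies raw rrweb frames into a subset of the kinds above. The numeric values follow rrweb's EventType, IncrementalSource, and MouseInteractions enums; the function name classifyRrwebFrame and the RrwebFrame shape are hypothetical, and kinds that come from Sentry custom frames (tap, network, console, span, and so on) are deliberately not covered.

```typescript
// Subsets of the rrweb enums, copied as plain constants for this sketch.
const EventType = { FullSnapshot: 2, IncrementalSnapshot: 3 } as const;
const IncrementalSource = {
  Mutation: 0,
  MouseInteraction: 2,
  Scroll: 3,
  ViewportResize: 4,
  Input: 5,
} as const;
const MouseInteractions = { Click: 2, Focus: 5, Blur: 6 } as const;

type RrwebDerivedKind =
  | "click" | "focus" | "blur" | "input"
  | "scroll" | "viewport" | "mutation" | "dom-snapshot";

// Minimal frame shape assumed for this example.
interface RrwebFrame {
  type: number;
  timestamp: number;
  data?: { source?: number; type?: number };
}

// Returns undefined for frames this sketch does not classify.
function classifyRrwebFrame(frame: RrwebFrame): RrwebDerivedKind | undefined {
  if (frame.type === EventType.FullSnapshot) return "dom-snapshot";
  if (frame.type !== EventType.IncrementalSnapshot) return undefined;
  switch (frame.data?.source) {
    case IncrementalSource.Mutation:
      return "mutation";
    case IncrementalSource.Scroll:
      return "scroll";
    case IncrementalSource.ViewportResize:
      return "viewport";
    case IncrementalSource.Input:
      return "input";
    case IncrementalSource.MouseInteraction:
      switch (frame.data?.type) {
        case MouseInteractions.Click:
          return "click";
        case MouseInteractions.Focus:
          return "focus";
        case MouseInteractions.Blur:
          return "blur";
        default:
          return undefined;
      }
    default:
      return undefined;
  }
}
```

Returning undefined rather than a catch-all kind keeps unclassified frames visible to the caller, which can then decide whether to drop them or surface them as raw pointers.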
Important implementation detail: centralize timestamp normalization. rrweb event timestamps are milliseconds, while breadcrumb/performance payload fields may use seconds. Sentry has existing frontend/backend logic for this; the CLI should port the relevant normalization and test it with fixtures.
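One way to centralize that conversion is a single helper that every frame parser goes through. The sketch below uses a magnitude heuristic (values below 1e11 are treated as seconds), which is an assumption made for illustration and not Sentry's actual logic; a port of the frontend behavior may instead convert per field based on the known frame type. The names toEpochMs and toOffsetMs are hypothetical.

```typescript
// Heuristic cutoff: 1e11 milliseconds is in March 1973, while 1e11 seconds is
// thousands of years away, so real replay timestamps fall clearly on one side.
const SECONDS_THRESHOLD = 1e11;

// Normalize a seconds-or-milliseconds epoch timestamp to integer milliseconds.
function toEpochMs(timestamp: number): number {
  if (!Number.isFinite(timestamp) || timestamp < 0) {
    throw new RangeError(`invalid timestamp: ${timestamp}`);
  }
  return timestamp < SECONDS_THRESHOLD
    ? Math.round(timestamp * 1000)
    : Math.round(timestamp);
}

// Offset of an event from the replay start, clamped at zero.
function toOffsetMs(timestamp: number, replayStartMs: number): number {
  return Math.max(0, toEpochMs(timestamp) - replayStartMs);
}
```

Fixtures for this helper would ideally include one rrweb frame and one breadcrumb captured at the same instant, so a regression in either branch shows up as a mismatched offset.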
3. Agent-Friendly Commands
sentry replay fetch <replay>
Fetch replay metadata, all recording segment pages, and optionally related errors/traces/logs. Build/update the local replay bundle and indexes.
Useful flags:
--force Refresh cached data
--no-cache Fetch but do not persist
--segments <range> Fetch all or a subset for debugging
--include <list> metadata,segments,errors,traces,logs,clicks
--json Emit manifest and cache paths
sentry replay events <replay>
Primary agent primitive. Emit normalized replay events.
Useful flags:
--kind click,network,console,error,mutation
--from 01:20
--to 01:45
--contains checkout
--selector button.checkout
--url /checkout
--limit 200
--json
--jsonl
--raw Include raw frame payload pointer or payload snippet
This command should make DOM/rrweb activity inspectable without requiring agents to know the raw segment layout.
sentry replay window <replay>
Return an evidence slice around a timestamp or event match.
Examples:
sentry replay window org/project/abc123 --at 01:23 --before 10s --after 30s
sentry replay window org/project/abc123 --contains checkout --before 5s --after 20s
Output should group nearby activity by category: interaction, navigation, network, console, errors, DOM mutations, spans, web vitals.
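The windowing itself is simple once events are normalized. The sketch below assumes --at takes mm:ss or hh:mm:ss and that --before/--after take a number with an ms, s, or m suffix; both formats are inferred from the examples above, and parseOffset, parseDuration, and sliceWindow are hypothetical names.

```typescript
interface TimelineEvent {
  offsetMs: number;
  kind: string;
  category: string;
}

// Parse "mm:ss" or "hh:mm:ss" into milliseconds.
function parseOffset(text: string): number {
  const parts = text.split(":");
  if (parts.length < 2 || parts.length > 3 || parts.some((p) => !/^\d+$/.test(p))) {
    throw new Error(`invalid offset: ${text}`);
  }
  return parts.reduce((total, part) => total * 60 + Number(part), 0) * 1000;
}

// Parse durations such as "500ms", "10s", or "2m" into milliseconds.
function parseDuration(text: string): number {
  const match = /^(\d+)(ms|s|m)$/.exec(text);
  if (!match) throw new Error(`invalid duration: ${text}`);
  const unitMs = { ms: 1, s: 1000, m: 60000 }[match[2] as "ms" | "s" | "m"];
  return Number(match[1]) * unitMs;
}

// Return events within [at - before, at + after], grouped by category and
// ordered by offset inside each group. Both bounds are inclusive.
function sliceWindow(
  events: TimelineEvent[],
  atMs: number,
  beforeMs: number,
  afterMs: number,
): Record<string, TimelineEvent[]> {
  const sorted = [...events].sort((a, b) => a.offsetMs - b.offsetMs);
  const groups: Record<string, TimelineEvent[]> = {};
  for (const event of sorted) {
    if (event.offsetMs < atMs - beforeMs || event.offsetMs > atMs + afterMs) continue;
    const group = groups[event.category] ?? [];
    group.push(event);
    groups[event.category] = group;
  }
  return groups;
}
```

Grouping after sorting means each category's list is already in timeline order, so an agent can read one category top to bottom or merge several back together by offsetMs.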
sentry replay search <replay> <query>
Fuzzy search over normalized event fields:
- selectors
- visible/unmasked text
- URLs
- network URLs
- breadcrumb messages
- console messages
- error titles/messages
- span descriptions
Return matching events with stable pointers: replay id, segment id, frame index, timestamp, offset.
sentry replay dom <replay>
Inspect DOM-related replay data.
Initial version can be event-based rather than full reconstruction:
sentry replay dom org/project/abc123 --at 01:23
sentry replay dom org/project/abc123 --from 01:20 --to 01:30 --kind mutation,input,scroll
Later versions can add best-effort DOM reconstruction from full snapshots + incremental mutations. A browser-backed rrweb player should be optional, not required for the core CLI flow.
sentry replay stats <replay>
Deterministic summary useful for orientation:
- total duration
- route/screen time
- active vs idle time
- most clicked selectors
- slowest network calls
- failed requests
- console error count
- rage/dead click count
- largest DOM mutation bursts
- poor web vital events
sentry replay struggles <replay>
Deterministic friction analysis. Rank likely struggle windows using signals such as:
- dead clicks
- rage clicks
- repeated clicks on same element
- clicks followed by no navigation/network/DOM change
- failed fetch/xhr/resource requests
- slow network requests
- console errors
- hydration errors
- poor LCP/CLS
- large mutation bursts
- long idle periods immediately after interaction
- repeated input/focus without success
- back-and-forth navigation
Each finding should include evidence pointers and a recommended follow-up command.
Example output shape:
{
  "finding": "Repeated checkout clicks did not produce navigation",
  "severity": "medium",
  "window": {"fromOffsetMs": 81200, "toOffsetMs": 94600},
  "evidence": [
    {"kind": "click", "offsetMs": 83421, "segmentId": 12, "frameIndex": 184},
    {"kind": "click", "offsetMs": 87820, "segmentId": 12, "frameIndex": 211},
    {"kind": "network", "offsetMs": 88110, "status": 500, "url": "/api/checkout"}
  ],
  "nextCommand": "sentry replay window org/project/abc123 --at 01:23 --before 10s --after 30s"
}
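As an example of one deterministic signal, the sketch below finds bursts of repeated clicks on the same selector. The thresholds (three clicks within five seconds) are illustrative defaults chosen for this example, not Sentry's rage-click definition, and findClickBursts is a hypothetical name. A real implementation would combine bursts like these with the other signals listed above before ranking.

```typescript
interface ClickEvent {
  offsetMs: number;
  selector: string;
  segmentId: number;
  frameIndex: number;
}

interface ClickBurst {
  selector: string;
  fromOffsetMs: number;
  toOffsetMs: number;
  evidence: ClickEvent[];
}

// Find runs of at least minClicks clicks on one selector whose offsets all
// fall within withinMs of the first click in the run.
function findClickBursts(clicks: ClickEvent[], minClicks = 3, withinMs = 5000): ClickBurst[] {
  const bySelector: Record<string, ClickEvent[]> = {};
  for (const click of [...clicks].sort((a, b) => a.offsetMs - b.offsetMs)) {
    const group = bySelector[click.selector] ?? [];
    group.push(click);
    bySelector[click.selector] = group;
  }

  const bursts: ClickBurst[] = [];
  for (const selector of Object.keys(bySelector)) {
    const group = bySelector[selector] ?? [];
    let start = 0;
    while (start < group.length) {
      let end = start + 1;
      while (end < group.length && group[end].offsetMs - group[start].offsetMs <= withinMs) {
        end++;
      }
      const evidence = group.slice(start, end);
      if (evidence.length >= minClicks) {
        bursts.push({
          selector,
          fromOffsetMs: group[start].offsetMs,
          toOffsetMs: group[end - 1].offsetMs,
          evidence,
        });
        start = end; // do not report overlapping bursts
      } else {
        start++;
      }
    }
  }
  return bursts.sort((a, b) => a.fromOffsetMs - b.fromOffsetMs);
}
```

Each burst carries the original click events, so the segmentId and frameIndex pointers survive into the finding's evidence list.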
Bundled Skill / Agent Documentation
Add generated or maintained skill docs that teach agents how to use these replay commands.
The skill should explain:
- Start with sentry replay fetch.
- Use sentry replay stats for orientation.
- Use sentry replay struggles to find likely friction.
- Use sentry replay search to locate user actions.
- Use sentry replay window around relevant timestamps.
- Use sentry replay events --kind ... for detailed evidence.
- Use sentry replay dom for DOM/mutation/input/scroll inspection.
- Cite segmentId, frameIndex, and offsets in conclusions.
- Treat masked text and absent DOM data as uncertainty.
- Do not invent user-visible text or DOM state when it was not captured.
This keeps the reasoning layer outside the CLI while making the CLI highly usable by agents.
Implementation Plan
Phase 1: Correct segment fetching
- Add paginated segment download support with per_page=100.
- Use count_segments from replay detail when available.
- Add tests for multi-page segment responses.
- Preserve current replay view behavior, but ensure it uses complete segments or explicitly reports partial data.
Phase 2: Add replay cache and fetch command
- Add replay bundle storage helpers.
- Add sentry replay fetch.
- Store replay metadata and raw segment pages/segments.
- Add cache manifest/versioning.
- Add privacy-safe permissions.
- Decide how replay cache interacts with generic response cache.
Phase 3: Normalize events
- Define typed rrweb/custom event constants and minimal schemas.
- Flatten segment frames into sorted normalized events.
- Port relevant hydration behavior from Sentry frontend where appropriate.
- Add fixtures for rrweb snapshots, mutations, inputs, clicks, network breadcrumbs, console breadcrumbs, perf spans, and web vitals.
Phase 4: Add inspection commands
- Add replay events.
- Add replay window.
- Add replay search.
- Add replay stats.
- Add replay struggles.
- Keep JSON/JSONL output stable and evidence-first.
Phase 5: DOM inspection
- Start with DOM event summaries: snapshots, mutations, inputs, scrolls, viewport changes, node ids/selectors where available.
- Later add best-effort DOM reconstruction from rrweb snapshots and mutations.
- Avoid making a browser/rrweb replayer mandatory for core CLI use.
Phase 6: Agent skill/docs
- Add or update Sentry CLI skill documentation for replay investigation workflows.
- Ensure generated command docs include examples for agent workflows.
- Include sample workflows for common questions.
Open Questions
- What should the default replay cache TTL be, especially for sensitive replay payloads?
- Should replay cache be cleared automatically on logout, or should it be scoped only by identity fingerprint?
- Should generic response-cache skip replay segment payloads once replay-specific cache exists?
- How much related data should replay fetch include by default: errors, traces, logs, click selectors?
- Should replay events --raw include payload snippets or only stable pointers by default?
- Do we want JSONL as a first-class output mode for very large event streams?
- How much DOM reconstruction should be implemented in the CLI before delegating to browser-backed tooling?
Acceptance Criteria
- Agents can download and cache a full replay locally.
- Long replays with multiple segment pages are handled correctly.
- Agents can list normalized replay events by kind/time/search query.
- Agents can pull a compact evidence window around an action or timestamp.
- Agents can identify likely user struggle windows without needing an LLM.
- Outputs include stable evidence pointers: replay id, segment id, frame index, timestamp/offset.
- Masked or unavailable DOM/text data is represented honestly.
- The CLI remains the data/inspection layer; agent reasoning is documented in bundled skill guidance.
Follow-up: Feedback From a Real Multi-Replay Analysis
We tested the current replay command set against a generalized product analytics question:
How often do users open a route and leave before interacting with it?
In this case the route happened to be Sentry dashboard detail pages, but the workflow is intentionally broader: identify sessions that visited a route pattern, inspect the event timeline for each session, and classify whether any interaction occurred before the next route or replay end.
The current commands made this possible, but the workflow required too much external orchestration:
- Use replay list to find replay IDs by URL/path.
- Extract IDs with shell tooling.
- Run replay event list once per replay, hundreds of times.
- Join replay metadata and event timelines externally.
- Classify route windows with custom jq logic.
- Separately compute session-level and user-level rates.
That the analysis was possible at all is a good sign that the primitives are useful, but this much orchestration is a poor experience for agents and humans trying to answer broad replay questions.
Generalized Improvements Suggested by This Exercise
Batch replay event inspection
replay event list should support inspecting multiple replays in one command, using stdin or a file of replay IDs. The CLI should handle bounded concurrency, retries, and partial failures internally.
Example shape:
sentry replay event list sentry/javascript \
--ids-file replays.txt \
--kind navigation,click,tap,input,focus,scroll \
--jsonl
Each emitted row should include replayId, and the command should provide a final summary of successes, failures, and truncation.
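The fan-out mechanics could be as small as a worker pool that records one row per replay, successful or not. In this sketch inspectReplays is a hypothetical name, the inspect callback stands in for whatever fetches and normalizes one replay's events, and retries/backoff are left out for brevity.

```typescript
// One row per replay ID: either the result or an error message.
type BatchRow<T> =
  | { replayId: string; ok: true; value: T }
  | { replayId: string; ok: false; error: string };

interface BatchResult<T> {
  rows: BatchRow<T>[];
  succeeded: number;
  failed: number;
}

// Run inspect over all replay IDs with at most `concurrency` calls in flight.
// Failures become error rows instead of aborting the batch.
async function inspectReplays<T>(
  replayIds: string[],
  inspect: (replayId: string) => Promise<T>,
  concurrency = 4,
): Promise<BatchResult<T>> {
  const rows: BatchRow<T>[] = new Array(replayIds.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < replayIds.length) {
      const index = next++; // claim the next ID; safe because JS is single-threaded
      const replayId = replayIds[index];
      try {
        rows[index] = { replayId, ok: true, value: await inspect(replayId) };
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        rows[index] = { replayId, ok: false, error: message };
      }
    }
  }

  const workerCount = Math.max(1, Math.min(concurrency, replayIds.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  const succeeded = rows.filter((row) => row.ok).length;
  return { rows, succeeded, failed: rows.length - succeeded };
}
```

Rows are written by input index, so the output order matches the order of the ID file even though replays finish at different times. For JSONL output the same loop could emit each row as it completes instead of collecting them.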
Route visit summaries
Add a generic route/session summary primitive that turns replay timelines into route windows. This should not be dashboard-specific; it should work for any route glob/path matcher.
Example shape:
sentry replay route list sentry/javascript \
--path "/dashboard/*" \
--period 24h \
--json
Useful output per route visit:
- replayId
- user or stable user key when available
- path
- enteredAtOffsetMs
- leftAtOffsetMs or replay end
- durationMs
- counts by normalized event kind: clicks, taps, inputs, scrolls, focuses, navigations
- booleans such as hadInteraction, hadInput, hadScroll, leftWithoutInteraction
- nextPath when the user navigated away
This would answer many questions of the form "users visited X, then did/did not do Y" without teaching the CLI a product-specific concept.
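A minimal sketch of how route windows might be derived from a normalized timeline follows. It assumes navigation events carry a path, that click/tap/input/scroll/focus count as interaction, and that "*" in a pattern matches any characters including "/"; all three are assumptions for this example, and buildRouteVisits and matchesPath are hypothetical names.

```typescript
interface RouteEvent {
  offsetMs: number;
  kind: string;
  path?: string; // expected on navigation events
}

interface RouteVisit {
  path: string;
  enteredAtOffsetMs: number;
  leftAtOffsetMs: number; // next navigation, or replay end
  durationMs: number;
  counts: Record<string, number>;
  hadInteraction: boolean;
  leftWithoutInteraction: boolean;
  nextPath?: string;
}

// Which kinds count as "interaction" is an assumption of this sketch.
const INTERACTION_KINDS = ["click", "tap", "input", "scroll", "focus"];

// Split a replay timeline into one window per navigation.
function buildRouteVisits(events: RouteEvent[], replayEndOffsetMs: number): RouteVisit[] {
  const sorted = [...events].sort((a, b) => a.offsetMs - b.offsetMs);
  const visits: RouteVisit[] = [];
  let current: RouteVisit | undefined;
  for (const event of sorted) {
    if (event.kind === "navigation" && event.path !== undefined) {
      if (current) {
        // Close the previous window at the moment of this navigation.
        current.leftAtOffsetMs = event.offsetMs;
        current.durationMs = event.offsetMs - current.enteredAtOffsetMs;
        current.nextPath = event.path;
      }
      current = {
        path: event.path,
        enteredAtOffsetMs: event.offsetMs,
        leftAtOffsetMs: replayEndOffsetMs,
        durationMs: Math.max(0, replayEndOffsetMs - event.offsetMs),
        counts: {},
        hadInteraction: false,
        leftWithoutInteraction: true,
      };
      visits.push(current);
      continue;
    }
    if (!current) continue; // events before the first navigation are unattributed
    current.counts[event.kind] = (current.counts[event.kind] ?? 0) + 1;
    if (INTERACTION_KINDS.indexOf(event.kind) !== -1) {
      current.hadInteraction = true;
      current.leftWithoutInteraction = false;
    }
  }
  return visits;
}

// Simple glob match where "*" matches any run of characters.
function matchesPath(pattern: string, path: string): boolean {
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp(`^${escaped}$`).test(path);
}
```

Filtering the visits with matchesPath("/dashboard/*", visit.path) and then checking leftWithoutInteraction gives the "opened the route and left" classification from the exercise above, without any dashboard-specific logic.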
Replay-level event aggregation
Add a generic aggregation mode for normalized replay events. For example, group counts by replay, user, route, or route visit.
Example shape:
sentry replay event summary sentry/javascript \
--query "url:*dashboard*" \
--path "/dashboard/*" \
--group-by replay \
--json
This should produce compact machine-readable rows rather than requiring callers to fetch and process every event manually.
Session vs user accounting
The CLI should make it easy to report both session-level and user-level rates. In the real workflow, we had to refetch replay list output with user fields and join that with event classifications externally.
A generalized summary command should expose both:
- total matching replay sessions
- classifiable replay sessions
- distinct known users
- sessions/users matching a predicate
- explicit handling for missing user identity
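The accounting above can be sketched as a single pass over classified sessions. The field names and the rule that a user matches when any of their classifiable sessions matches are assumptions for this example; summarizeSessions is a hypothetical name.

```typescript
interface ClassifiedSession {
  replayId: string;
  userKey?: string; // absent when the replay has no user identity
  classifiable: boolean; // false when the timeline was truncated or unusable
  matches: boolean; // result of the predicate, e.g. leftWithoutInteraction
}

interface RateSummary {
  totalSessions: number;
  classifiableSessions: number;
  matchingSessions: number;
  knownUsers: number; // distinct user keys among classifiable sessions
  matchingUsers: number; // users with at least one matching session
  sessionsMissingUser: number; // classifiable sessions without a user key
}

function summarizeSessions(sessions: ClassifiedSession[]): RateSummary {
  const userMatched: Record<string, boolean> = {};
  const summary: RateSummary = {
    totalSessions: sessions.length,
    classifiableSessions: 0,
    matchingSessions: 0,
    knownUsers: 0,
    matchingUsers: 0,
    sessionsMissingUser: 0,
  };
  for (const session of sessions) {
    if (!session.classifiable) continue; // counted in totalSessions only
    summary.classifiableSessions++;
    if (session.matches) summary.matchingSessions++;
    if (session.userKey === undefined) {
      summary.sessionsMissingUser++;
      continue;
    }
    userMatched[session.userKey] = (userMatched[session.userKey] ?? false) || session.matches;
  }
  for (const userKey of Object.keys(userMatched)) {
    summary.knownUsers++;
    if (userMatched[userKey]) summary.matchingUsers++;
  }
  return summary;
}
```

Keeping sessionsMissingUser as its own count lets callers report a session-level rate over all classifiable sessions and a user-level rate over known users only, without silently mixing the two denominators.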
Predicate-friendly output, not product-specific commands
The CLI does not need a dashboard abandonment command. It needs general replay predicates that agents can compose, such as:
- route matched pattern
- no click/tap/input/scroll before next route
- no interaction before replay end
- first interaction occurred after N seconds
- route duration less than N seconds
- next route matched pattern
These can live as fields in route summaries or as filters over those summaries.
Agent-friendly fan-out and failure handling
The CLI should own the mechanics of high-cardinality replay analysis:
- bounded concurrency
- retry/backoff
- per-replay error rows
- partial-result summaries
- stable JSONL for large outputs
- explicit truncated and classifiable fields
Shelling out with xargs -P works, but it makes agents responsible for operational details that the CLI can handle more reliably.
Acceptance Criteria Additions
- Agents can inspect events for many replay IDs with one command.
- Agents can produce route-window summaries for arbitrary route/path patterns.
- Agents can compute route-level interaction counts without custom timeline joining.
- Outputs distinguish session-level and user-level counts when user identity is available.
- Large replay analyses support JSONL, bounded concurrency, retries, and partial-failure reporting.
- The feature remains generic: it supports arbitrary route patterns and event predicates rather than hard-coded dashboard/product analytics questions.