
Primer Roadmap

Primer is the harness intelligence layer for agentic engineering. Research and industry evidence converge on a single insight: outcome quality is determined more by the agent harness — tool design, context management, caching, orchestration, and permission boundaries — than by model capability alone. Primer captures session telemetry across agents, decomposes it into harness dimensions, and measures which configurations actually improve outcomes. It turns that data into harness attribution, coaching, enablement, and operational decisions.

This roadmap is organized in two layers:

  1. Strategy and priorities at the top.
  2. Detailed shipped and planned capabilities underneath.

Plain bullets are shipped capabilities. Planned items carry rough priority tags:

  • P0 - foundational and near-term
  • P1 - important follow-on work
  • P2 - valuable expansion work

What Primer Should Help Teams Answer

  • Harness effectiveness: Which harness configurations (tool designs, caching strategies, context management, orchestration patterns, permission boundaries) correlate with better outcomes?
  • Harness attribution: When a session succeeds or fails, which harness components contributed? What's the per-step compound reliability?
  • Harness evolution: How have harness configurations changed over time, and did those changes improve outcomes?
  • Harnessability: Are codebases and teams structurally ready for effective agent harnesses (documentation quality, typing, module boundaries, data governance)?
  • Dead weight: Which harness configurations are outdated compensations for older model limitations that now bottleneck performance?
  • Environment effectiveness: Where is the project context failing the engineer, and what harness or repository changes will unblock them?

Product Goals

  • Measure harness effectiveness, not just usage — decompose outcomes to the harness component level.
  • Build the "code coverage for harnesses" that the industry is asking for (per-component reliability, compound failure math).
  • Track longitudinal harness evolution so teams can see how configuration changes correlate with outcome changes over time.
  • Make harnessability scoring a first-class product surface (documentation quality, context freshness, guide/sensor coverage).
  • Position Primer as the engineer's ally — proving when a codebase is not "AI-ready" rather than focusing on engineer inefficiency or surveillance.
  • Close the loop from harness insight to auto-remediation — where Primer acts as an agent itself to fix broken repository context.
  • Bring harness intelligence into the engineer workflow via MCP sidecar, not only after the fact.

Strategic Themes

P0: Enterprise Security & Trust

Telemetry capture cannot mean IP leakage. Primer must implement robust secret, PII, and IP scrubbing at the capture layer before any data is persisted, alongside transcript cold storage strategies for scale. Security teams must trust Primer by default.
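As a concrete illustration of capture-layer redaction, the sketch below replaces matches of a few illustrative patterns with typed placeholders before anything is persisted. A production pipeline would rely on a dedicated detector (the detailed roadmap mentions Presidio as one option); the patterns and names here are assumptions, not Primer's implementation.

```python
import re

# Illustrative patterns only — a real pipeline would use a dedicated
# PII/secret detector rather than hand-rolled regexes.
REDACTION_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "bearer_token": re.compile(r"Bearer\s+[A-Za-z0-9._~+/-]+=*"),
}

def redact(text: str) -> str:
    """Replace sensitive matches with typed placeholders before persistence."""
    for label, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{label}]", text)
    return text

print(redact("contact alice@example.com with key AKIA1234567890ABCDEF"))
# → contact [REDACTED:email] with key [REDACTED:aws_access_key]
```

Typed placeholders (rather than blanket deletion) keep downstream facet extraction useful: the analytics layer can still see that a secret was present without ever storing it.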

P0: Harness Attribution and Compound Reliability

Primer's moat is decomposing outcomes to the harness component level: per-tool success rates, compound reliability math (10 steps at 99% ≈ 90.4% end-to-end), and harness configuration fingerprinting from session telemetry. This is the "code coverage for harnesses" that the industry is asking for.
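The compound reliability math is just the product of per-step success rates, assuming step failures are independent — which is why small per-tool unreliability compounds sharply over long agent workflows:

```python
def compound_reliability(step_success_rates):
    """End-to-end success probability of a sequential agent workflow,
    assuming independent step failures."""
    result = 1.0
    for rate in step_success_rates:
        result *= rate
    return result

# Ten sequential tool calls, each 99% reliable:
print(round(compound_reliability([0.99] * 10), 3))  # → 0.904, i.e. ~90.4% end-to-end
```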

P0: Measurement Integrity

Trustworthy semantics remain foundational. Clean taxonomy for outcomes, goals, friction, and success, plus reprocessing and coverage tooling so every downstream metric is credible.

P0: Harness Evolution Tracking

Longitudinal correlation of harness configuration changes with outcome changes over time. LangChain rewrote their harness 4x in one year; Vercel removed 80% of its tools and improved outcomes. No existing tool tracks this, and Primer's team-level time-series data makes it uniquely positioned to.

P1: Harnessability Scoring

Measure whether codebases and teams have the structural properties (documentation quality, context freshness, module boundaries, guide/sensor coverage) that make agent harnesses effective. Extends existing project readiness into a full harnessability assessment.

P1: Closed-Loop Enablement and Auto-Remediation

Recommendations should become measurable interventions, shifting away from "engineer coaching" and toward "environment fixing." Primer should begin acting as an agent itself, automatically opening PRs to update outdated AGENTS.md or remove dead-weight MCP tools.

P1: In-Workflow Guidance

The most valuable insights should show up during the session via MCP sidecar: harness health scores, context quality warnings, dead weight alerts, and configuration recommendations.

P2: Harness Simulation & Backtesting

If an organization changes permission boundaries or MCP tools, it should know whether the change works. Primer will enable "backtesting": running past failed sessions through new harness configurations to quantitatively demonstrate that the change improves outcomes.

P2: Operational Scale and Enterprise Readiness

Derived data pipelines, performance optimization, durable background jobs, enterprise identity, and observability.

Near-Term Priorities

  • P0 Local secret, PII, and IP redaction pipeline at the capture layer before database insertion.
  • P0 Per-tool success rate tracking with compound reliability computation — decompose session outcomes to the tool/step level.
  • P0 Harness configuration fingerprinting — extract and catalog the actual harness configuration (tools, context files, permissions, customizations) from session telemetry.
  • P0 Context quality scoring — measure AGENTS.md freshness, token efficiency, and guide/sensor coverage per project.
  • P1 Harness evolution timeline — before/after correlation of configuration changes with outcome changes.
  • P1 Harnessability scoring per project — documentation quality, typing strength, module boundaries, data governance readiness.
  • P1 Issue tracker integration (Linear/Jira) to connect session success to ticket-to-merge cycle time.
  • P1 Paragon's 4-dimension evaluation — tool correctness, tool usage accuracy, task completion, task efficiency.
  • P1 Semantic search over sessions via pgvector — exemplar discovery and cross-engineer pattern matching.
  • P2 Primer-as-Agent auto-remediation — Primer automatically generates PRs for environment fixes.
  • P2 Harness backtesting — simulate past sessions against new configurations.

Detailed Roadmap

Measurement Integrity & Data Foundation

  • Facet taxonomy alignment across extraction, schemas, analytics, and UI
  • Outcome normalization and historical backfill for previously ingested sessions
  • Coverage dashboard for facet extraction, transcript completeness, GitHub sync, and repository metadata
  • Confidence scoring for extracted facets and downstream recommendations
  • Cross-agent schema parity matrix so Primer knows which session fields are required, optional, or unavailable per source
  • Partial-telemetry handling for IDE-native agents like Cursor so missing transcript, tool, or model fields do not distort org-wide metrics
  • [P1] Execution evidence capture: lint, test, build, and verification signals per session
  • [P1] Change-shape capture: files touched, diff size, churn, and rewrite/revert indicators
  • [P1] Recovery-path tracking: detect whether engineers recover after friction or abandon the attempt
  • [P1] Derived analytics tables and materialized rollups for heavy longitudinal queries
  • Source-quality dashboard by agent type, including capture coverage and telemetry completeness for Cursor
  • [P0] Secret, PII, and IP redaction pipeline at the capture layer (e.g., Presidio integration) before persistence
  • [P2] Transcript cold storage and blob offloading (S3/GCS) to keep the operational database fast
  • [P2] Data-quality anomaly detection for broken ingestion, sparse transcripts, or stale integrations

Session Intelligence

  • Session search with full-text, outcome, type, model, branch filters
  • Transcript viewer with message-level detail
  • Session health scoring (outcome + friction + duration + satisfaction composite)
  • LLM-powered facet extraction (goals, friction types, satisfaction signals)
  • End reason breakdown with success rate per reason
  • Goal analytics (session type and goal category breakdown)
  • Permission mode analysis (success rate by permission level)
  • Satisfaction trend tracking (satisfied / neutral / dissatisfied over time)
  • Similar sessions panel with 3-tier relevance matching
  • [P0] Cursor session ingestion and discovery pipeline
  • [P0] Cursor transcript and tool-call extraction mapped onto the normalized session model
  • [P1] Cursor native telemetry enrichment for approvals, change shape, and context-usage signals
  • [P1] Cursor reliable token and model-usage extraction once source telemetry is trustworthy
  • [P1] Workflow fingerprinting: infer common sequences like search -> read -> edit -> test -> fix
  • [P1] Cursor-specific workflow fingerprinting and session archetype mapping
  • [P1] Session archetype detection: debugging, feature delivery, refactor, migration, docs, investigation
  • [P1] Delegation graph capture for multi-agent and subagent workflows
  • [P1] Issue Tracker Integration (Jira/Linear) to correlate AI session efficiency with actual business sprint velocity
  • [P1] Copilot Integration Strategy: explore telemetry extraction for Copilot Chat alongside current CLI agents
  • [P2] Exemplar session library for high-value workflows and onboarding examples
  • [P2] Skill, command, and template reuse analytics by workflow and outcome
  • [P2] Prompt reuse analytics by workflow and outcome
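The shipped session health score combines outcome, friction, duration, and satisfaction into a single composite. A minimal sketch of how such a composite might be computed — the weights and signal names below are illustrative assumptions, not the shipped formula:

```python
def session_health(outcome: float, friction: float,
                   duration: float, satisfaction: float) -> float:
    """Weighted composite of normalized (0-1) session signals.
    Friction and duration count against health. Weights are illustrative."""
    weights = {"outcome": 0.4, "friction": 0.2, "duration": 0.15, "satisfaction": 0.25}
    score = (weights["outcome"] * outcome
             + weights["friction"] * (1.0 - friction)
             + weights["duration"] * (1.0 - duration)
             + weights["satisfaction"] * satisfaction)
    return round(100 * score, 1)

# Successful, low-friction, quick, satisfied session:
print(session_health(outcome=1.0, friction=0.1, duration=0.2, satisfaction=1.0))  # → 95.0
```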

Friction & Bottleneck Analysis

  • Friction type classification (permission denied, timeout, context limit, edit conflict, tool error, exec error)
  • Friction impact scoring (occurrence count x success rate penalty)
  • Friction trend chart (count + rate over time)
  • Project-level friction breakdown
  • Friction cluster analysis with sample details
  • Anomaly detection for friction spikes
  • [P0] Root-cause clustering from transcripts, tool traces, and repeated failure motifs
  • [P1] Time-lost estimation per friction type, engineer, and project
  • [P1] Toolchain reliability analytics for MCP servers, built-in tools, and external services
  • [P1] Friction recovery analysis: what engineers tried after failure and which recoveries worked
  • [P2] Real-time friction detection for in-session intervention
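The shipped friction impact score (occurrence count x success rate penalty) could be computed along these lines; the exact penalty definition here is an illustrative assumption, not the shipped formula:

```python
def friction_impact(occurrences: int,
                    success_rate_with: float,
                    success_rate_without: float) -> float:
    """Impact = how often a friction type occurs, weighted by how much it
    depresses session success. Illustrative formula; penalty clamps at 0
    so friction types that coincide with higher success score zero."""
    penalty = max(0.0, success_rate_without - success_rate_with)
    return occurrences * penalty

# Permission denials: 40 occurrences; sessions with them succeed 55% of
# the time vs. 80% without.
print(round(friction_impact(40, 0.55, 0.80), 2))  # → 10.0
```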

Developer Enablement & Analytics

Note: We are explicitly shifting our product vernacular and UI away from "Engineer Coaching/Surveillance" toward "Harness & Project Enablement" to ensure Primer acts as a developer ally. Legacy views will be reframed to highlight repo enablement.

  • Engineer leaderboard with multi-dimensional ranking
  • Personal trajectory dashboard with weekly sparklines
  • Strengths and friction breakdown per engineer
  • Peer benchmarking (percentile ranking, vs-team-average deltas)
  • AI-generated narrative insights per engineer
  • Personalized tips based on friction patterns and tool gaps
  • Config optimization suggestions from team benchmark comparison
  • Skill inventory with proficiency levels per tool
  • Learning paths generated from high-performer patterns
  • [P0] Effectiveness score: success rate, cost efficiency, quality outcomes, and follow-through
  • [P0] Workflow playbooks derived from high-performing peer patterns
  • [P1] Plugin and tool recommendation engine based on task type, project context, and similar successful sessions
  • [P1] Model selection coach for cost-appropriate model choice by task
  • [P1] Personal impact review that combines trajectory, quality, cost, and workflow maturity
  • [P2] Longitudinal growth view across quarters, role changes, and team moves

Growth & Onboarding

  • Cohort comparison (new hire / ramping / experienced)
  • Time-to-team-average tracking for new hires
  • Onboarding velocity scoring
  • Onboarding recommendations
  • Shared behavior pattern discovery with approach comparison
  • [P1] Bright spot detection: explicitly surface high performers and cross-pollinate their patterns
  • [P1] Exemplar-session-to-learning-path pipeline
  • [P1] Team skill gap mapping by workflow, tool category, and project context
  • [P2] Coaching program measurement: which onboarding or training changes improved outcomes

Project Intelligence

  • Dedicated project workspace with readiness, friction, quality, cost, and enablement views
  • Project AI-readiness scoring (CLAUDE.md, AGENTS.md, .claude/ detection)
  • Project scorecard that combines adoption, effectiveness, quality, and cost efficiency
  • [P0] Project-level workflow fingerprints and friction hotspots
  • [P1] Project-level agent mix comparison, including Cursor sessions alongside CLI agents
  • [P1] Repository context model: language mix, test maturity, repo size, and AI-enablement signals
  • [P1] Project enablement recommendations tied to observed bottlenecks
  • [P1] Cross-project comparison: which repos make effective AI use easiest or hardest
  • [P2] Project playbook templates for greenfield, legacy, high-compliance, and test-poor repos

Harness Intelligence

  • Tool leverage scoring (0-100 composite per engineer)
  • Tool category classification (core, search, orchestration, skill, MCP)
  • Orchestration adoption rate tracking
  • Agent and skill usage analytics (invocation patterns, delegation depth)
  • Tool adoption rates and trend charts
  • Engineer tool proficiency table
  • Daily leverage trend tracking
  • [P0] 5-factor harness maturity score: tool design, orchestration, caching, context hygiene, boundary design
  • [P0] Dead weight detection: flag zero-invocation and no-outcome-lift customizations
  • [P0] Subtractive coaching: "what you can stop doing" section in coaching briefs
  • [P0] GET /api/v1/harness/deadweight endpoint with auth-scoped access
  • [P1] Model diversity factor in leverage scoring
  • [P1] Agent team detection for coordinated multi-agent orchestration
  • [P1] Session customization snapshot: capture enabled MCP servers, subagents, skills, commands, and templates alongside what was actually invoked
  • [P1] Tool source classification: built-in vs marketplace vs custom
  • [P1] Skill provenance + baseline filtering so recommendations and reuse analytics suppress built-in/default skills and focus on explicit user or repo-configured choices
  • [P1] Cross-agent customization normalization so Claude, Cursor, Codex, and Gemini plugin surfaces map into one shared model
  • [P1] Customization state model: available vs enabled vs invoked for MCPs, subagents, skills, commands, and templates
  • [P1] Outcome attribution for customizations: which MCPs, skills, commands, and subagents improve workflow, quality, cost, and friction outcomes
  • [P1] Cross-team tooling landscape: overlap, reuse, and local best-of-breed tools
  • [P1] High-performer agent stack analysis: which combinations of MCPs, skills, commands, and subagents differentiate top performers
  • [P0] Per-tool success rate tracking with compound reliability computation (10 steps at 99% = 90.4% end-to-end)
  • [P0] Harness configuration fingerprinting from session telemetry (tools, context files, permissions, customizations)
  • [P1] Context quality scoring: AGENTS.md freshness, token efficiency, guide/sensor coverage
  • [P1] Harness evolution timeline: before/after correlation of configuration changes with outcome changes
  • [P1] Harnessability scoring per project: documentation quality, typing strength, module boundaries
  • [P1] Paragon's 4-dimension evaluation: tool correctness, tool usage accuracy, task completion, task efficiency
  • [P2] Prompt, skill, and template maturity scoring
  • [P2] Harness simulation and backtesting: run past failed sessions through proposed harness configs to validate improvements before deployment
  • [P2] Automated harness optimization suggestions
  • [P2] Dead weight dashboard tab with per-customization detail and removal actions
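Harness configuration fingerprinting can be as simple as hashing a canonicalized snapshot of the session's configuration, so any change to tools, context files, permissions, or MCP servers yields a new fingerprint that evolution tracking can group sessions by. A minimal sketch — the field names are illustrative, not Primer's actual schema:

```python
import hashlib
import json

def harness_fingerprint(config: dict) -> str:
    """Stable short hash of a harness configuration snapshot. Sorting keys
    and list values makes the fingerprint order-independent."""
    normalized = {
        "tools": sorted(config.get("tools", [])),
        "context_files": sorted(config.get("context_files", [])),
        "permission_mode": config.get("permission_mode", "default"),
        "mcp_servers": sorted(config.get("mcp_servers", [])),
    }
    canonical = json.dumps(normalized, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

a = harness_fingerprint({"tools": ["read", "edit"], "mcp_servers": ["github"]})
b = harness_fingerprint({"tools": ["edit", "read"], "mcp_servers": ["github"]})
assert a == b  # tool ordering doesn't change the fingerprint
```

Grouping sessions by fingerprint is what makes before/after correlation possible: every configuration change starts a new cohort whose outcomes can be compared with the previous one.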

Code Quality

  • GitHub OAuth SSO
  • Pull request sync via GitHub App
  • Commit correlation with sessions
  • Claude-assisted vs non-Claude PR comparison (merge rate, review comments, time to merge)
  • Quality by session type (debugging, feature, refactoring)
  • Code volume tracking (daily lines added/deleted)
  • Engineer quality ranking table
  • Repository AI-readiness scoring
  • Automated review findings tracker (BugBot parser, severity breakdown, fix rate)
  • Review findings overview in quality dashboard and engineer profile
  • GET /api/v1/analytics/review-findings endpoint with source/severity/status filters
  • Quality attribution layer linking session behavior to PR outcomes and review findings
  • [P1] Additional review bot parsers: CodeRabbit, SonarQube, and other automated review tools
  • [P1] Post-merge outcome tracking: reverts, hotfixes, and follow-up bug volume
  • [P1] Change-quality analysis by workflow fingerprint and session archetype
  • [P2] Review remediation tracking from finding creation to fix completion

FinOps & Cost Management

  • Per-model spend tracking with daily cost chart
  • Cost breakdown by model
  • Cache efficiency analytics (hit rates, savings, per-engineer potential)
  • Billing mode detection (API vs subscription)
  • Subscription vs API cost modeling with optimal plan recommendations
  • 30-day cost forecasting (linear regression with confidence bands)
  • Budget tracking with burn-rate alerts and projected overrun warnings
  • Cost per successful outcome metric
  • [P1] Break-even analysis for API vs seat-based pricing with per-engineer recommendations
  • [P1] Cost per workflow archetype and cost per engineering outcome
  • [P1] Workflow compare mode for archetype and fingerprint performance
  • [P1] Model-choice opportunity scoring for overspend reduction
  • [P2] Budget policy simulation by team, project, and billing model
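The break-even analysis for API vs seat-based pricing reduces to comparing an engineer's usage-based spend against a flat seat price. A sketch with illustrative numbers (the prices are assumptions, not real plan pricing):

```python
def recommend_billing(monthly_api_cost: float, seat_price: float) -> str:
    """Recommend the cheaper billing mode for one engineer.
    Prices are illustrative, not real plan pricing."""
    if monthly_api_cost > seat_price:
        return f"subscription (saves ${monthly_api_cost - seat_price:.2f}/mo)"
    return f"API (saves ${seat_price - monthly_api_cost:.2f}/mo)"

print(recommend_billing(monthly_api_cost=312.40, seat_price=200.00))
# → subscription (saves $112.40/mo)
```

In practice the comparison would run per engineer over a trailing window, since light users and heavy users typically land on opposite sides of the break-even point.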

AI Synthesis & Explorer

  • AI-generated narrative reports (engineer, team, org scope)
  • Narrative caching with TTL-based expiry
  • Auto-refresh via lifespan task
  • Conversational data explorer (SSE-streamed tool-use chat)
  • AI-powered recommendations panel
  • [P1] Saved explorer prompts and reusable report cards
  • [P1] Compare mode for engineer, team, project, and time-period analysis
  • [P2] Weekly manager review packs that combine quality, friction, growth, and cost
  • [P2] Recommendation narratives that explain why a workflow is likely to help

Website & Positioning

  • [P1] Reposition the website around harness intelligence for agentic engineering
  • [P1] Showcase harness effectiveness, cost attribution, quality, and exemplar sessions as the core proof points

Interventions & Experimentation

  • [P0] Recommendation-to-intervention workflow with owner, status, due date, and linked evidence
  • [P0] Before-and-after measurement for coaching, tooling, or repo changes
  • [P1] Experimentation layer for training rollouts, tool changes, and enablement playbooks
  • [P1] Intervention effectiveness reporting by team, project, and engineer cohort
  • [P1] Primer-as-Agent Auto-Remediation: Automatically generate pull requests to fix outdated context (e.g., AGENTS.md) or dead-weight tools
  • [P2] Auto-generated next-step plans from alerts, narratives, and project findings

Real-Time Engineer Experience

  • MCP sidecar with on-demand stats, friction reports, and recommendations
  • [P0] Proactive coaching skill that activates at session start with contextual suggestions
  • [P0] Live session signals that stream friction, satisfaction, and risk as work happens
  • [P1] In-session workflow nudges based on project playbooks and prior failures
  • [P1] Daily and weekly personal recaps inside the sidecar
  • [P2] Lightweight session planning prompts before complex work begins

Organization & Administration

  • Hub-and-spoke dashboard with KPI strip, activity section, attention alerts, deep-dive cards
  • Custom date range picker (7d / 30d / 90d / 1y presets + custom)
  • Team management with member stats
  • Role-based access control (engineer, team lead, admin)
  • Admin panel (engineer/team management, audit log, system stats)
  • Alert system with configurable thresholds, acknowledge/dismiss workflow
  • Slack notification integration
  • CSV and PDF export
  • API rate limiting
  • Dark mode with system preference detection
  • [P1] Activation and setup hub for GitHub, budgets, alerts, narrative readiness, and data freshness
  • [P1] Performance measurement views for leadership across productivity, quality, cost, and adoption
  • [P1] Threshold resolution and policy management that matches actual alerting behavior
  • [P1] Device-scoped ingest tokens for hooks and sidecar, backed by authenticated engineer identity instead of long-lived engineer API keys
  • [P1] One-time setup codes that exchange browser-authenticated engineers into local device tokens
  • [P2] Multi-tenant workspace isolation for multiple organizations on a shared Primer instance
  • [P2] Enterprise IdP support with SAML and OIDC for provisioning and SSO

Platform & Infrastructure

  • Multi-agent support (Claude Code, Codex CLI, Gemini CLI, Cursor)
  • SessionEnd hook system with agent-specific installers
  • primer sync --watch for agents without hook systems
  • Docker Compose and Kubernetes Helm deployment
  • PostgreSQL and SQLite support
  • Alembic migration bundling in pip package
  • [P0] Cursor agent_type support across capture, sync, ingest, and analytics filters
  • [P0] Durable background job system for sync, facet extraction, narratives, and alerts
  • [P0] Scalable API key lookup and verification strategy
  • [P1] Source-capability registry so Primer can safely gate analytics by what each agent source actually provides
  • [P1] OpenTelemetry integration for metrics, traces, and logs
  • [P1] Redis-backed caching for analytics query results and high-read metadata
  • [P1] Analytics performance work for large orgs and concurrent dashboard usage
  • [P2] Pluggable warehouse export for long-horizon analysis in external BI tools