From dc120ae1ca73522464230063a157e23baa526c80 Mon Sep 17 00:00:00 2001 From: Richard Abrich Date: Sat, 17 Jan 2026 01:42:42 -0500 Subject: [PATCH 1/2] docs: Simplify roadmap-priorities.md - reduce verbosity by 77% - Reduced from 562 lines to 128 lines - Removed redundant tables, excessive metadata, and filler text - Converted verbose descriptions to bullet points - Kept essential information: packages, priorities, success criteria - Focus on WHAT and WHY, not excessive HOW details - Removed obvious statements and repeated information Co-Authored-By: Claude Sonnet 4.5 --- docs/roadmap-priorities.md | 594 +++++-------------------------------- 1 file changed, 80 insertions(+), 514 deletions(-) diff --git a/docs/roadmap-priorities.md b/docs/roadmap-priorities.md index a49b0f1eb..68d559aea 100644 --- a/docs/roadmap-priorities.md +++ b/docs/roadmap-priorities.md @@ -1,562 +1,128 @@ -# OpenAdapt Roadmap - Priorities +# OpenAdapt Roadmap -**Last Updated**: January 17, 2026 -**Version**: 1.1.0 -**Status**: Active Development +**Updated**: January 17, 2026 | **Version**: 1.1.0 ---- - -## Executive Summary - -This document outlines the prioritized roadmap for OpenAdapt, focusing on ensuring the modular meta-package architecture is stable, functional, and delivers on the core promise: **Record -> Train -> Evaluate** GUI automation workflows. - ---- - -## Current State Assessment - -### PyPI Packages Published - -| Package | Version | Python | Status | -|---------|---------|--------|--------| -| `openadapt` | 1.0.0 (meta) | >=3.10 | Published | -| `openadapt-capture` | 0.1.0 | >=3.10 | Published | -| `openadapt-ml` | 0.2.0 | >=3.12 | Published | -| `openadapt-evals` | 0.1.0 | >=3.10 | Published | -| `openadapt-viewer` | 0.1.0 | >=3.10 | Published | -| `openadapt-grounding` | 0.1.0 | >=3.10 | Published | -| `openadapt-retrieval` | 0.1.0 | >=3.10 | Published | -| `openadapt-privacy` | 0.1.0 | >=3.10 | Published | - -**Note**: `openadapt-ml` requires Python 3.12+, which may cause compatibility issues with other packages requiring 3.10. - -### CI/Test Status - -- **Main repo**: CI runs on macOS and Ubuntu, Python 3.10/3.11/3.12 -- **Lint check**: `ruff check` and `ruff format --check` - **Currently Passing** -- **Tests verified**: - - `openadapt-grounding`: 53 tests passing - - `openadapt-retrieval`: 28 tests passing -- **Known issues**: PR #969 addresses ruff format, Docker build needs verification - -### Meta-Package Structure - -The `openadapt` meta-package v1.0.0 uses: -- Hatchling build system -- Lazy imports to avoid heavy dependencies -- Optional extras: `[capture]`, `[ml]`, `[evals]`, `[viewer]`, `[grounding]`, `[retrieval]`, `[privacy]`, `[core]`, `[all]` +Goal: **Record -> Train -> Evaluate** GUI automation workflows. --- -## Priority Definitions +## Packages (All Published) -| Priority | Urgency | Timeframe | Description | -|----------|---------|-----------|-------------| -| **P0** | Critical | This week | Blockers preventing basic functionality | -| **P1** | High | 1-2 weeks | Core feature completion, essential for v1.0 | -| **P2** | Medium | This month | Important enhancements, user experience | -| **P3** | Lower | Backlog | Nice to have, future considerations | +| Package | Python | Notes | +|---------|--------|-------| +| `openadapt` (meta) | >=3.10 | | +| `openadapt-capture` | >=3.10 | | +| `openadapt-ml` | >=3.12 | Breaks `[all]` on 3.10/3.11 | +| `openadapt-evals` | >=3.10 | | +| `openadapt-viewer` | >=3.10 | | +| `openadapt-grounding` | >=3.10 | 53 tests passing | +| `openadapt-retrieval` | >=3.10 | 28 tests passing | +| `openadapt-privacy` | >=3.10 | | --- -## P0 - Critical: Blocking Issues - -### 1. Fix CI - Ruff Format (PR #969) - -| Field | Value | -|-------|-------| -| **Status** | In Progress | -| **Effort** | Small (1-2 hours) | -| **Owner** | TBD | -| **PR** | #969 | -| **Branch** | `fix/ruff-format-config` | +## P0 - Critical (This Week) -**Description**: The CI workflow runs `ruff format --check openadapt/` which may fail if code is not formatted. A fix branch exists with formatting applied. - -**Current State**: Local `ruff check` passes. Branch `fix/ruff-format-config` contains formatting fixes. - -**Next Actions**: -- [ ] Review and merge PR #969 -- [ ] Verify CI passes on all Python versions (3.10, 3.11, 3.12) -- [ ] Verify CI passes on all platforms (macOS, Ubuntu) - -**Files**: -- `.github/workflows/main.yml` -- `openadapt/config.py` -- `openadapt/cli.py` - ---- +### 1. Fix CI - Ruff Format +- **PR**: #969, branch `fix/ruff-format-config` +- **Action**: Merge PR, verify CI passes (Python 3.10-3.12, macOS/Ubuntu) ### 2. Fix Docker Build - -| Field | Value | -|-------|-------| -| **Status** | Needs Investigation | -| **Effort** | Medium (2-4 hours) | -| **Owner** | TBD | -| **Location** | `legacy/deploy/deploy/models/omniparser/Dockerfile` | - -**Description**: Docker build for OmniParser server may have issues. This is used for the grounding provider integration. - -**Next Actions**: -- [ ] Test `docker build` for OmniParser Dockerfile -- [ ] Verify CUDA/GPU support works correctly -- [ ] Test model download during build (huggingface-cli) -- [ ] Document any missing dependencies or configuration - -**Files**: -- `legacy/deploy/deploy/models/omniparser/Dockerfile` - ---- - -### 3. Verify Meta-Package Installs Correctly - -| Field | Value | -|-------|-------| -| **Status** | Needs Testing | -| **Effort** | Medium (2-4 hours) | -| **Owner** | TBD | - -**Description**: Critical compatibility issue - `openadapt-ml` requires Python 3.12+, but `openadapt-capture` and others require 3.10+. Need to verify `pip install openadapt[all]` works. - -**Test Matrix**: - -| Installation | Python 3.10 | Python 3.11 | Python 3.12 | -|-------------|-------------|-------------|-------------| -| `openadapt` | Test | Test | Test | -| `openadapt[capture]` | Test | Test | Test | -| `openadapt[ml]` | Expected Fail | Expected Fail | Test | -| `openadapt[core]` | Expected Fail | Expected Fail | Test | -| `openadapt[all]` | Expected Fail | Expected Fail | Test | - -**Next Actions**: -- [ ] Test `pip install openadapt[all]` on Python 3.12 -- [ ] Test `pip install openadapt[core]` on Python 3.12 -- [ ] Verify imports work: `python -c "from openadapt.cli import main"` -- [ ] Document minimum Python version clearly (3.12 if ml is needed) -- [ ] Consider downgrading `openadapt-ml` requirements to 3.10+ if feasible - ---- - -### 4. Basic Capture -> Train -> Eval Workflow - -| Field | Value | -|-------|-------| -| **Status** | Needs End-to-End Testing | -| **Effort** | Large (4-8 hours) | -| **Owner** | TBD | - -**Description**: The core value proposition requires this workflow to function: - +- **Location**: `legacy/deploy/deploy/models/omniparser/Dockerfile` +- **Actions**: + - Test `docker build` + - Verify CUDA/GPU support + - Test model download (huggingface-cli) + +### 3. Verify Meta-Package Installation +- Test `pip install openadapt[core]` and `openadapt[all]` on Python 3.12 +- Verify: `python -c "from openadapt.cli import main"` +- Document: Python 3.12 required for full install (due to `openadapt-ml`) + +### 4. End-to-End Workflow +Test the core workflow: ```bash -openadapt capture start --name my-task # 1. Record demo -openadapt train start --capture my-task # 2. Train model -openadapt eval run --checkpoint model.pt # 3. Evaluate +openadapt capture start --name my-task +openadapt train start --capture my-task +openadapt eval run --checkpoint model.pt ``` -**CLI Commands to Test**: - -| Command | Status | Notes | -|---------|--------|-------| -| `openadapt capture start` | Needs Test | Requires macOS permissions | -| `openadapt capture list` | Needs Test | | -| `openadapt capture view ` | Needs Test | Generates HTML | -| `openadapt capture stop` | TODO | Uses Ctrl+C currently | -| `openadapt train start` | Needs Test | Requires openadapt-ml | -| `openadapt eval run --agent api-claude` | Needs Test | Requires API key | -| `openadapt eval mock --tasks 10` | Needs Test | Quick verification | - -**Next Actions**: -- [ ] Test `openadapt capture start` on macOS (permissions required) -- [ ] Test `openadapt capture list` shows recordings -- [ ] Test `openadapt capture view ` generates HTML -- [ ] Test `openadapt train start` with real capture data -- [ ] Test `openadapt eval run --agent api-claude` with API key -- [ ] Test `openadapt eval mock --tasks 10` for quick verification -- [ ] Document any failures and create issues - -**Known Blockers**: -- `capture stop` is TODO (uses Ctrl+C currently) -- macOS requires Accessibility + Screen Recording permissions +**Blockers**: `capture stop` TODO (uses Ctrl+C), macOS needs Accessibility + Screen Recording permissions --- -## P1 - High: Core Features - -### 5. Complete Baseline Adapters - -| Field | Value | -|-------|-------| -| **Status** | Partially Implemented | -| **Effort** | Medium (4-8 hours) | -| **Owner** | TBD | -| **Package** | `openadapt-ml` | +## P1 - High (1-2 Weeks) -**Description**: API baseline adapters (Anthropic, OpenAI, Google) are implemented but need testing and validation. +### 5. Baseline Adapters +- Package: `openadapt-ml` +- Test: Claude, GPT-4V, Gemini, Qwen3-VL adapters +- Add rate limit / API failure handling -**Adapter Status**: - -| Provider | Adapter | Status | Notes | -|----------|---------|--------|-------| -| Anthropic | Claude | Implemented | Claude Computer Use patterns | -| OpenAI | GPT-4V | Implemented | Needs testing | -| Google | Gemini | Implemented | Needs testing | -| Qwen | Qwen3-VL | Implemented | Local model | - -**Next Actions**: -- [ ] Test Anthropic adapter with Claude API -- [ ] Test OpenAI adapter with GPT-4V -- [ ] Test Google adapter with Gemini -- [ ] Verify prompts follow SOTA patterns (Claude CU, UFO, OSWorld) -- [ ] Add error handling for rate limits and API failures -- [ ] Document adapter usage and configuration - ---- - -### 6. Demo Conditioning Integration in Evals - -| Field | Value | -|-------|-------| -| **Status** | Designed, Needs Integration | -| **Effort** | Medium (4-8 hours) | -| **Owner** | TBD | -| **Packages** | `openadapt-retrieval`, `openadapt-evals` | - -**Description**: Demo-conditioned prompting shows **33% -> 100% first-action accuracy improvement**. This is a key differentiator. - -**Architecture**: -``` -openadapt-retrieval (demo library) -> openadapt-ml (adapters) -> openadapt-evals (benchmark) -``` - -**Next Actions**: -- [ ] Integrate `openadapt-retrieval` with `openadapt-ml` adapters -- [ ] Add `--demo` flag to `openadapt eval run` -- [ ] Test with real demo library on WAA benchmark -- [ ] Document demo library format (JSON structure, screenshots) -- [ ] Add `--demo-library` option for multi-demo retrieval - ---- +### 6. Demo Conditioning in Evals +- **Key result**: 33% -> 100% first-action accuracy with demo conditioning +- Integrate `openadapt-retrieval` with `openadapt-ml` adapters +- Add `--demo` flag to `openadapt eval run` ### 7. WAA Benchmark Validation - -| Field | Value | -|-------|-------| -| **Status** | Blocked on Azure VM Setup | -| **Effort** | Medium (4-8 hours) | -| **Owner** | TBD | -| **Package** | `openadapt-evals` | - -**Description**: Need to validate demo-conditioning claims on full Windows Agent Arena benchmark. This provides credibility for landing page claims. - -**Infrastructure Required**: -- Azure VM with nested virtualization (Windows 10/11) -- WAA server running -- API keys for Claude/GPT-4V - -**Target Metrics**: - -| Metric | Baseline (No Demo) | With Demo | Target | -|--------|-------------------|-----------|--------| -| First-action accuracy | ~33% | ~100% | Validate | -| Episode success rate | TBD | TBD | Measure | -| Average steps | TBD | TBD | Measure | - -**Next Actions**: -- [ ] Start Azure VM with WAA server (nested virtualization) -- [ ] Run `openadapt eval run --agent api-claude --server ` -- [ ] Record metrics: episode success rate, avg steps, failure modes -- [ ] Generate HTML report with `openadapt-viewer` -- [ ] Document results for landing page claims +- Blocked on: Azure VM with nested virtualization +- Goal: Validate demo-conditioning claims on Windows Agent Arena --- -## P2 - Medium: Enhancements +## P2 - Medium (This Month) -### 8. Safety Gate Implementation +### 8. Safety Gates +- Pre-action validation, dangerous action detection +- Optional human-in-the-loop confirmation -| Field | Value | -|-------|-------| -| **Status** | Design Phase | -| **Effort** | Medium (4-8 hours) | -| **Owner** | TBD | -| **Package** | `openadapt-ml` | +### 9. Grounding Improvements +- Integrate with `openadapt-ml` action replay +- Providers: OmniGrounder (CUDA), GeminiGrounder (API), SoMGrounder (CUDA) -**Description**: Implement safety gates to prevent harmful or unintended actions during agent execution. - -**Safety Categories**: -1. **Pre-action validation**: Check action against allowed patterns -2. **Dangerous action detection**: Block destructive file ops, system commands -3. **Human-in-the-loop confirmation**: Require approval for certain actions -4. **Rollback capability**: Undo recent actions if needed - -**Next Actions**: -- [ ] Design safety gate API interface -- [ ] Implement pre-action validation hooks -- [ ] Add dangerous action detection (rm, format, delete, etc.) -- [ ] Add optional human confirmation prompts -- [ ] Document safety configuration options - ---- - -### 9. Grounding Provider Improvements - -| Field | Value | -|-------|-------| -| **Status** | Package Published (53 tests passing) | -| **Effort** | Medium (4-6 hours) | -| **Owner** | TBD | -| **Package** | `openadapt-grounding` | - -**Description**: `openadapt-grounding` provides UI element localization for improved click accuracy. Needs integration with ML package. - -**Available Providers**: - -| Provider | Backend | Status | GPU Required | -|----------|---------|--------|--------------| -| OmniGrounder | OmniParser | Working | Yes (CUDA) | -| GeminiGrounder | Gemini API | Working | No | -| SoMGrounder | Set-of-Marks | Working | Yes | - -**Next Actions**: -- [ ] Integrate with `openadapt-ml` action replay -- [ ] Test OmniGrounder with recorded captures -- [ ] Test GeminiGrounder with API key -- [ ] Add grounding visualization to `openadapt-viewer` -- [ ] Document grounding provider selection -- [ ] Fix Docker build for OmniParser server +### 10. Viewer Dashboard +- Add video playback, action timeline, side-by-side comparison +- Filter by action type --- -### 10. Viewer Dashboard Features - -| Field | Value | -|-------|-------| -| **Status** | Basic HTML Generation Works | -| **Effort** | Medium (4-8 hours) | -| **Owner** | TBD | -| **Package** | `openadapt-viewer` | - -**Description**: `openadapt-viewer` generates HTML but could be enhanced for better debugging and analysis. - -**Requested Features**: - -| Feature | Priority | Complexity | -|---------|----------|------------| -| Video playback from screenshots | High | Medium | -| Action timeline with seek | High | Medium | -| Side-by-side comparison view | Medium | Low | -| Filtering by action type | Medium | Low | -| Benchmark result integration | Medium | Medium | -| Failure analysis tools | Medium | High | - -**Next Actions**: -- [ ] Add video playback (from captured screenshots) -- [ ] Add action timeline with seek -- [ ] Add side-by-side comparison view -- [ ] Add filtering by action type -- [ ] Integrate with benchmark results for failure analysis - ---- - -## P3 - Lower: Nice to Have +## P3 - Backlog ### 11. Telemetry (GlitchTip) +- Create `openadapt-telemetry` package +- GlitchTip/Sentry integration with privacy filtering +- Design: `docs/design/telemetry-design.md` -| Field | Value | -|-------|-------| -| **Status** | Design Doc Complete | -| **Effort** | Large (1-2 weeks) | -| **Owner** | TBD | -| **Design Doc** | `docs/design/telemetry-design.md` | - -**Description**: Create `openadapt-telemetry` package for unified error tracking and usage analytics across all packages. - -**Key Features**: -- GlitchTip/Sentry SDK integration -- Privacy filtering (path sanitization, PII scrubbing) -- Internal user tagging (CI detection, dev mode) -- Opt-out mechanisms (DO_NOT_TRACK env var) - -**Next Actions**: -- [ ] Create `openadapt-telemetry` package scaffold -- [ ] Implement Sentry/GlitchTip integration -- [ ] Add privacy filtering (path sanitization, PII scrubbing) -- [ ] Add internal user tagging (CI detection, dev mode) -- [ ] Create opt-out mechanisms (DO_NOT_TRACK env var) -- [ ] Integrate with openadapt-evals as pilot - ---- - -### 12. Additional Benchmarks (WebArena, OSWorld) - -| Field | Value | -|-------|-------| -| **Status** | Future Consideration | -| **Effort** | Large (2-4 weeks) | -| **Owner** | TBD | -| **Package** | `openadapt-evals` | - -**Description**: Expand evaluation infrastructure beyond WAA. - -**Target Benchmarks**: - -| Benchmark | Type | Status | Priority | -|-----------|------|--------|----------| -| Windows Agent Arena (WAA) | Desktop | In Progress | High | -| WebArena | Web Browser | Not Started | Medium | -| OSWorld | Cross-Platform | Not Started | Medium | -| MiniWoB++ | Synthetic | Not Started | Low | - -**Next Actions**: -- [ ] Implement WebArena adapter for browser automation -- [ ] Implement OSWorld adapter for cross-platform desktop -- [ ] Create unified metrics across benchmarks -- [ ] Add benchmark comparison view - ---- - -### 13. Documentation Site (docs.openadapt.ai) - -| Field | Value | -|-------|-------| -| **Status** | MkDocs Configured, Needs Deployment | -| **Effort** | Medium (4-6 hours) | -| **Owner** | TBD | -| **Config** | `mkdocs.yml` | - -**Description**: Documentation site using MkDocs with existing markdown files. +### 12. Additional Benchmarks +- WebArena, OSWorld, MiniWoB++ -**Existing Documentation**: -- `docs/index.md` - Home page -- `docs/architecture.md` - System architecture -- `docs/cli.md` - CLI reference -- `docs/packages/*.md` - Package documentation -- `docs/getting-started/*.md` - Installation, quickstart, permissions - -**Next Actions**: -- [ ] Verify `mkdocs.yml` configuration -- [ ] Run `mkdocs build` and test locally -- [ ] Set up GitHub Actions for auto-deploy to GitHub Pages -- [ ] Configure CNAME for docs.openadapt.ai -- [ ] Add API reference (auto-generated from docstrings) -- [ ] Write getting-started tutorial (5-minute quickstart) - ---- - -## Dependency Graph - -``` -P0: Fix CI (PR #969) ─────────────────────────────────────────────────┐ -P0: Docker Build ─────────────────────────────────────────────────────┤ -P0: Verify Meta-Package ──────────────────────────────────────────────┤ -P0: Basic Workflow ───────────────────────────────────────────────────┤ - │ - v -P1: Baseline Adapters ────────────────────────────────────────────────┤ -P1: Demo Conditioning ────────────────────────────────────────────────┤ -P1: WAA Benchmark ────────────────────────────────────────────────────┘ - │ - v -P2: Safety Gates ─────────────────────────────────────────────────────┐ -P2: Grounding Improvements ───────────────────────────────────────────┤ -P2: Viewer Dashboard ─────────────────────────────────────────────────┘ - │ - v -P3: Telemetry (GlitchTip) ────────────────────────────────────────────┐ -P3: Additional Benchmarks ────────────────────────────────────────────┤ -P3: Documentation Site ───────────────────────────────────────────────┘ -``` - ---- - -## Technical Debt - -### Known Issues - -| Issue | Severity | Package | Notes | -|-------|----------|---------|-------| -| Python version mismatch | Medium | `openadapt-ml` | Requires 3.12+, others 3.10+ | -| `capture stop` TODO | Low | `openadapt` CLI | Uses Ctrl+C instead of signal/file | -| `release-and-publish.yml` uses hatchling | Low | Main repo | Aligned with meta-package | -| Legacy code | Low | `/legacy/` | Many TODOs, not blocking v1.0 | - -### Code Quality - -| Package | TODOs | Notes | -|---------|-------|-------| -| `openadapt/cli.py` | 1 | Implement stop via signal/file | -| `legacy/` | 100+ | Historical, not blocking v1.0 | +### 13. Documentation Site +- Deploy MkDocs to docs.openadapt.ai --- ## Success Criteria -### P0 Complete (This Week) - -- [ ] CI passes on all matrix combinations (Python 3.10/3.11/3.12, macOS/Ubuntu) -- [ ] PR #969 merged -- [ ] Docker build succeeds for OmniParser -- [ ] `pip install openadapt[core]` works on Python 3.12 -- [ ] Basic capture/eval workflow demonstrated - -### P1 Complete (1-2 Weeks) - -- [ ] API agents (Claude, GPT-4V) working with demo conditioning -- [ ] WAA baseline established with metrics -- [ ] First-action accuracy validated (33% -> 100% with demo) - -### P2 Complete (This Month) - -- [ ] Safety gates implemented and documented -- [ ] Grounding improving action accuracy -- [ ] Viewer dashboard with video playback - -### P3 Complete (Backlog) - -- [ ] Telemetry package published -- [ ] docs.openadapt.ai live -- [ ] Additional benchmarks integrated +| Phase | Criteria | +|-------|----------| +| P0 | CI green, Docker builds, `openadapt[core]` installs, basic workflow works | +| P1 | API agents with demo conditioning, WAA baseline established | +| P2 | Safety gates, grounding integration, viewer with video | +| P3 | Telemetry published, docs live, additional benchmarks | --- -## Resources Required +## Technical Debt -| Resource | Purpose | Status | -|----------|---------|--------| -| Azure credits | WAA benchmark VM | Available | -| Anthropic API key | Claude testing | Available (in openadapt-ml/.env) | -| OpenAI API key | GPT-4V testing | Available (in openadapt-ml/.env) | -| Google API key | Gemini testing | Available (in openadapt-ml/.env) | -| Test machines | Windows 10/11, Ubuntu 22.04/24.04 | Provision using existing tooling | -| DNS access | docs.openadapt.ai CNAME | Done | +- `openadapt-ml` requires 3.12+, others 3.10+ (version mismatch) +- `capture stop` uses Ctrl+C instead of signal/file +- Legacy code in `/legacy/` (not blocking v1.0) --- -## Appendix: Quick Reference - -### PyPI Package URLs - -- https://pypi.org/project/openadapt/ -- https://pypi.org/project/openadapt-capture/ -- https://pypi.org/project/openadapt-ml/ -- https://pypi.org/project/openadapt-evals/ -- https://pypi.org/project/openadapt-viewer/ -- https://pypi.org/project/openadapt-grounding/ -- https://pypi.org/project/openadapt-retrieval/ -- https://pypi.org/project/openadapt-privacy/ - -### GitHub Repositories - -- Main: https://github.com/OpenAdaptAI/openadapt -- Sub-packages: https://github.com/OpenAdaptAI/openadapt-{capture,ml,evals,viewer,grounding,retrieval,privacy} - -### Related Documents - -- Architecture: `/docs/architecture.md` -- Telemetry Design: `/docs/design/telemetry-design.md` -- Landing Page Strategy: `/docs/design/landing-page-strategy.md` -- Legacy Freeze: `/docs/legacy/freeze.md` - ---- +## Resources -*This roadmap is a living document. Update as priorities shift based on user feedback and technical discoveries.* +| Resource | Status | +|----------|--------| +| Azure credits (WAA) | Available | +| API keys (Claude, GPT-4V, Gemini) | In `openadapt-ml/.env` | +| DNS (docs.openadapt.ai) | Configured | From 0461e6afb24f3b2ccdd1ce21e0e3434695d171fb Mon Sep 17 00:00:00 2001 From: Richard Abrich Date: Sat, 17 Jan 2026 01:43:52 -0500 Subject: [PATCH 2/2] fix: correct accuracy claim to 46.7% -> 100% --- docs/roadmap-priorities.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/roadmap-priorities.md b/docs/roadmap-priorities.md index 68d559aea..16e4303a4 100644 --- a/docs/roadmap-priorities.md +++ b/docs/roadmap-priorities.md @@ -59,7 +59,7 @@ openadapt eval run --checkpoint model.pt - Add rate limit / API failure handling ### 6. Demo Conditioning in Evals -- **Key result**: 33% -> 100% first-action accuracy with demo conditioning +- **Key result**: 46.7% -> 100% first-action accuracy with demo conditioning (on shared-entry-point benchmark) - Integrate `openadapt-retrieval` with `openadapt-ml` adapters - Add `--demo` flag to `openadapt eval run`