A Claude Code plugin that provides expert UX evaluation for command-line interfaces, developer tools, and APIs.
Install via the Claude Code plugin system (/plugin install cli-ux-tester@ali5ter).
- 11-criteria UX framework with 1–5 scoring per dimension (8 core + 3 extended criteria)
- Active testing by executing real commands and capturing output
- Parallel evaluation agents for thorough, unbiased analysis
- Persistent memory across evaluations for cross-project pattern tracking
- Comprehensive output artifacts: evaluation report, remediation plan, metrics, and test scripts
- Language-agnostic: evaluates user-facing behavior regardless of implementation
agents/
cli-ux-tester.md # Agent definition – synthesizes results into scored artifacts
skills/
cli-ux-tester/
SKILL.md # Skill – detects CLI, spawns evaluation agents, invokes synthesizer
testing-checklist.md # Comprehensive testing checklist (11 criteria)
test-scenarios.md # Common CLI testing scenarios
scripts/
example-test.sh # Template for automated testing
.claude-plugin/
plugin.json # Plugin manifest
migrate # Migration script for v1.x and v2.x users
README.md
LICENSE
Inside Claude Code, run:
/plugin marketplace add ali5ter/claude-plugins
/plugin install cli-ux-tester@ali5ter
If you previously installed via ./install.sh or an earlier version of this plugin, run the migration script:
./migrate
Then reinstall via the plugin commands above.
After installation, ask Claude to evaluate any CLI in your session:
Review this CLI for UX issues
Test the error messages in this tool
Check if this API is developer-friendly
Evaluate the help system
The skill detects which CLI to evaluate from the current directory or your message, then runs the evaluation automatically.
The plugin applies an 11-criteria framework, rating each dimension 1–5 with specific evidence:
Core criteria (1–8):
- Discovery & Discoverability – Can users find features?
- Command & API Naming – Are names intuitive and consistent?
- Error Handling & Messages – Are errors clear and actionable?
- Help System & Documentation – Is help comprehensive and accessible?
- Consistency & Patterns – Do similar operations follow patterns?
- Visual Design & Output – Is output readable and well-formatted?
- Performance & Responsiveness – Does the CLI feel fast?
- Accessibility & Inclusivity – Can diverse developers use it?
Extended criteria (9–11):
- Integration & Interoperability – Does it compose with shell pipelines and standard tools?
- Security & Safety – Are destructive operations guarded and credentials handled safely?
- User Guidance & Onboarding – Does it guide new users toward their first success?
All results go into a timestamped directory in the evaluated project:
CLI_UX_EVALUATION_<YYYYMMDD_HHMMSS>/
├── EVALUATION.md        # Full report with scores and evidence
├── REMEDIATION_PLAN.md  # Prioritized action items with effort estimates
├── metrics.json         # Machine-readable scores for tracking over time
└── test.sh              # Automated regression test script
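Because metrics.json is machine-readable, scores can be compared across runs before the directories are cleaned up. A minimal sketch using jq (assuming it is installed); the `.overall` field name is an assumption, so inspect your own metrics.json for the actual schema:

```bash
# Print the overall score from each timestamped evaluation run.
# ".overall" is an assumed field name; check metrics.json for the real schema.
for f in CLI_UX_EVALUATION_*/metrics.json; do
  printf '%s\t%s\n' "${f%%/*}" "$(jq -r '.overall' "$f")"
done
```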
Clean up with: rm -rf CLI_UX_EVALUATION_*/
In scope (UX/DX):
- User-facing behavior: help text, error messages, output formatting
- Developer experience: discoverability, learnability, consistency
- Accessibility and inclusivity
- Exit codes and signal handling as they affect UX (see the sketch after this list)
Out of scope (code quality):
- Internal code architecture or style
- Language-specific best practices unrelated to UX
- Performance internals (though responsiveness is evaluated)
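To illustrate the boundary: exit-code behavior is checked purely from the outside, never by reading source. A minimal sketch of the kind of check the generated test.sh might perform, where `mycli` is a placeholder for the CLI under evaluation, not a real command from this plugin:

```bash
#!/usr/bin/env bash
# External UX checks: observe user-facing behavior only, no source inspection.
# "mycli" is a placeholder; substitute the actual CLI under evaluation.
fail=0

# --help should succeed and include a usage line.
if ! mycli --help 2>/dev/null | grep -qi 'usage'; then
  echo 'FAIL: --help missing or lacks a usage line'
  fail=1
fi

# An unknown flag should exit nonzero; silently succeeding is a UX bug.
if mycli --no-such-flag >/dev/null 2>&1; then
  echo 'FAIL: unknown flag exited 0'
  fail=1
fi

exit "$fail"
```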
The plugin provides two components:
- Skill (`cli-ux-tester`) – detects the target CLI, asks clarifying questions if needed, spawns three evaluation agents in parallel (an Explore agent for codebase mapping and two test agents for help/discovery and error handling), then passes all collected results to the synthesizer agent
- Agent (`cli-ux-tester:cli-ux-tester`) – receives pre-collected test data and synthesizes it into a scored 11-criteria evaluation, producing all four output artifacts
The skill handles parallel evaluation directly because the platform does not support sub-agents spawning
further sub-agents. The agent runs in acceptEdits permission mode to auto-approve artifact writes, and
uses persistent user-scoped memory to accumulate cross-evaluation patterns over time.
- The evaluation agents execute commands in the current directory to observe real behavior.
- All generated files use a timestamped directory for easy cleanup.
- The synthesizer agent uses `permissionMode: acceptEdits` – file writes are auto-approved, but `Bash` commands still prompt for permission.
MIT License, Copyright (c) 2026 Alister Lewis-Bowen.