[copilot-cli-research] Copilot CLI Deep Research - 2026-02-23 #17985

2026-02-23T21:35:49Z

github-actions[bot]
bot Feb 23, 2026

Executive Summary

Analysis Date: February 23, 2026 | Repository: github/gh-aw | Triggered by: @pelikhan

This deep research report analyzes 76 Copilot CLI workflows out of 158 total agentic workflows (48%) in this repository. The analysis reveals significant optimization opportunities — particularly around security (AWF sandbox adoption), capability gaps (engine.env, plugins, safe-inputs), and model cost optimization.

Key Discoveries: The Copilot CLI has extensive features that are largely untapped. The most critical finding is that 22 workflows use network configuration but lack AWF firewall protection, and engine.env — a fully documented feature — has never been used across any of the 158 workflows.

Primary Recommendation: Enable AWF sandbox for all workflows that access external network resources. This is a security improvement with minimal friction.

Critical Findings

🔴 High Priority

| Finding | Affected Workflows | Impact |
|---------|--------------------|--------|
| 22 Copilot workflows use `network:` config without AWF sandbox | `auto-triage-issues`, `copilot-pr-merged-report`, `daily-workflow-updater`, and 19 more | Security: uncontrolled outbound network access |
| 5 workflows use `web-fetch`/`playwright` without sandbox | `cli-consistency-checker`, `craft`, `docs-noob-tester`, `slide-deck-maintainer`, `weekly-editors-health-check` | Security: external HTTP without firewall |
| `safe-inputs` used in only 1 workflow | Only `security-review.md` | Security: user-controlled inputs not sanitized |

🟡 Medium Priority

| Finding | Impact |
|---------|--------|
| `engine.env` never used (0/158 workflows) | Missing per-workflow environment customization |
| `plugins` feature never used (0 workflows) | Unutilized engine capability |
| 27 Copilot workflows lack `strict: true` | Reduced reliability for production workflows |
| 48/76 Copilot workflows have no `cache-memory` | Repeated expensive operations on daily workflows |
| No Copilot version pinning (all use default v0.0.414) | Unexpected breaks when version auto-updates |

1️⃣ Current State Analysis

<details>
<summary><b>View Copilot CLI Capabilities Inventory</b></summary>

#### CLI Flags Generated Automatically
- `--add-dir`: adds directories for agent file access (generated automatically)
- `--disable-builtin-mcps`: always added to prevent conflicts
- `--allow-tool`: per-tool permissions (computed from frontmatter)
- `--allow-all-tools`: added when `bash: [":*"]` is used
- `--allow-all-paths`: added when the `edit:` tool is enabled
- `--log-level all` + `--log-dir`: always added for observability
- `--agent`: set via the `engine.agent` field
- `--model` / `COPILOT_MODEL`: set via the `engine.model` field (or via the GitHub variable `GH_AW_MODEL_AGENT_COPILOT`)
- `--prompt`: always set to the compiled prompt file

#### Engine Configuration Fields (Frontmatter)
```yaml
engine:
  id: copilot
  version: latest         # Pin to a specific version (e.g., "0.0.414")
  model: gpt-5            # Override model (e.g., gpt-5.1-codex-mini for cost savings)
  command: /path/binary   # Custom executable path
  args: ["--verbose"]     # Extra CLI args injected before --prompt
  agent: agent-name       # Reference .github/agents/*.agent.md
  env:                    # Custom env vars passed to the CLI
    MY_VAR: "value"
```

#### Supported Tools
- `github:` — GitHub MCP server (used by 115/158 workflows)
- `web-fetch:` — Built-in URL fetching (16 workflows)
- `playwright:` — Browser automation (12 workflows)
- `serena:` — Code intelligence MCP (6 workflows)
- `cache-memory:` — Cross-run persistent storage (67 workflows)
- `bash:` — Shell execution (many workflows)
- `edit:` — File write access (10+ Copilot workflows)
- `agentic-workflows:` — Launch nested workflows (9 Copilot workflows)
- `repo-memory:` — Git-based memory (8 Copilot workflows)

#### Sandbox Options
- `sandbox.agent: awf` — AWF network firewall (19 workflows)
- `sandbox.agent: false` — Explicitly disabled (for testing)
- Default — no sandbox (majority of workflows)

#### Available Agent Files (`.github/agents/`)
- `agentic-workflows` — General workflow agent
- `ci-cleaner` — CI maintenance specialist
- `technical-doc-writer` — Documentation writer
- `contribution-checker` — PR compliance checker
- `create-safe-output-type` — Safe output specialist
- `custom-engine-implementation` — Engine dev specialist
- `grumpy-reviewer` — Critical code reviewer
- `interactive-agent-designer` — Workflow designer
- `w3c-specification-writer` — Spec writer

</details>

<details>
<summary><b>View Usage Statistics</b></summary>

#### Workflow Distribution
```
Total workflows:         158
Copilot engine:           76  (48%)
Other engines:            82  (52%)
```

#### Feature Adoption (all workflows)
| Feature | Count | % of All | % of Copilot |
|---------|-------|----------|--------------|
| `safe-outputs` | 150 | 95% | ~100% |
| `timeout-minutes` | 150 | 95% | 99% |
| `strict:` mode | 104 | 66% | 64% |
| `github:` tool | 115 | 73% | —  |
| `cache-memory` | 67 | 42% | 37% |
| `network:` config | 72 | 46% | 29% |
| `web-fetch:` | 16 | 10% | 11% |
| `playwright:` | 12 | 8% | — |
| `serena:` | 6 | 4% | — |
| AWF `sandbox:` | 19 | 12% | 25% |
| `engine.agent` | 16 | 10% | — |
| `engine.model` | 7 | 4% | — |
| `safe-inputs:` | 1 | 0.6% | 1.3% |
| `engine.env` | **0** | **0%** | **0%** |
| `plugins` | **0** | **0%** | **0%** |
| `runtime-import` | **0** | **0%** | **0%** |

#### Top Timeout Values
```
30 min — 34 workflows  (most common)
20 min — 30 workflows
15 min — 29 workflows
10 min — 26 workflows
45 min — 14 workflows
```

#### GitHub Toolset Adoption
Most workflows use `[default]`. Workflows with specific-only toolsets include: `code-scanning-fixer` (`context`, `repos`, `code_security`, `pull_requests`), `auto-triage-issues` (`issues`), `daily-assign-issue-to-user` (`issues`, `pull_requests`, `repos`).

</details>

2️⃣ Feature Usage Matrix

| Feature Category | Available Features | Used | Not Used | Usage Rate |
|------------------|--------------------|------|----------|------------|
| CLI Flags | `--add-dir`, `--agent`, `--model`, `--allow-tool`, `--disable-builtin-mcps`, `--allow-all-paths` | All auto-generated | none | 100% (auto) |
| Engine Config | `id`, `version`, `model`, `command`, `args`, `agent`, `env` | `model`, `agent`, `args` | `version`, `command`, `env` | 43% |
| MCP / Tools | `github`, `web-fetch`, `playwright`, `serena`, `bash`, `edit`, `agentic-workflows`, `repo-memory`, `cache-memory` | All except some | - | ~85% |
| Network Config | `allowed`, `defaults`, `github`, `node`, `python`, `go`, `playwright` | Used widely | specific domain tuning rare | ~70% |
| Sandbox Options | AWF, disabled | AWF in 25% of copilot | Most copilot workflows unprotected | 25% |
| Security | `strict`, `safe-inputs`, `sandbox` | `strict` widely, `sandbox` partial | `safe-inputs` | 33-95% |
| Agent Files | 9 files available | 2 in use | 7 unused agent files | 22% |
| Plugins | Plugin system supported | 0 workflows | all | 0% |

3️⃣ Missed Opportunities

View High Priority Opportunities

🔴 Opportunity 1: Enable AWF Sandbox for Network-Using Copilot Workflows

What: 22 Copilot workflows configure network.allowed but run without the AWF firewall sandbox.

Why It Matters: Without AWF, the network.allowed configuration has no enforcement — the Copilot CLI can access any external host. AWF actually enforces the allowlist.

Where: auto-triage-issues, copilot-pr-merged-report, daily-workflow-updater, delight, dictation-prompt, discussion-task-miner, docs-noob-tester, layout-spec-maintainer, org-health-report, portfolio-analyst, slide-deck-maintainer, smoke-multi-pr, smoke-temporary-id, smoke-test-tools, stale-repo-identifier, sub-issue-closer, tidy, ubuntu-image-analyzer, weekly-editors-health-check, plus 3 more.

How to Implement:

# Add to frontmatter alongside existing network: config
sandbox:
  agent: awf

Expected Benefits: Actual enforcement of network restrictions, reduced attack surface, compliance with security best practices.

🔴 Opportunity 2: Add AWF Sandbox for web-fetch/playwright Workflows

What: 5 workflows use web-fetch: or playwright: to access external URLs but lack AWF protection.

Affected Workflows: cli-consistency-checker, craft, docs-noob-tester, slide-deck-maintainer, weekly-editors-health-check

Example fix for craft.md:

# Current state
network:
  allowed: [defaults, node]
tools:
  web-fetch:

# Recommended
network:
  allowed: [defaults, node]
sandbox:
  agent: awf
tools:
  web-fetch:

🔴 Opportunity 3: Expand safe-inputs Usage

What: safe-inputs sanitizes user-provided content (issue bodies, PR descriptions, comments) before it reaches the AI agent. Currently only used in security-review.md.

Why It Matters: Prompt injection attacks via GitHub issue bodies/comments are a real threat. Safe inputs prevents malicious content from hijacking agent behavior.

Where: Any workflow triggered by issues, pull_request, issue_comment, or discussion events with user-controlled text.

How to Implement:

# Workflows triggered by user events
safe-inputs:
  sanitize: true

# Then use ${{ steps.sanitized.outputs.text }} instead of raw inputs

High-Value Targets: auto-triage-issues, grumpy-reviewer, pr-nitpick-reviewer, code-scanning-fixer, contribution-check

View Medium Priority Opportunities

🟡 Opportunity 4: Use engine.env for Workflow-Specific Configuration

What: engine.env allows passing custom environment variables to the Copilot CLI without modifying the prompt. Currently completely unused across all 158 workflows.

Why It Matters: Environment variables allow dynamic configuration without recompilation, instead of hardcoding values in prompts.

How to Implement:

engine:
  id: copilot
  env:
    TARGET_BRANCH: main
    MAX_ISSUES_TO_PROCESS: "10"
    DEBUG_MODE: "false"

Use Cases:

Parameterize daily workflow behavior (e.g., MAX_ITEMS: "5")
Pass feature flags without prompt changes
Configure tool behavior (e.g., API endpoints)

🟡 Opportunity 5: Pin Copilot Version for Stable Production Workflows

What: All workflows use the default Copilot CLI version (currently v0.0.414). No workflow pins to a specific version.

Why It Matters: Auto-updates can introduce breaking changes. Production workflows such as release.md and the daily-* workflows should be version-pinned.

How to Implement:

engine:
  id: copilot
  version: "0.0.414"    # Pin to known-good version

High-Value Targets: release.md, daily-compiler-quality.md, daily-testify-uber-super-expert.md, ci-coach.md

🟡 Opportunity 6: Leverage Unused Agent Files

What: 9 specialized agent files exist in .github/agents/ but only 2 (technical-doc-writer, ci-cleaner) are used in production workflows.

Unused agents: agentic-workflows, contribution-checker, create-safe-output-type, custom-engine-implementation, grumpy-reviewer, interactive-agent-designer, w3c-specification-writer

How to Implement:

engine:
  id: copilot
  agent: grumpy-reviewer    # For PR review workflows

Workflow-Agent Pairings:

grumpy-reviewer.md → agent: grumpy-reviewer (currently mismatched — uses same prompt without agent file)
contribution-check.md → agent: contribution-checker
code-scanning-fixer.md → could benefit from specialized coding agent

🟡 Opportunity 7: Add cache-memory to High-Value Daily Workflows

What: 48/76 Copilot workflows lack cache-memory, missing cross-run context persistence. Many daily workflows repeat expensive analysis from scratch.

Highest-Value Additions:

daily-assign-issue-to-user.md — could remember previous assignments
daily-cli-performance.md — track baseline metrics over time
weekly-issue-summary.md — accumulate weekly context
auto-triage-issues.md — learn from past triage decisions

How to Implement:

tools:
  cache-memory: true    # Default cache
  # OR
  cache-memory:
    - id: analysis-history
      expires: 7d

🟡 Opportunity 8: Optimize Model Selection for Cost-Effective Workflows

What: Only 7 workflows pin a specific model; the rest fall back to the system default. Many simple workflows could use lighter models (e.g., gpt-5.1-codex-mini) for cost savings.

How to Implement:

# For simple, well-defined tasks
engine:
  id: copilot
  model: gpt-5.1-codex-mini  # Fast, cost-effective

# Or via GitHub variable (affects all copilot workflows)
# Set GH_AW_MODEL_AGENT_COPILOT repository variable

High-Value Targets (simple/repetitive tasks that don't need premium models):

daily-fact.md (already uses gpt-5.1-codex-mini ✅)
daily-assign-issue-to-user.md
draft-pr-cleanup.md
changeset.md (already uses gpt-5.1-codex-mini ✅)

View Low Priority Opportunities

🟢 Opportunity 9: Enable Plugins for Extended Capabilities

What: The Copilot engine supports plugin installation (supportsPlugins: true) but zero workflows use it.

Why It Matters: Plugins can extend Copilot CLI with custom tools and integrations beyond what MCP servers provide.

Note: This requires Copilot CLI plugins to be available and relevant. As plugins become available for the use cases in this repo, consider using them.

🟢 Opportunity 10: Enable strict: true on Missing Workflows

What: 27 Copilot workflows lack strict: true mode. Strict mode enforces validation of the output structure.

Workflows Missing strict: true (sample): agent-performance-analyzer, archie, bot-detection, brave, breaking-change-checker, dev, firewall-escape, jsweep, mcp-inspector, pdf-summary, research, and 16 more.

How to Implement: Simply add strict: true to frontmatter.
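As a minimal sketch, assuming `strict` is a top-level frontmatter key (the wording above suggests this, but the report does not show its placement), the change is a single line. The `engine` block is included only to show position and is illustrative:

```yaml
# Hypothetical frontmatter excerpt; only the strict line is the change.
engine:
  id: copilot
strict: true   # assumed top-level placement
```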

🟢 Opportunity 11: Use runtime-import for Dynamic Workflows

What: runtime-import allows workflow prompts to be updated from a URL at runtime without recompilation. Currently used in 0 workflows.

Use Case: Workflows where the prompt content needs frequent updates (e.g., FAQs, policy checks that change often).

4️⃣ Specific Workflow Recommendations

View Workflow-Specific Recommendations

`grumpy-reviewer.md`

Current State: Uses cache-memory ✅, has strict mode ✅, no engine.agent
Recommendation: Add engine: { agent: grumpy-reviewer } to use the dedicated agent file
Expected Benefit: More consistent, specialized review behavior

`contribution-check.md`

Current State: Simple contribution checker without agent file
Recommendation: engine: { agent: contribution-checker }
Expected Benefit: Leverage the specialized contribution-checker agent file

`daily-workflow-updater.md`

Current State: Uses network config, no sandbox
Recommendation: Add sandbox: { agent: awf } to enforce network restrictions
Expected Benefit: Network access actually constrained to configured allowlist

`org-health-report.md`

Current State: Network access + Python runtime, no sandbox, has cache-memory ✅
Recommendation: Add AWF sandbox + consider engine.model: gpt-5.1-codex-mini for cost savings
Expected Benefit: Security + cost reduction

`auto-triage-issues.md`

Current State: Strict ✅, network config but no sandbox, no cache-memory
Recommendation: Add AWF sandbox + safe-inputs + cache-memory for learned triage patterns
Expected Benefit: Security + improving triage quality over time
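A sketch of how two of these additions might look in the frontmatter of auto-triage-issues.md, reusing only keys shown earlier in this report. It assumes the workflow's existing `network:` block stays unchanged (it is not reproduced here), and it leaves `safe-inputs` to the snippet given under Opportunity 3:

```yaml
# Illustrative additions for auto-triage-issues.md; not the workflow's actual frontmatter.
sandbox:
  agent: awf            # enforce the existing network allowlist
tools:
  cache-memory: true    # default cache, to retain triage context across runs
```

If the workflow already declares a `tools:` block, the `cache-memory` entry would be merged into it rather than added as a second `tools:` key.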

`release.md`

Current State: Has AWF sandbox ✅, uses network
Recommendation: Pin engine.version to prevent release disruptions from CLI updates
Expected Benefit: Stability for critical release workflow

`research.md`

Current State: AWF sandbox ✅, Tavily integration, no strict mode
Recommendation: Add strict: true
Expected Benefit: Better output validation

5️⃣ Trends & Insights

View Historical Trends

This is the first comprehensive analysis of Copilot CLI usage patterns in this repository. Future runs will track trends.

Key Observations

The repository is mature with 158 workflows — a large, diverse sample
AWF sandbox support is maturing: the CHANGELOG shows recent AWF improvements (chroot mode, PATH sanitization, binary mounts)
Recent focus on HTTP transport for safe-outputs/safe-inputs MCP servers
The Copilot engine is the primary engine (76/158 = 48%)
Model selection is emerging (7 workflows pin models, mostly gpt-5.1-codex-mini for cost savings)

Pattern: Security-Performance Tradeoff

Many workflows choose not to use AWF sandbox — likely due to setup overhead or concerns about compatibility. The recent AWF improvements (chroot mode, transparent tool access) should reduce these concerns.

Pattern: Over-Reliance on Default Toolset

60+ workflows use toolsets: [default], which includes all common operations. More specific toolsets (e.g., [issues] for issue-only workflows) would reduce the Copilot CLI's tool footprint and improve performance.
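A minimal sketch of a narrowed toolset for an issue-only workflow. It assumes `toolsets` is configured under the `github:` tool entry; this report mentions both names but does not show the nesting explicitly, so treat the structure as an assumption:

```yaml
# Illustrative only: restrict the GitHub MCP server to issue operations.
tools:
  github:
    toolsets: [issues]   # instead of [default]
```

A workflow that also opens pull requests might list `[issues, pull_requests]`, using the toolset names reported under GitHub Toolset Adoption.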

6️⃣ Best Practice Guidelines

Based on this research, here are recommended best practices:

Always pair network: config with AWF sandbox: A network.allowed list has no effect without sandbox.agent: awf. These two settings are only meaningful together.
Use specific GitHub toolsets: Instead of always using [default], specify only what the workflow needs (e.g., [issues] for issue triagers, [repos] for code analysis). This reduces agent confusion and improves performance.
Add safe-inputs for user-triggered workflows: Any workflow that acts on user-supplied content (issue body, PR description, comments) should sanitize inputs with safe-inputs: to prevent prompt injection.
Choose models based on task complexity: Use gpt-5.1-codex-mini for simple, well-defined tasks (fact generation, simple triaging, label assignment). Reserve premium models for complex code analysis, writing, and multi-step reasoning.
Enable AWF for all external-network workflows: Both security and observability benefit from AWF — it logs network access, enforces allowlists, and isolates the agent from the host environment.
Use engine.agent to reuse specialized agents: The .github/agents/ directory has 9 specialized agents. Match workflows to the appropriate agent file instead of embedding all context in the prompt.
Pin versions for production/release workflows: Add engine.version to critical workflows (release.md, daily-* production checks) to prevent unexpected breaks from CLI updates.
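The guidelines above can be combined into one frontmatter baseline. The following is a sketch under stated assumptions, not a template taken from the repository: the version and model values are the ones quoted in this report, the top-level placement of `strict` and the nesting of `toolsets` under `github:` are assumed, and the toolset choice is hypothetical.

```yaml
# Hypothetical baseline for a simple, production-critical Copilot workflow.
strict: true                  # assumed top-level placement
engine:
  id: copilot
  version: "0.0.414"          # pinned to the current default noted in this report
  model: gpt-5.1-codex-mini   # suitable only for simple, well-defined tasks
network:
  allowed: [defaults]
sandbox:
  agent: awf                  # per this report, needed for network.allowed to be enforced
tools:
  github:
    toolsets: [issues]        # assumed nesting; pick the toolsets the workflow needs
```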

7️⃣ Action Items

Immediate Actions (this week):

Add sandbox: { agent: awf } to the 5 workflows using web-fetch/playwright without sandbox
Add strict: true to the 10 most-used Copilot workflows missing it

Short-term (this month):

Enable AWF sandbox for remaining 17 network-config copilot workflows
Add safe-inputs: to top 5 user-triggered workflows (auto-triage-issues, grumpy-reviewer, pr-nitpick-reviewer, code-scanning-fixer, contribution-check)
Connect grumpy-reviewer.md and contribution-check.md to their matching agent files
Pin engine.version for release.md and other production-critical workflows

Long-term (this quarter):

Establish engine.env usage pattern and document in workflow templates
Add cache-memory to key daily workflows for historical learning
Audit GitHub toolset usage — migrate from [default] to specific toolsets where possible
Create model selection guide: define which tasks warrant which model tier

View Supporting Evidence & Methodology

Research Methodology

Data Sources:

158 workflow .md files in .github/workflows/
Go source code: pkg/workflow/copilot_engine*.go, copilot_mcp.go
Documentation: docs/src/content/docs/reference/engines.md
Constants: pkg/constants/constants.go
CHANGELOG.md for historical feature additions

Tools Used:

grep for pattern searching across all workflow files
Shell scripting to aggregate statistics
Direct source code review of copilot_engine_execution.go (430 lines) for CLI flag generation

Analysis Approach:

Extracted all available features from source code (what can be configured)
Counted actual usage across all 158 workflows
Identified gaps between available and used features
Prioritized by security impact, developer experience, and ease of implementation

Copilot CLI Version: v0.0.414 (current default)

Key Files Reviewed:

pkg/workflow/copilot_engine.go — Engine definition and capabilities
pkg/workflow/copilot_engine_execution.go — CLI flag generation
pkg/workflow/copilot_engine_tools.go — Tool permission logic
pkg/workflow/copilot_mcp.go — MCP configuration rendering

References:

§22325430057 — Workflow run that generated this report
Copilot Engine Documentation
AWF Sandbox Changes in CHANGELOG

AI generated by Copilot CLI Deep Research Agent

expires on Feb 24, 2026, 9:35 PM UTC

2026-02-24T22:52:05Z

github-actions[bot]
bot Feb 24, 2026
Author

This discussion was automatically closed because it expired on 2026-02-24T21:35:48.399Z.

Closed by Workflow
