feat(concepts): introduce CONCEPTS.md as shared vocabulary substrate by tmchow · Pull Request #838 · EveryInc/compound-engineering-plugin

tmchow · 2026-05-16T07:50:05Z

What this PR does

Adds CONCEPTS.md — a small, repo-root glossary of the words this codebase uses in a specific way (domain entities, named processes, status concepts where two engineers might give different definitions if you didn't pin them down) — and wires it into the plugin's skills so it accretes from real work rather than living as a separate documentation project.

Why it matters

CONCEPTS.md is a compounding-knowledge play in three dimensions.

Across human-agent dialogue. Humans and agents share one stable map of what domain terms mean. The human can use casual language ("buyer") and the agent maps it to the canonical term ("Customer") because the project has settled on that name. The agent uses canonical names in its responses, which surfaces misalignments quickly instead of letting them compound silently, and quietly teaches the human the project's vocabulary over time. The win isn't just clearer terminology; it's shared terminology, which lowers the cost of every dialogue.
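To make the synonym mapping concrete, here is a minimal TypeScript sketch of the lookup an agent performs implicitly. The skills are prose instructions for an agent, not code, so this is not the plugin's implementation. `GlossaryEntry`, `buildAliasIndex`, and `canonicalize` are hypothetical names, and the alias list is illustrative.

```typescript
// Hypothetical in-memory shape for one CONCEPTS.md entry.
interface GlossaryEntry {
  term: string; // canonical name, e.g. "Customer"
  aliases: string[]; // retired synonyms the team has agreed to avoid
}

// Map every known name (canonical or alias), lowercased, to its canonical term.
function buildAliasIndex(entries: GlossaryEntry[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const entry of entries) {
    index.set(entry.term.toLowerCase(), entry.term);
    for (const alias of entry.aliases) {
      index.set(alias.toLowerCase(), entry.term);
    }
  }
  return index;
}

// Resolve a word the human used; undefined means the glossary does not cover it.
function canonicalize(index: Map<string, string>, word: string): string | undefined {
  return index.get(word.trim().toLowerCase());
}

const index = buildAliasIndex([{ term: "Customer", aliases: ["buyer", "client"] }]);
console.log(canonicalize(index, "Buyer")); // Customer
```

An unmapped word returning `undefined` is the signal that the term is either new vocabulary or out of scope for the glossary.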

Across multi-skill workflows. When ce-brainstorm hands off to ce-plan, plan to work, work to ce-code-review and back to ce-compound, each agent reads the same vocabulary anchor — so the output of one phase doesn't get mistranslated by the next. Plans use the entities the domain actually has; code can reflect names the team has agreed on; research is more precise because terms have stable meaning. The substrate that docs/solutions/ and AGENTS.md cite is the same one downstream skills produce against.

Across time and sessions. This is what makes CONCEPTS.md fit the plugin's central thesis. Each conversation, each learning, each plan pays a small cost to clarify terms; CONCEPTS.md captures the result so the next round inherits it. The first time the team settles "Customer" vs. "User," the cost is paid once. The vocabulary survives fresh agent sessions, context compaction, and model upgrades. The file accretes from real engineering activity and self-corrects when drift accumulates, so the substrate improves rather than ossifying. The caveat: value scales with how populated the file is, so a fresh repo pays a latency cost before the file is mature enough to deliver this. But the accretion model means it grows automatically with normal work, not as something anyone has to staff.

How it works

Creation is concentrated. Only ce-compound and ce-compound-refresh create CONCEPTS.md. Both bootstrap lazily — when at least one qualifying term surfaces — and both write a visible preamble at the top of the file that teaches the artifact's role to anyone (human or agent) who opens it. Both also hold the qualifying bar conservatively at the moment of creation, deferring borderline terms to a later run rather than seeding a thin file from one weak signal.

Contributors don't create. ce-brainstorm and ce-plan add to or refine the file when terms resolve during dialogue or planning, but skip writes entirely when the file doesn't exist. Brainstorm-only vocabulary stays out of the file until implementation lands.
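The ownership split reduces to a small decision rule. This is a hedged sketch: `conceptsWriteAction` is a hypothetical helper, and "qualifying terms" is reduced to a count, whereas the real judgment is made by the agent against the reference criteria.

```typescript
// Hypothetical labels for the four skills involved in writing CONCEPTS.md.
type Skill = "ce-compound" | "ce-compound-refresh" | "ce-brainstorm" | "ce-plan";
type WriteAction = "create" | "update" | "skip";

// Only the two compounding skills may bootstrap the file.
const CREATORS: ReadonlySet<Skill> = new Set<Skill>(["ce-compound", "ce-compound-refresh"]);

function conceptsWriteAction(skill: Skill, fileExists: boolean, qualifyingTerms: number): WriteAction {
  if (qualifyingTerms === 0) return "skip"; // nothing worth recording this run
  if (fileExists) return "update"; // creators and contributors both add or refine
  return CREATORS.has(skill) ? "create" : "skip"; // contributors never bootstrap
}

console.log(conceptsWriteAction("ce-brainstorm", false, 2)); // skip
console.log(conceptsWriteAction("ce-compound", false, 1)); // create
```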

Readers ground in it. ce-learnings-researcher reads CONCEPTS.md before keyword extraction. ce-brainstorm Phase 1.1 reads it as the project's authoritative vocabulary, mapping user-offered synonyms to canonical names across dialogue, approaches, and the requirements doc. ce-plan does the same for plans.

Vocabulary capture is source-agnostic. ce-compound's Phase 2.4 reads the new doc and the surrounding conversation, both of which are always available to the orchestrating agent. Other research inputs (ce-sessions findings, future external sources) flow through the writer's synthesis into the doc rather than being scanned directly, keeping the capture step decoupled from input plumbing.

Self-correcting. ce-compound opportunistically scrubs violations in entries it touches; ce-compound-refresh runs a broader sweep as part of Phase 4.5 because audit is its job. The refresh summary reports a scrubbed count alongside added and refined.
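A sketch of the refresh summary's CONCEPTS.md line, assuming hypothetical `VocabularyCounts` tallies. The exact wording of the real report is set by the skill's output template, not by this code; the empty case borrows the "scanned, no qualifying terms" phrasing described in the commit notes further down, so an empty pass stays visible.

```typescript
// Hypothetical tallies from one CONCEPTS.md pass.
interface VocabularyCounts {
  added: number;
  refined: number;
  scrubbed: number;
}

// Always return a line, so "checked and found nothing" never looks like "skipped".
function conceptsSummaryLine(c: VocabularyCounts): string {
  if (c.added === 0 && c.refined === 0 && c.scrubbed === 0) {
    return "CONCEPTS.md: scanned, no qualifying terms";
  }
  return `CONCEPTS.md: ${c.added} added, ${c.refined} refined, ${c.scrubbed} scrubbed`;
}

console.log(conceptsSummaryLine({ added: 2, refined: 1, scrubbed: 3 }));
// CONCEPTS.md: 2 added, 1 refined, 3 scrubbed
```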

Cold-start requests are handled. When a user types "create my CONCEPTS.md" without an existing learning corpus, ce-compound's description routes the request to a short intercept that explains the accretion model and redirects, rather than creating a thin file ad hoc.

Quality discipline. The format rules (concepts-vocabulary.md, duplicated across the two creator skills) lead with "Be opinionated" — pick the canonical term, retire synonyms as aliases — and codify "the file stands on its own": each entry teaches its concept without requiring the codebase, PR history, or external context. Aliases ride per-entry; resolved ambiguities live in a tail audit section.
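As an illustration of the per-entry rules (opinionated canonical term, one-sentence base definition, optional behavioral note, aliases riding on the entry), here is a small rendering sketch. The `ConceptEntry` shape and the `### Term` heading layout are assumptions; the `*Avoid: X, Y*` alias line follows the format named in the commit notes below. The example definition is illustrative only.

```typescript
// Hypothetical entry shape following the format rules above.
interface ConceptEntry {
  term: string; // the canonical term, chosen opinionatedly
  definition: string; // one-sentence base definition
  behaviorNote?: string; // optional second paragraph for a non-obvious behavioral rule
  avoid?: string[]; // retired synonyms, rendered as an *Avoid:* line
}

// Render one entry as markdown. The "### Term" heading is an assumed layout.
function renderEntry(e: ConceptEntry): string {
  const parts: string[] = [`### ${e.term}`, e.definition];
  if (e.behaviorNote) parts.push(e.behaviorNote);
  if (e.avoid && e.avoid.length > 0) parts.push(`*Avoid: ${e.avoid.join(", ")}*`);
  return parts.join("\n\n");
}

console.log(
  renderEntry({
    term: "Customer",
    definition: "An organization that pays for a subscription.",
    avoid: ["buyer", "client"],
  }),
);
```

Keeping aliases on the entry itself means a reader who searches for the retired word still lands on the canonical definition.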

Files changed

| File | Change |
| --- | --- |
| `ce-compound/SKILL.md` | CONCEPTS.md bootstrap-request intercept; Phase 2.4 Vocabulary Capture (scans new doc + surrounding conversation) with opportunistic self-correction, conservative creation bar, preamble-on-bootstrap; Discoverability Check step; CONCEPTS lines in output reports |
| `ce-compound/references/concepts-vocabulary.md` | Format rules: be-opinionated, stands-on-its-own, aliases-per-entry, one-sentence base definition, optional Relationships, formalized Flagged ambiguities tail, illustrative example |
| `ce-compound-refresh/SKILL.md` | Vocabulary as Phase 1 dimension; Phase 4.5 with conservative creation bar, scrub-violations sweep, preamble-on-bootstrap; Discoverability Check step; CONCEPTS line in refresh summary with added/refined/scrubbed counts |
| `ce-compound-refresh/references/concepts-vocabulary.md` | Duplicate of `ce-compound`'s copy (no cross-skill references per plugin AGENTS.md) |
| `agents/ce-learnings-researcher.agent.md` | Grounds in `CONCEPTS.md` before keyword extraction |
| `agents/ce-coherence-reviewer.agent.md`, `agents/ce-web-researcher.agent.md` | Minor adjustments |
| `ce-brainstorm/SKILL.md` | Phase 1.1 reads `CONCEPTS.md` as project-authoritative vocabulary; Phase 1.4 contributor-only vocabulary capture with glossary-only boundary |
| `ce-plan/SKILL.md` | Phase 5.1 reads `CONCEPTS.md`; Phase 5 gap-fill rule with glossary-only boundary |
| `ce-sessions/evals/` | New terminology-preservation eval suite for ce-sessions itself (not load-bearing for this PR; built during design exploration and kept as future infrastructure for validating ce-sessions behavior as the skill evolves) |
| `plugins/compound-engineering/AGENTS.md` | Contributor note: keep the two `concepts-vocabulary.md` reference copies in sync |
| `tests/frontmatter.test.ts` | Test cleanup |

Test plan

bun test passing
bun run release:validate clean
Desk review via the skill-creator skill, with feedback applied on over-prescription and drift mitigation
ce-sessions terminology-preservation eval suite run (iteration-1: 100% must-tier recall, 0% stddev across 4 evals × 3 runs; result not load-bearing for this PR but confirms ce-sessions's capability)
Live dogfood: run ce-compound on a real learning in a repo with domain vocabulary, confirm CONCEPTS.md gets created with sensible entries and AGENTS.md gets the discoverability mention
Live dogfood: run ce-compound-refresh on a corpus and confirm bootstrap aggregation works
Live dogfood: run ce-brainstorm and confirm it contributes when the file exists, skips when it doesn't
Live dogfood: type "create my CONCEPTS.md" without an existing learning and confirm the intercept routes correctly

Adds a domain-vocabulary artifact maintained as a side effect of compounding. CONCEPTS.md is the substrate that learnings cite: entities, named processes, and status concepts with a precise, project-specific meaning. Lazy creation, opportunistic AGENTS.md discoverability, no user prompts.

Ownership model:
- ce-compound and ce-compound-refresh create and maintain the file. Both also surface CONCEPTS.md in AGENTS.md/CLAUDE.md on first creation via the existing Discoverability Check, so future agents discover the file.
- ce-brainstorm and ce-plan are contributors only: they add to or refine CONCEPTS.md when terms surface, but skip writes entirely when the file doesn't exist. This avoids speculative bootstrapping from pre-implementation work.
- ce-learnings-researcher reads CONCEPTS.md as grounding before keyword extraction so result distillation uses canonical terminology.

ce-compound and ce-compound-refresh both bundle a concepts-vocabulary.md reference with inclusion criteria, format rules, and an illustrative example. ce-brainstorm and ce-plan intentionally do not; they learn the format from the existing file's contents. Plugin AGENTS.md gains a note that the two reference copies must stay in sync.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4225fa13d4


chatgpt-codex-connector · 2026-05-16T07:52:14Z

+
+   **Skip this step entirely if `CONCEPTS.md` does not exist** — never nag for an artifact the project has not adopted. When skipped, this step produces no output and no edit.
+
+6. **Amend or create a follow-up commit when the check produces edits.** If step 4 resulted in an edit to an instruction file and Phase 5 already committed the refresh changes, stage the newly edited file and either amend the existing commit (if still on the same branch and no push has occurred) or create a small follow-up commit (e.g., `docs: add docs/solutions/ discoverability to AGENTS.md`). If Phase 5 already pushed the branch to a remote (e.g., the branch+PR path), push the follow-up commit as well so the open PR includes the discoverability change. This keeps the working tree clean and the remote in sync at the end of the run. If the user chose "Don't commit" in Phase 5, leave the instruction-file edit unstaged alongside the other uncommitted refresh changes — no separate commit logic needed.


Handle CONCEPTS discoverability edits in commit flow

Step 6 only triggers follow-up commit logic when step 4 edited an instruction file, but this commit adds a second edit path in step 5 for CONCEPTS.md discoverability. In repos where docs/solutions/ discoverability already passes (no step-4 edit) but CONCEPTS.md discoverability fails (step-5 edit), the new instruction-file change is not covered by the commit path, which can leave a dirty worktree or omit the intended edit from the refresh commit sequence.


Good catch. Fixed in 10ddbe5 — step 6 now covers edits from either step 4 or step 5, with example commit messages for both paths and the combined case.

Step 6's amend/follow-up commit logic only mentioned step 4 (docs/solutions discoverability edit). When step 4 produces no edit but step 5 (the new CONCEPTS.md discoverability path) does, the new instruction-file change would be left out of the commit sequence and end up as a dirty worktree or an omitted edit. Cover both edit paths in step 6.
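The corrected step 6 can be read as a decision table. Below is a minimal sketch under stated assumptions: `Phase5Outcome`, `Step6Action`, and `step6Action` are hypothetical names, and "same branch" is reduced to a boolean. The point it illustrates is the fix: an edit from either step 4 or step 5 enters the commit logic.

```typescript
// Hypothetical summary of how Phase 5 ended.
type Phase5Outcome = "dont-commit" | "committed" | "committed-and-pushed";
type Step6Action = "none" | "leave-unstaged" | "amend" | "follow-up-commit" | "follow-up-commit-and-push";

function step6Action(
  step4Edited: boolean, // docs/solutions/ discoverability edit to an instruction file
  step5Edited: boolean, // CONCEPTS.md discoverability edit to an instruction file
  phase5: Phase5Outcome,
  sameBranch: boolean,
): Step6Action {
  // The fix: an edit from either step triggers the commit logic.
  if (!step4Edited && !step5Edited) return "none";
  // "Don't commit": leave the edit unstaged with the other refresh changes.
  if (phase5 === "dont-commit") return "leave-unstaged";
  // Already pushed (e.g. the branch+PR path): add a follow-up commit and push it too.
  if (phase5 === "committed-and-pushed") return "follow-up-commit-and-push";
  // Committed but not pushed: amend if still on the same branch, else a small follow-up commit.
  return sameBranch ? "amend" : "follow-up-commit";
}

console.log(step6Action(false, true, "committed", true)); // amend
```

Before the fix, the first guard effectively checked only `step4Edited`, which is how a step-5-only edit could fall through and leave a dirty worktree.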

External test surfaced two structural failures in ce-compound that an LLM orchestrator can hit even when following the skill text:

1. ce-sessions return read as a terminus. Phase 1's parallel block ended on three subagents, then ce-sessions ran synchronously as the final input. Phase 2 said "WAIT for all Phase 1 subagents", which an LLM could read as not including the skill call. The agent emitted ce-sessions's output to the user and stopped. Fix: add a forward-edge sentence at the end of step 4 ("ce-sessions is the final Phase 1 input, not a workflow stop"), and broaden the Phase 2 WAIT line to "all Phase 1 inputs" with an explicit note that ce-sessions counts despite being a skill rather than a subagent.
2. Phase 2.4's "skip entirely if no terms qualify" let agents vibe-judge "nothing qualifies" from the inline criteria teaser and skip reading references/concepts-vocabulary.md entirely, the opposite of the stated intent. Fix: invert the phase so "First, read the reference" is the unconditional opener, drop the inline criteria teaser (per the no-duplication-with-references principle), and replace the silent-skip path with a visible "Vocabulary capture: scanned, no qualifying terms" outcome the agent must record.

Propagated the Phase 2.4 fix to ce-compound-refresh's Phase 4.5: same structural risk, same shared reference, both phases introduced on this branch. Tightened both success-output templates from the ambiguous "skipped (no qualifying terms)" to the unambiguous "scanned, no qualifying terms" so the audit signal cannot be confused with "didn't bother to check".
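The first failure is easier to see by analogy with ordinary async code. This sketch is only an analogy: the skill is prose read by an orchestrating agent, and `runSubagent`, `runCeSessions`, and `gatherPhase1Inputs` are hypothetical stand-ins that return placeholder findings. It shows the intended shape, where the ce-sessions call sits inside the same wait as the subagents, so its return is one more input rather than a stopping point.

```typescript
// Hypothetical stand-ins for Phase 1 inputs. In the skill these are subagent dispatches
// plus a ce-sessions skill call; here they just return placeholder findings.
interface Phase1Result {
  source: string;
  findings: string;
}

async function runSubagent(name: string): Promise<Phase1Result> {
  return { source: name, findings: `${name} findings` };
}

async function runCeSessions(): Promise<Phase1Result> {
  return { source: "ce-sessions", findings: "session history findings" };
}

// Phase 2 begins only after every Phase 1 input resolves. The ce-sessions result is
// one more input in the same wait, not a point at which to report and stop.
async function gatherPhase1Inputs(subagents: string[]): Promise<Phase1Result[]> {
  return Promise.all([...subagents.map((name) => runSubagent(name)), runCeSessions()]);
}

gatherPhase1Inputs(["research-a", "research-b", "research-c"]).then((results) => {
  console.log(results.map((r) => r.source).join(", "));
});
```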

ce-brainstorm Phase 1.4 and ce-plan §5 gap-fill are contributors to CONCEPTS.md but neither loads concepts-vocabulary.md, so the criteria preventing implementation details from creeping in lived only where the contributors couldn't see them. Add an inline negative-framing line to both ("domain entities, named processes, and status concepts with project-specific meaning only — not file paths, class names, or implementation decisions"). Also drop rationale tails that did not change agent behavior at runtime.
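The negative-framing line draws a boundary that can be roughly approximated mechanically. The following is a deliberately crude sketch: the patterns are hypothetical heuristics, not anything the skills contain, and they will misfire on legitimately PascalCase domain names. The contributors apply judgment; this only shows which kinds of text fall on the wrong side of the line.

```typescript
// Crude, hypothetical patterns for text that belongs in code or docs, not the glossary.
// An agent applies judgment; regexes like these only catch the obvious cases.
const IMPLEMENTATION_PATTERNS: RegExp[] = [
  /\b[\w-]+(?:\/[\w.-]+)+\.[a-z]{1,4}\b/, // file paths such as src/billing/invoice.ts
  /\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b/, // PascalCase class names such as InvoiceService
  /\b\w+\([^)]*\)/, // call signatures such as charge(amount)
];

function looksLikeImplementationDetail(text: string): boolean {
  return IMPLEMENTATION_PATTERNS.some((re) => re.test(text));
}

console.log(looksLikeImplementationDetail("A Customer is an organization that pays for a subscription.")); // false
console.log(looksLikeImplementationDetail("Handled by InvoiceService in src/billing/invoice.ts")); // true
```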

Users may type "create my CONCEPTS.md" without an existing learning corpus, particularly in cold repos. Previously this had no clean routing path: ce-compound's description didn't match the request, so the main agent improvised a response. Update ce-compound's description to declare CONCEPTS.md as a stated responsibility, and add a short intercept block near the top of the skill body. The block redirects without performing a bootstrap: it explains the accretion model, notes that cold-start codebase scans are intentionally unsupported (the qualifying bar is judgmental), and offers three real next steps: run ce-compound on a real learning, run ce-compound-refresh on an existing corpus, or hand-edit the file directly.

ce-compound Phase 2.4 and ce-compound-refresh Phase 4.5 establish the glossary-only rule for CONCEPTS.md but only apply it prospectively to new entries. Existing drift (file paths, class names, function signatures, status/owner metadata) survived every run. Add active correction at two scopes matched to each skill's character. ce-compound fixes opportunistically — only entries being touched or adjacent to them — because compound is not an audit. ce-compound-refresh runs a full sweep as Phase 4.5 step 6 because refresh is an audit. Extend the refresh report's CONCEPTS.md line to surface the scrubbed count alongside added and refined.
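The two scrub scopes differ only in which entries get inspected. A minimal sketch, assuming entries are addressed by index and that "adjacent" means the immediately neighboring entries (an assumption; the skill text leaves adjacency to the agent). `entriesToScrub` and `ScrubMode` are hypothetical names.

```typescript
type ScrubMode = "opportunistic" | "full-sweep";

// Choose which entries (by index) a scrub pass inspects.
// Assumption: "adjacent" means the entries immediately before and after a touched one.
function entriesToScrub(total: number, touched: number[], mode: ScrubMode): number[] {
  if (mode === "full-sweep") {
    // ce-compound-refresh: audit everything.
    return Array.from({ length: total }, (_, i) => i);
  }
  // ce-compound: only touched entries and their neighbors.
  const selected = new Set<number>();
  for (const i of touched) {
    for (const j of [i - 1, i, i + 1]) {
      if (j >= 0 && j < total) selected.add(j);
    }
  }
  return Array.from(selected).sort((a, b) => a - b);
}

console.log(entriesToScrub(10, [0, 5], "opportunistic").join(", ")); // 0, 1, 4, 5, 6
```

Scoping ce-compound this way keeps a routine learning run from turning into an audit, while refresh, whose job is the audit, pays for the full pass.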

When ce-compound or ce-compound-refresh first creates CONCEPTS.md, write a short preamble at the top explaining what the file is, how it accretes, and what it isn't (glossary only, not a spec or scratchpad). Visible prose under the # Concepts heading so both humans browsing the rendered file and agents reading the raw file see the same framing — an HTML comment would have hidden the model from human readers on GitHub for no real gain.
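A sketch of what bootstrapping might produce. The preamble wording here is hypothetical; the commit specifies only what it must cover (what the file is, how it accretes, what it isn't) and that it is visible prose under the `# Concepts` heading rather than an HTML comment. `bootstrapConceptsFile` is an illustrative name.

```typescript
// Hypothetical preamble wording covering the three points the commit requires.
const PREAMBLE = [
  "This file is the project's glossary of terms this codebase uses in a specific way.",
  "It grows as a side effect of normal work rather than being written up front.",
  "It is a glossary only: not a spec, and not a catch-all for notes.",
].join(" ");

// Build a new CONCEPTS.md: heading, visible preamble (prose, not an HTML comment), then entries.
function bootstrapConceptsFile(renderedEntries: string[]): string {
  return ["# Concepts", PREAMBLE, ...renderedEntries].join("\n\n") + "\n";
}

console.log(bootstrapConceptsFile(["### Customer\n\nAn organization that pays for a subscription."]));
```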

The "at least one qualifying term" gate in ce-compound Phase 2.4 and ce-compound-refresh Phase 4.5 step 3 could allow a permissive agent to seed CONCEPTS.md from a routine bug fix that only surfaced class or table names dressed up as entities. The criteria in concepts-vocabulary.md are correct but judgmental, and lenience at the creation moment seeds a thin file the team didn't actually need. Add an explicit "hold the qualifying bar conservatively at creation" rule to both skills. Borderline terms defer to a later run with stronger signal. The conservatism is quality, not count — the asymmetric-trap defense against minimum-count gating is preserved. Updates to an existing file continue to follow normal criteria.
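The asymmetry between creating and updating can be sketched as a filter. Everything here is an assumption for illustration: the `Signal` labels stand in for the agent's judgment, and modeling "normal criteria" as accepting borderline terms is a simplification. The sketch keeps the commit's key property: there is no minimum count, so one strong term is enough to create the file.

```typescript
// Hypothetical strength label an agent might assign a candidate term.
type Signal = "strong" | "borderline" | "weak";

interface Candidate {
  term: string;
  signal: Signal;
}

// Creating the file: only strong candidates; borderline ones wait for a later run.
// Updating an existing file: normal criteria, modeled here (an assumption) as strong + borderline.
// No minimum count: a single strong term is enough to create the file.
function termsToWrite(candidates: Candidate[], fileExists: boolean): Candidate[] {
  const accepted: Signal[] = fileExists ? ["strong", "borderline"] : ["strong"];
  return candidates.filter((c) => accepted.indexOf(c.signal) !== -1);
}
```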

After comparing against grill-with-docs (a third-party skill for a similar artifact), sharpen how CONCEPTS.md is framed across the plugin and close a terminology-capture gap.

In references/concepts-vocabulary.md (both copies):
- Lead with "Be opinionated" as the file's stance.
- Replace the enumerated "What never appears" list with the principle "The file stands on its own": one mental test that subsumes the existing exclusions and extends to cases we hadn't enumerated.
- Add aliases-per-entry format (*Avoid: X, Y*) so retired synonyms ride alongside their canonical term.
- Tighten "Per entry" to a one-sentence base definition, with an explicit second-paragraph allowance for non-obvious behavioral rules only.
- Add an optional Relationships section when structure is load-bearing.
- Rename "Resolved ambiguities" to "Flagged ambiguities."

In ce-brainstorm Phase 1.1: reframe CONCEPTS.md as the project's authoritative vocabulary (was: shared domain vocabulary that anchors terms here). This carries authority across the whole session without needing to restate "use canonical names" at every downstream phase.

In ce-compound Phase 2.4: extend the vocabulary scan to include ce-sessions findings when Full mode runs. Session findings carry terminology resolution context from prior brainstorm, plan, and work dialogues; without this, that context was being pulled in for research but ignored at capture time.

Also replace "scratchpad" with "catch-all" across four locations, a clearer name for the failure mode (a dumping ground for things that don't fit elsewhere).

Earlier in this branch, Phase 2.4's vocabulary scan was extended to include ce-sessions findings as a third input. Architectural review surfaced two problems with that wiring:
- ce-compound's payload to ce-sessions includes a "directly relevant to this specific problem; ignore unrelated work" filter rule, which actively suppresses the tangential context where vocabulary often lives. The filter is correct for fix-context retrieval but wrong for vocabulary capture; the two needs pull in opposite directions.
- Wiring named external sources into Phase 2.4 creates maintenance debt: every new research input (future Slack research, Linear context, etc.) requires updating the scan input list.

Revert to scanning only the new doc and the surrounding conversation. Both are always available to the orchestrating agent: no plumbing, no filter-rule mismatch. The conversation catches mid-dialogue vocabulary resolutions that didn't make the doc; the doc captures terms the writer judged worth recording. Terms that emerged only in non-conversation sources (research subagents, ce-sessions) flow into Phase 2.4 indirectly via the doc-writer's synthesis, which is the right level of curation. If external-source vocabulary mining ever becomes a real need, design it as a dedicated dispatch with a vocabulary-tuned payload, not as a Phase 2.4 scan input.

Adds an eval suite that tests whether ce-sessions findings preserve terminology resolution context, specifically whether distinctive coined terms and their resolution rationale survive the session-historian synthesis step intact.

Four test cases with ground truth from recently merged PRs:
- synthesis-gate-recovery (PR #822): distinctive term recovery
- mode-headless-semantic-alignment (PR #813): multi-piece nuance
- tangential-term-recovery: indexing-gap test
- near-miss-false-positive: discriminating-power test

Two-stage grader: programmatic substring match per criticality tier, plus LLM-graded context preservation. Variance protocol: 3 runs per eval.

This suite was built during PR #838's design exploration to validate a load-bearing assumption (that ce-sessions findings could feed ce-compound Phase 2.4's vocabulary scan). That assumption was ultimately retired in favor of doc-and-conversation-only scanning, so the suite is not load-bearing for PR #838. Kept as future infrastructure for validating ce-sessions's behavior as the skill evolves, e.g., when changing the session-historian synthesis prompt or adjusting scan-window defaults.

Iteration-1 results (executed via the skill-creator framework, captured to /tmp/compound-engineering/ce-sessions/evals/iteration-1/) showed ce-sessions preserved terminology strongly across all 4 evals, with 100% must-tier recall and 0% stddev. This is a capability test of the skill in isolation, not a test of any specific integration.
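The programmatic stage of the grader and the variance check can be sketched in a few lines. This is an illustration, not the suite's code: the `ExpectedTerm` shape, the `"should"` tier name, and the choice of population standard deviation are assumptions; only "must-tier" and "substring match per criticality tier" come from the description above. The LLM-graded stage is not modeled.

```typescript
// Hypothetical ground-truth shape: expected terms grouped by criticality tier.
type Tier = "must" | "should";
interface ExpectedTerm {
  text: string;
  tier: Tier;
}

// Stage one: case-insensitive substring match, reported as recall within one tier.
function tierRecall(findings: string, expected: ExpectedTerm[], tier: Tier): number {
  const inTier = expected.filter((e) => e.tier === tier);
  if (inTier.length === 0) return 1; // nothing expected in this tier
  const haystack = findings.toLowerCase();
  const hits = inTier.filter((e) => haystack.includes(e.text.toLowerCase())).length;
  return hits / inTier.length;
}

// Spread of a score across repeated runs (population standard deviation).
function stddev(values: number[]): number {
  const mean = values.reduce((a, b) => a + b, 0) / values.length;
  const variance = values.reduce((acc, v) => acc + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance);
}

const expected: ExpectedTerm[] = [
  { text: "synthesis gate", tier: "must" },
  { text: "recovery", tier: "must" },
];
const runs = [1, 2, 3].map(() => tierRecall("The Synthesis Gate handles recovery.", expected, "must"));
console.log(runs, stddev(runs)); // three runs at full recall, zero spread
```

A recall of 1 on every run with zero spread is the shape of the iteration-1 result reported above.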

chatgpt-codex-connector Bot reviewed May 16, 2026


tmchow added 10 commits May 16, 2026 01:13

tmchow wants to merge 11 commits into main from tmchow/context-md-evaluation
