feat(skill): BM25-based smart skill retrieval for large catalogues#726
feat(skill): BM25-based smart skill retrieval for large catalogues#726claudianus wants to merge 3 commits into
Conversation
When >80 skills are installed, the system prompt switches from a full listing to a compact name-only format. The model discovers skills via the Skill tool's new action:"search" endpoint backed by a BM25 index. Three tiers by skill count: ≤ 80: legacy full listing (prompt-cache optimal) 81-300: compact name + description + search > 300: names only + search required Key changes: - skill/search.ts: BM25 index with synonym expansion (zero deps) - skill/registry.ts: auto-detect tier, lazy content loading - skill/parser.ts: parseSkillMetaFromFile (frontmatter-only) - skill-tool.ts: search action on existing Skill tool - system.md: search-first workflow instructions Performance (measured with 1,530 real skills): - System prompt: 118K → 8.4K tokens (93% reduction) - Startup memory: 88MB → 4MB (95% reduction) - Search latency: 0.0-0.2ms per query - Lazy content load: 0.4ms per skill activation Closes MoonshotAI#725
🦋 Changeset detectedLatest commit: b1136f9 The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4eca85320e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| export const SkillToolInputSchema: z.ZodType<SkillToolInput> = z.object({ | ||
| skill: z.string(), |
There was a problem hiding this comment.
Allow search calls without a skill name
When the model follows the new instructions/example and calls Skill as {"action":"search","query":"e2e test"}, preflight validation rejects it before execution() runs because this schema still makes skill required; loop/tool-call.ts validates the advertised JSON schema with AJV. This breaks the large-catalog search-first flow unless the model invents a dummy skill name, so skill should be optional for search or the schema should be discriminated by action.
Useful? React with 👍 / 👎.
| import { join } from 'node:path'; | ||
| import { performance } from 'node:perf_hooks'; | ||
|
|
||
| const SKILLS_DIR = join(homedir(), '.kimi', 'skills'); |
There was a problem hiding this comment.
Avoid depending on a local home skill catalog
These tests depend on ~/.kimi/skills existing and containing the author's real 1,530-skill catalog. In a clean CI or developer checkout this path is usually absent or has different contents, so the registry loads no/different skills and the listing assertions or results[0]! dereference fail. Build a temporary fixture catalog inside the test instead of reading the user's home directory.
Useful? React with 👍 / 👎.
| fencesFound++; | ||
| if (fencesFound === 2) break; | ||
| } | ||
| offset += line.length + 1; |
There was a problem hiding this comment.
Preserve CRLF frontmatter fences when slicing
For SKILL.md files written with CRLF line endings, split(/\r?\n/) removes a two-character newline but this offset only adds + 1, so frontmatterOnly is sliced before the second ---; parseSkillText then reports a missing closing fence and discovery skips the skill. This regresses Windows-authored skills that the full parser already accepts, so compute the fence position on the original buffer or account for \r\n lengths.
Useful? React with 👍 / 👎.
P1: Make skill field optional in SkillToolInputSchema — search-only
calls no longer need a dummy skill name. Validation added for
load action requiring skill parameter.
P1: Replace home-dir-dependent integration tests with self-contained
temp fixture (350 SKILL.md files in mkdtemp). Tests are now
portable across CI and developer machines.
P2: Fix CRLF fence offset in parseSkillMetaFromFile — split on \n
and strip trailing \r to correctly account for 2-byte newlines
when computing the slice boundary.
Also: update skill-tool.test.ts contract to reflect optional skill.
Six smoke tests with real fixture files (350 SKILL.md): 1. Tier auto-detection (names-only for 350 skills) 2. Lazy content loading (sentinel → readFileSync) 3. BM25 search accuracy (8/8 domain queries correct) 4. Schema validation (search without skill name) 5. CRLF frontmatter parsing 6. Full model flow: search → pick → load → render
Related Issue
Resolve #725
Problem
When a user has a large number of skills installed (e.g. 1,000+), the current implementation injects the entire skill listing into the system prompt — including name, description (250 chars),
whenToUse, and file path for every skill.With 1,530 real skills this amounts to ~118,000 tokens in the system prompt, which:
What Changed
A 3-tier auto-detection system based on invocable skill count. No feature flag needed — the tier is selected automatically at session start.
New files
skill/search.ts— BM25 search index with synonym expansion. Zero external dependencies. ~24ms to build for 1,500 skills, <1ms per query.Modified files
skill/registry.ts— Auto-detect tier based on skill count.searchSkills()method with lazy index rebuild.renderSkillPrompt()lazy-loads SKILL.md content from disk only when a skill is activated.skill/parser.ts—parseSkillMetaFromFile()reads only YAML frontmatter (~20 lines) instead of the full SKILL.md body. UsesLAZY_CONTENT_SENTINELto distinguish "not loaded" from "genuinely empty body".skill-tool.ts/.md—Skilltool extended withaction: "search"alongside existingaction: "load". Backward-compatible —skillfield remains required.profile/default/system.md— Two-step skill usage: search first, then load.session/index.ts— Minor cleanup (removed flag wiring).flags/registry.ts— No new flags needed.Tests
registry.test.ts(search, tiers, lazy rebuild)integration-proof.test.tsvalidates what the LLM actually sees with real 1,530 skillsscanner.test.tsupdated for lazy loading viarenderSkillPromptPerformance (measured with 1,530 real skills from antigravity-awesome-skills)
Checklist