Skip to content

feat(skill): BM25-based smart skill retrieval for large catalogues#726

Open
claudianus wants to merge 3 commits into
MoonshotAI:mainfrom
claudianus:main
Open

feat(skill): BM25-based smart skill retrieval for large catalogues#726
claudianus wants to merge 3 commits into
MoonshotAI:mainfrom
claudianus:main

Conversation

@claudianus

Copy link
Copy Markdown

Related Issue

Resolve #725

Problem

When a user has a large number of skills installed (e.g. 1,000+), the current implementation injects the entire skill listing into the system prompt — including name, description (250 chars), whenToUse, and file path for every skill.

With 1,530 real skills this amounts to ~118,000 tokens in the system prompt, which:

  1. Consumes ~46% of a 256K context window
  2. Loads all SKILL.md file bodies into memory at startup (~88 MB JS heap)
  3. Forces the LLM to scan a massive listing to find the right skill — poor accuracy due to lost-in-the-middle effects

What Changed

A 3-tier auto-detection system based on invocable skill count. No feature flag needed — the tier is selected automatically at session start.

Skill count Listing format Strategy
≤ 80 Legacy full listing (name + desc + path) Prompt-cache optimal for small catalogues
81–300 Compact (name + 80-char desc) Skill search available
> 300 Names only Skill search required

New files

  • skill/search.ts — BM25 search index with synonym expansion. Zero external dependencies. ~24ms to build for 1,500 skills, <1ms per query.

Modified files

  • skill/registry.ts — Auto-detect tier based on skill count. searchSkills() method with lazy index rebuild. renderSkillPrompt() lazy-loads SKILL.md content from disk only when a skill is activated.
  • skill/parser.tsparseSkillMetaFromFile() reads only YAML frontmatter (~20 lines) instead of the full SKILL.md body. Uses LAZY_CONTENT_SENTINEL to distinguish "not loaded" from "genuinely empty body".
  • skill-tool.ts / .mdSkill tool extended with action: "search" alongside existing action: "load". Backward-compatible — skill field remains required.
  • profile/default/system.md — Two-step skill usage: search first, then load.
  • session/index.ts — Minor cleanup (removed flag wiring).
  • flags/registry.ts — No new flags needed.

Tests

  • New tests merged into existing registry.test.ts (search, tiers, lazy rebuild)
  • New integration-proof.test.ts validates what the LLM actually sees with real 1,530 skills
  • Existing scanner.test.ts updated for lazy loading via renderSkillPrompt
  • 2,572 tests pass, 0 failures

Performance (measured with 1,530 real skills from antigravity-awesome-skills)

Metric Before After Reduction
System prompt 118,120 tokens 8,436 tokens 93%
Startup memory ~88 MB ~4 MB 95%
Search latency N/A (LLM scans listing) 0.0–0.2ms (BM25)
Lazy content load N/A (eager at startup) 0.4ms per activation

Checklist

  • Tests added and passing (2,572/2,572)
  • Lint clean for modified files (oxlint + sherif)
  • TypeScript clean (tsc --noEmit, 0 errors)
  • Changeset generated (minor bump)
  • No co-author attribution or agent identity
  • No internal identifiers in diff
  • Backward-compatible (existing skill callers unchanged)

When >80 skills are installed, the system prompt switches from a full
listing to a compact name-only format. The model discovers skills via
the Skill tool's new action:"search" endpoint backed by a BM25 index.

Three tiers by skill count:
  ≤ 80:   legacy full listing (prompt-cache optimal)
  81-300: compact name + description + search
  > 300:  names only + search required

Key changes:
- skill/search.ts: BM25 index with synonym expansion (zero deps)
- skill/registry.ts: auto-detect tier, lazy content loading
- skill/parser.ts: parseSkillMetaFromFile (frontmatter-only)
- skill-tool.ts: search action on existing Skill tool
- system.md: search-first workflow instructions

Performance (measured with 1,530 real skills):
- System prompt: 118K → 8.4K tokens (93% reduction)
- Startup memory: 88MB → 4MB (95% reduction)
- Search latency: 0.0-0.2ms per query
- Lazy content load: 0.4ms per skill activation

Closes MoonshotAI#725
@changeset-bot

changeset-bot Bot commented Jun 13, 2026

Copy link
Copy Markdown

🦋 Changeset detected

Latest commit: b1136f9

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
@moonshot-ai/kimi-code Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4eca85320e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 57 to 58
export const SkillToolInputSchema: z.ZodType<SkillToolInput> = z.object({
skill: z.string(),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Allow search calls without a skill name

When the model follows the new instructions/example and calls Skill as {"action":"search","query":"e2e test"}, preflight validation rejects it before execution() runs because this schema still makes skill required; loop/tool-call.ts validates the advertised JSON schema with AJV. This breaks the large-catalog search-first flow unless the model invents a dummy skill name, so skill should be optional for search or the schema should be discriminated by action.

Useful? React with 👍 / 👎.

import { join } from 'node:path';
import { performance } from 'node:perf_hooks';

const SKILLS_DIR = join(homedir(), '.kimi', 'skills');

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid depending on a local home skill catalog

These tests depend on ~/.kimi/skills existing and containing the author's real 1,530-skill catalog. In a clean CI or developer checkout this path is usually absent or has different contents, so the registry loads no/different skills and the listing assertions or results[0]! dereference fail. Build a temporary fixture catalog inside the test instead of reading the user's home directory.

Useful? React with 👍 / 👎.

Comment thread packages/agent-core/src/skill/parser.ts Outdated
fencesFound++;
if (fencesFound === 2) break;
}
offset += line.length + 1;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve CRLF frontmatter fences when slicing

For SKILL.md files written with CRLF line endings, split(/\r?\n/) removes a two-character newline but this offset only adds + 1, so frontmatterOnly is sliced before the second ---; parseSkillText then reports a missing closing fence and discovery skips the skill. This regresses Windows-authored skills that the full parser already accepts, so compute the fence position on the original buffer or account for \r\n lengths.

Useful? React with 👍 / 👎.

P1: Make skill field optional in SkillToolInputSchema — search-only
    calls no longer need a dummy skill name. Validation added for
    load action requiring skill parameter.

P1: Replace home-dir-dependent integration tests with self-contained
    temp fixture (350 SKILL.md files in mkdtemp). Tests are now
    portable across CI and developer machines.

P2: Fix CRLF fence offset in parseSkillMetaFromFile — split on \n
    and strip trailing \r to correctly account for 2-byte newlines
    when computing the slice boundary.

Also: update skill-tool.test.ts contract to reflect optional skill.
Six smoke tests with real fixture files (350 SKILL.md):
1. Tier auto-detection (names-only for 350 skills)
2. Lazy content loading (sentinel → readFileSync)
3. BM25 search accuracy (8/8 domain queries correct)
4. Schema validation (search without skill name)
5. CRLF frontmatter parsing
6. Full model flow: search → pick → load → render
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(skill): skill search via BM25 index for large skill catalogues

1 participant