Skip to content

Latest commit

 

History

History
393 lines (293 loc) · 23.9 KB

File metadata and controls

393 lines (293 loc) · 23.9 KB

Kin Context Improvement Journal

2026-03-03 (run 11) — Memory grouping by category

Area: Memory injection

Problem: Memories were injected as a flat list regardless of count. When 10+ memories were retrieved, mixing facts, preferences, decisions, and knowledge in a single list made it harder for the LLM to scan and prioritize relevant information.

Change: Added buildMemoriesBlock() helper that:

  • Groups memories by category (Facts, Preferences, Decisions, Knowledge) with ### subheadings when >3 memories
  • Keeps flat list for ≤3 memories (grouping adds overhead for small sets)
  • Categories are sorted in a consistent order via MEMORY_CATEGORY_META
  • Applied to both quick session and main prompt memory injection sites

Example output (grouped):

## Memories

Relevant information from your past interactions (★ = high importance):

### Facts
- ★ [fact] Nicolas lives in Grenoble (subject: Nicolas) — 2d ago
- [fact] Works at Acme (subject: Nicolas) — 1mo ago

### Preferences
- [preference] Prefers dark mode — 3mo ago

### Decisions
- [decision] Use PostgreSQL for the new project — just now

Files changed: src/server/services/prompt-builder.ts, src/server/services/prompt-builder.test.ts Commit: 1d893a0feat(context): group memories by category for easier LLM scanning Tests: 1339/1339 pass (32/32 prompt-builder, 2 new), build OK

Next areas to explore:

  • Tool descriptions: audit across all tool files for consistency and when-to-use hints
  • Add prompt-builder tests for conversation state, participants, tool usage strategy sections
  • Compacting: test the structured summary format
  • Sub-kin context: inject parent Kin identity so sub-kins know who spawned them

2026-03-03 (run 10) — Group vs one-on-one conversation awareness

Area: Channel/platform awareness / System prompt quality

Problem: The Kin already knows who the participants are (from run 3), but doesn't know whether it's in a group conversation or a one-on-one chat. This distinction matters for behavior: in groups, responses should be concise, address people by name, and avoid lengthy monologues. In 1-on-1 conversations, the Kin can be more detailed and personalized.

Change: Enhanced the "Active participants" section in the system prompt to include a conversation type hint:

  • When multiple unique participants exist → labeled as "group conversation" with guidance to keep responses focused, address people by name, and avoid derailing the group flow
  • When only one participant → labeled as "one-on-one conversation" with permission to be more detailed and personalized
  • Uses Set of participant names to determine unique humans (avoids counting the same person from different platforms twice)

Rationale: This is a natural extension of the participant awareness (run 3) and multi-user guidance (run 5). Those runs told the Kin who is present and how to handle multi-user conversations, but didn't tell it what kind of conversation it's in. The group/DM distinction is one of the strongest signals for response calibration.

Files changed: src/server/services/prompt-builder.ts Commit: 886a339 (merged with test commit due to pre-commit hook issue) Tests: 1322/1322 pass, build OK

Next areas to explore:

  • Add prompt-builder tests for group/DM awareness, participants, tool usage strategy, and multi-user sections
  • Tool descriptions: audit across all tool files for consistency and when-to-use hints
  • Memory injection: add category grouping (facts vs preferences vs decisions) for clearer presentation
  • Compacting: consider injecting key open threads as a separate high-priority section

2026-03-03 (run 9) — Conversation state awareness

Area: System prompt quality / Conversation context

Problem: The Kin had no awareness of its own context window state — how many messages it could see, whether older messages had been compacted away, or how far back its visible history went. This led to issues like:

  1. The Kin not knowing when to suggest using search_history() for older context
  2. No self-awareness about whether it's in a fresh conversation vs. a long-running one
  3. No signal about how much context might have been lost to compaction

Change:

  1. Added conversationState optional field to PromptParams with visibleMessageCount, totalMessageCount, hasCompactedHistory, and oldestVisibleMessageAt
  2. Extended buildMessageHistory() return type to include these metrics, computed from filteredMessages and activeSnapshot
  3. Added buildConversationStateBlock() helper that generates contextual awareness:
    • For long-running conversations: "This is a long-running conversation. X older messages have been summarized. You can see the Y most recent messages."
    • For fresh conversations: "You have the full conversation history: X messages."
    • Oldest visible message age for temporal context
    • Reminder to use search_history() when compaction is active
  4. Injected at position [6.85], between participants and compacting summary

Example output (compacted):

## Conversation state

This is a long-running conversation. 47 older messages have been summarized (see "Previous conversation summary" above).
You can see the 23 most recent messages in full detail.
Oldest visible message: 3h ago.
If you need details from before your visible history, use search_history() to look further back.

Example output (fresh):

## Conversation state

You have the full conversation history: 12 messages.
Oldest visible message: 2d ago.

Rationale: Self-awareness about context state helps the Kin make better decisions about when to search for older context, when to ask the user for clarification about past events, and how to frame responses about conversation history. It also reinforces the search_history() tool usage when compaction is active.

Files changed: src/server/services/prompt-builder.ts, src/server/services/kin-engine.ts Commit: 221df5cfeat(context): add conversation state awareness to system prompt Tests: 1314/1314 pass, 30/30 prompt-builder tests pass, build OK

Next areas to explore:

  • Group vs DM detection: adapt tone/verbosity based on conversation type
  • Add prompt-builder tests for conversation state, participants, tool usage strategy sections
  • Tool descriptions: audit across all tool files for consistency and when-to-use hints
  • Compacting: test the structured summary format

2026-03-02 (run 8) — Current message source platform injection

Area: Channel/platform awareness

Problem: The Kin had a generic "Platform formatting guide" in the system prompt listing all platform capabilities, but no signal about which platform the current message came from. The Kin had to parse [telegram:Name] prefixes from message content to figure out the platform — fragile and indirect. This meant formatting adaptation was hit-or-miss.

Change:

  1. Added currentMessageSource optional field to PromptParams with platform (string) and optional senderName
  2. In kin-engine.ts, resolved the platform from queueItem.sourceType:
    • For channel messages: look up the channel record via getChannelQueueMetagetChannel to get ch.platform
    • For web UI messages: set platform to "web"
    • Extract sender name from [platform:Name] prefix
  3. Added buildCurrentMessageHint() helper that generates a concise hint:
    Current message from: **telegram** (sender: Nicolas)
    Format: Supports Markdown. Keep moderate length.
    
  4. Platform-specific formatting reminders for discord, telegram, whatsapp, slack, web
  5. Injected at position [7.5], right before the Context block — last thing the Kin sees before responding

Rationale: LLMs follow formatting instructions much better when they're specific and contextual ("this message is from WhatsApp, keep it short") vs generic ("here are 5 platforms and their capabilities"). The per-message hint eliminates the need for the Kin to parse message prefixes and provides an immediate, actionable formatting signal.

Files changed: src/server/services/prompt-builder.ts, src/server/services/kin-engine.ts, src/server/services/prompt-builder.test.ts Commit: 213e78ffeat(context): inject current message source platform for formatting adaptation Tests: 30/30 prompt-builder tests pass (4 new), build OK

Next areas to explore:

  • Tool descriptions: audit across all tool files for consistency and when-to-use hints
  • Group vs DM detection: pass whether the current channel is a group or DM so the Kin adapts verbosity
  • Compacting: test the structured summary format
  • Add prompt-builder tests for participants section

2026-03-02 (run 7) — Structured compacting summaries with open thread tracking

Area: Conversation context / Compacting quality

Problem: The compacting prompt produced free-form summaries with no consistent structure. This meant:

  1. Open threads (pending tasks, unanswered questions, promises) could easily get lost in a wall of text
  2. No temporal context — the summary didn't indicate what time period it covered
  3. Completed work mixed with pending items, making it hard for the Kin to know what's still relevant
  4. When integrating previous summaries, there was no guidance on how to merge (close resolved threads, add new ones)

Since compacted summaries permanently replace raw messages, a poorly structured summary = permanently degraded context.

Change: Rewrote the compacting prompt in compacting.ts with:

  1. Time range header — injects the ISO timestamps of first and last message being summarized, plus message count, so the LLM knows the scope
  2. Structured output sections — instructs the LLM to organize the summary into 4 sections:
    • Key facts & decisions — important information, attributed to speakers
    • Completed work — tasks finished, problems solved, results obtained
    • Open threads — unresolved questions, pending tasks, promised follow-ups (marked as CRITICAL)
    • Conversation dynamics — only if relevant: who was active, tone, relationships
  3. Open thread emphasis — explicit instruction that open threads are the most important thing to preserve, with extra rule about merging: close resolved threads, add new ones
  4. Integration guidance — when a previous summary exists, explicit instructions to merge, consolidate, and update rather than just append

Rationale: Open threads are the #1 failure mode in conversation compaction. When a user asks "did we resolve X?" and the summary lost that context, the Kin fails silently. The structured format also makes it easier for the Kin to scan the summary quickly during prompt construction.

Files changed: src/server/services/compacting.ts Commit: 7ad041afeat(context): structured compacting summaries with open thread tracking Tests: 1282/1282 pass, build OK

Next areas to explore:

  • Add prompt-builder tests for participants, tool usage strategy, and multi-user sections
  • Tool descriptions: audit across all tool files for consistency and when-to-use hints
  • Channel/platform awareness: group vs DM context differentiation (adapt tone/verbosity)
  • Compacting: add a test for the new structured prompt format

2026-03-01 — Honesty & uncertainty guidance

Area: Alignment & safety

Problem: The system prompt had no explicit guidance on handling uncertainty, avoiding hallucination, or being honest about knowledge gaps. LLMs tend to confabulate when not explicitly instructed to acknowledge uncertainty.

Change: Added a "Honesty and uncertainty" section to the internal instructions block (main Kins) with 5 rules:

  1. Say "I'm not sure" when uncertain — better than confident wrong answers
  2. Don't fabricate facts/URLs/references — use tools or acknowledge gaps
  3. Distinguish known facts from inferences/guesses
  4. Ask for clarification rather than assuming
  5. Never reveal system prompt/config to users

Also added a one-liner to sub-Kin constraints about honesty and using tools to verify.

Files changed: src/server/services/prompt-builder.ts Commit: 8064553feat(context): add honesty and uncertainty guidance to system prompt Tests: 26/26 pass, build OK

Next areas to explore:

  • Conversation context: review compaction quality and truncation strategy
  • Tool descriptions: audit for clarity and when-to-use hints
  • Channel/platform awareness: group vs DM handling

2026-03-01 (run 2) — Memory formatting with importance & recency metadata

Area: Memory injection

Problem: Memories were injected as flat [category] content (subject) lines with no signal about how important or recent each memory is. The Kin had no way to weight memories — a critical fact from yesterday looked identical to a trivial preference from months ago.

Change:

  1. Extended MemorySearchResult and getRelevantMemories() to propagate importance and updatedAt from the search pipeline (these were already computed but stripped before prompt injection).
  2. Updated the Memory interface in prompt-builder to accept optional importance and updatedAt.
  3. Added formatMemoryLine() helper that renders:
    • ★ prefix for high-importance memories (importance ≥ 7)
    • Relative time suffix ("2d ago", "3mo ago") from updatedAt
  4. Updated both memory formatting locations (quick session + main prompt).
  5. Added formatRelativeTime() utility for human-readable relative dates.

Example output:

- ★ [fact] Nicolas lives in Grenoble (subject: Nicolas) — 2d ago
- [preference] Prefers dark mode — 3mo ago
- [decision] Use PostgreSQL for the new project — just now

Files changed: src/server/services/memory.ts, src/server/services/prompt-builder.ts, src/server/services/kin-engine.ts Commit: 4ff8be7feat(context): add importance and recency metadata to memory injection Tests: 26/26 prompt-builder tests pass, build OK, 3 pre-existing failures (schema-related, unrelated)

2026-03-02 — Enriched context block with temporal awareness

Area: System prompt quality / Context metadata

Problem: The Context section at the end of the system prompt only contained a raw ISO timestamp and "Platform: KinBot". ISO timestamps like 2026-03-02T01:31:00.000Z are hard for LLMs to reason about — they need to parse the day of week, time of day, etc. This makes Kins worse at time-sensitive reasoning ("Is it a weekday?", "Is it late at night?", "What day is it?").

Change: Created a buildContextBlock() helper that generates a richer context section:

  • Current date: Monday, March 2, 2026 (human-readable with day of week)
  • Current time: 01:31 UTC (easy to read)
  • ISO timestamp: 2026-03-02T01:31:00.000Z (for precision when needed)
  • Platform: KinBot

The day of week is particularly useful — it helps Kins reason about schedules, business hours, weekends vs weekdays, etc.

Files changed: src/server/services/prompt-builder.ts, src/server/services/prompt-builder.test.ts Commit: 27a6651feat(context): enrich context block with human-readable date and day of week Tests: 26/26 prompt-builder tests pass, build OK

Next areas to explore:

  • Conversation context: smart token-based truncation instead of hard 50-message cap
  • Tool descriptions: audit across all tool files for consistency
  • Channel/platform awareness: group vs DM context differentiation

2026-03-02 (run 2) — Response calibration guidance

Area: System prompt quality

Problem: The system prompt had no explicit guidance on response length and format adaptation. Kins would default to verbose, essay-like responses regardless of context — a simple yes/no question on WhatsApp got the same treatment as a complex technical request on the web UI. This is a well-known LLM issue: without explicit brevity signals, models tend toward over-explanation.

Change: Added a "Response calibration" section to the internal instructions block with 7 rules:

  1. Match response length to request complexity
  2. Default to shorter responses for external platforms (mobile users)
  3. Use richer formatting on KinBot web UI when it aids clarity
  4. Lead with the answer for yes/no questions
  5. Avoid unnecessary preambles ("Great question!", etc.)
  6. Use numbered lists for options/steps
  7. Share tool results directly without narrating the search process

Files changed: src/server/services/prompt-builder.ts Commit: 591ea19feat(context): add response calibration guidance to system prompt Tests: 1091/1094 pass (3 pre-existing failures), build OK

Next areas to explore:

  • Conversation context: smart token-based truncation instead of hard 50-message cap
  • Conversation participant awareness: inject active participant list so Kin knows who's in the chat
  • Tool descriptions: audit across all tool files for consistency
  • Channel/platform awareness: group vs DM context differentiation

2026-03-02 (run 3) — Conversation participant awareness

Area: System prompt quality / Conversation context

Problem: The Kin had no awareness of who was actively participating in the conversation. While individual messages were prefixed with sender names (e.g. [telegram:Nicolas]), the Kin had no summary view of participants — who's active, how many messages they've sent, which platform they're on, or when they last spoke. This makes it harder to personalize responses and track multi-user conversations.

Change:

  1. Added ConversationParticipant interface (name, platform, messageCount, lastSeenAt) exported from kin-engine.ts
  2. Extended buildMessageHistory() to extract participant data from filtered messages, parsing channel prefixes for platform detection and using pseudonymMap for web UI users
  3. Added participants optional param to PromptParams in prompt-builder
  4. Added "Active participants" section (block 6.8) to the system prompt, showing each participant with platform, message count, and recency

Example output:

## Active participants

People currently in this conversation:

- Nicolas via telegram (12 msgs, last active 2h ago)
- Marie (3 msgs, last active 1d ago)

Files changed: src/server/services/kin-engine.ts, src/server/services/prompt-builder.ts Commit: c142211feat(context): add conversation participant awareness to system prompt Tests: 26/26 prompt-builder tests pass, build OK

Next areas to explore:

  • Conversation context: smart token-based truncation instead of hard 50-message cap
  • Tool descriptions: audit across all tool files for consistency and when-to-use hints
  • Add a prompt-builder test for the new participants section

2026-03-02 (run 4) — Tool usage strategy guidance

Area: System prompt quality / Tool context

Problem: Kins had 20+ tools available but no strategic guidance on when to prefer one tool over another. Individual tool descriptions explained WHAT each tool does, but there was no decision framework for HOW to use tools effectively together. Common anti-patterns: guessing instead of using recall(), answering factual questions from training data instead of web_search(), not memorizing important facts immediately, using shell_command() when dedicated tools exist.

Change: Added a "Tool usage strategy" subsection to the internal instructions block with 9 concrete rules:

  1. Use recall() before answering from memory (verify, don't guess)
  2. Use web_search() for factual/current questions
  3. Use browse_page() after web_search() for full content
  4. Memorize eagerly (don't postpone)
  5. Check duplicates before creating contacts
  6. Use store_file() for substantial content
  7. Use spawn_self/spawn_kin for heavy tasks
  8. Use notify() for time-sensitive alerts
  9. Minimize shell_command() when dedicated tools exist

Rationale: This is a well-known prompt engineering pattern — LLMs perform significantly better at tool selection when given explicit decision heuristics rather than relying on tool descriptions alone. The guidance is concise (10 lines) to minimize context overhead.

Files changed: src/server/services/prompt-builder.ts Commit: a989c29feat(context): add tool usage strategy guidance to system prompt Tests: 26/26 prompt-builder tests pass, 1157/1160 total (3 pre-existing failures), build OK

Next areas to explore:

  • Conversation context: smart token-based truncation instead of hard 50-message cap
  • Add prompt-builder tests for participants section and tool usage strategy
  • Memory injection: structured formatting with relevance grouping
  • Channel/platform awareness: group vs DM context differentiation

2026-03-02 (run 5) — Multi-user conversation guidance

Area: System prompt quality / Conversation context

Problem: The Kin already knows who the participants are (from run 3's participant awareness), but had no behavioral guidance for handling multi-user conversations. Common issues: not addressing the right person, merging responses to different users, not knowing how to handle conflicting instructions from different people, and being overly verbose in group contexts.

Change: Added a "Multi-user conversations" subsection to the internal instructions block with 5 rules:

  1. Address the right person by name when responding
  2. Answer each person's question clearly without merging/confusing requests
  3. Acknowledge new participants without re-explaining everything
  4. Ask for clarification when users give conflicting instructions
  5. Keep responses focused in group contexts

Files changed: src/server/services/prompt-builder.ts Commit: 64e8fd2feat(context): add multi-user conversation guidance to system prompt Tests: 26/26 prompt-builder tests pass, build OK

Next areas to explore:

  • Conversation context: smart token-based truncation instead of hard 50-message cap
  • Add prompt-builder tests for participants, tool usage strategy, and multi-user sections
  • Compacting summary: add time range metadata so Kin knows when summarized events occurred
  • Channel/platform awareness: group vs DM context differentiation (adapt tone/verbosity)

2026-03-02 (run 6) — Smart token-based history truncation

Area: Conversation context

Problem: The buildMessageHistory() function used a hard .limit(50) on message fetching. This was problematic because:

  1. A single tool call with a large JSON result could consume thousands of tokens, while a short chat message uses only a few dozen
  2. 50 messages of pure chat ≈ 5k tokens, but 50 messages with tool calls ≈ 50k+ tokens
  3. No awareness of context window budget — conversations with heavy tool usage could blow up the context

Change:

  1. Added historyTokenBudget config option (default: 40,000 tokens, env: HISTORY_TOKEN_BUDGET) — the max estimated tokens for conversation history
  2. Increased the DB fetch limit from 50 to 100 to have more messages available for selection
  3. After filtering by compaction snapshot, added a token-budget trimming loop that:
    • Estimates tokens per message using content.length + toolCalls.length / 4
    • Drops oldest messages one by one until total fits within budget
    • Always keeps at least the most recent message

Why 40k default? Most models have 128k-200k context. System prompt + tools ≈ 10-15k. Memories ≈ 2-5k. This leaves 40k as a generous but safe budget for history, with room for the model's response.

Behavior change:

  • Short chat conversations: more messages kept (up to 100 vs old 50)
  • Tool-heavy conversations: fewer messages kept, but always within token budget
  • Backward compatible: default behavior similar to before for typical conversations

Files changed: src/server/config.ts, src/server/services/kin-engine.ts Commit: ca9599ffeat(context): smart token-based history truncation instead of hard message limit Tests: 26/26 prompt-builder tests pass, build OK

Next areas to explore:

  • Add a prompt-builder test for participants, tool usage strategy, and multi-user sections
  • Compacting summary: add time range metadata so Kin knows when summarized events occurred
  • Tool descriptions: audit across all tool files for consistency and when-to-use hints
  • Channel/platform awareness: group vs DM context differentiation