Skip to content

refactor: memory optimization#215

Merged
pedronauck merged 4 commits into
mainfrom
optimizations
May 27, 2026
Merged

refactor: memory optimization#215
pedronauck merged 4 commits into
mainfrom
optimizations

Conversation

@pedronauck
Copy link
Copy Markdown
Member

@pedronauck pedronauck commented May 27, 2026

Summary by CodeRabbit

  • New Features

    • Installer now uses a pinned cosign verifier (v2.2.4) when no local cosign is present.
    • New CLI analysis tools to produce token-usage baselines and estimate token/byte savings.
  • Improvements

    • Memory extractor: added backpressure, throttling, empty-turn skipping, and richer runtime metrics.
    • Network delivery compacts per-session guidance after first delivery.
    • Session spawn now accepts model selection; compact catalog marker added.
    • Various tests, docs, and small UI/type polish.

Review Change Stack

@pedronauck pedronauck self-assigned this May 27, 2026
@vercel
Copy link
Copy Markdown

vercel Bot commented May 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agh-site Ready Ready Preview, Comment May 27, 2026 4:42pm

Request Review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 27, 2026

Caution

Review failed

Pull request was closed or merged during review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f05f321b-681f-487a-9263-1bebdf2df20f

📥 Commits

Reviewing files that changed from the base of the PR and between c423564 and 5c8578c.

⛔ Files ignored due to path filters (1)
  • .agents/skills/autoreview/scripts/autoreview is excluded by !.agents/**
📒 Files selected for processing (11)
  • .compozy/tasks/tokens-perf/analysis/collect_token_baseline.py
  • .compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py
  • internal/api/core/memory_services_test.go
  • internal/daemon/native_tools_test.go
  • internal/daemon/prompt_skills.go
  • internal/daemon/prompt_skills_test.go
  • internal/memory/extractor/runtime.go
  • internal/memory/extractor/runtime_test.go
  • internal/network/delivery.go
  • internal/network/delivery_test.go
  • web/style-imports.d.ts
✅ Files skipped from review due to trivial changes (1)
  • web/style-imports.d.ts
🚧 Files skipped from review as they are similar to previous changes (8)
  • internal/api/core/memory_services_test.go
  • internal/daemon/native_tools_test.go
  • internal/memory/extractor/runtime_test.go
  • internal/network/delivery_test.go
  • internal/memory/extractor/runtime.go
  • .compozy/tasks/tokens-perf/analysis/collect_token_baseline.py
  • .compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py
  • internal/daemon/prompt_skills.go

Walkthrough

This PR introduces extractor backpressure and throttling with new runtime metrics, implements skills-catalog caching/compact markers, compacts per-session network guidance, bootstraps a pinned cosign verifier in the installer (with tests), and adds two Python token-analysis scripts plus supporting tests and frontend/type tweaks.

Changes

Memory Extractor Runtime Backpressure and Metrics

Layer / File(s) Summary
Extractor runtime queue and backpressure infrastructure
internal/memory/extractor/runtime.go
Adds queue capacity, throttle options, provider-slot limiter, and new RuntimeStats fields (ActiveProviderSessions, SkippedTurns, BackpressuredSessions).
Enqueue and throttle logic with skipped-turn handling
internal/memory/extractor/runtime.go
Refactors Enqueue to drop empty extractable turns (emit EventDropped), queue per-session requests, and schedule/cancel per-session throttle flush timers.
Drain orchestration and session flushing
internal/memory/extractor/runtime.go
Refactors Drain() into a loop that repeatedly flushes queued sessions and waits for workers; adds flushQueuedSessions, flushSession, hasPendingWork, and session-clear logic.
Provider slot acquisition and backpressure tracking
internal/memory/extractor/runtime.go
Centralizes provider-slot acquire/release with contention tracking and increments backpressuredSessions when slots are exhausted.
Extractor configuration and validation constraints
internal/config/config.go, internal/config/tool_surface.go
Adds constants for extractor mode and coalesce path, restricts mode to post_message, and validates CoalesceMax >= ThrottleTurns.
Forked extractor model and session spawn propagation
internal/session/spawn.go, internal/daemon/memory_runtime.go
Adds Model to SpawnOpts and propagates trimmed model into forked extractor child spawn opts; daemon wires queue/throttle options.
Extractor status payload and daemon integration
internal/api/contract/memory.go, internal/daemon/memory_runtime.go
Extends MemoryExtractorStatusPayload with new counters and updates daemon status reporting; adjust JSONL output collection to avoid synthetic newlines.
Extractor runtime test coverage
internal/memory/extractor/runtime_test.go
Adds tests for backpressure, same-session throttling/coalescing, Drain() semantics, skipped-empty-turn handling, and helpers (waitRuntimeStats, waitUntilDrainBlocksEnqueue).
API and daemon extractor tests
internal/api/core/memory_services_test.go, internal/daemon/memory_runtime_test.go, internal/daemon/native_tools_test.go
Updates injected status payloads and assertions; adds tests for JSONL output parsing and forked extractor spawn behavior plus recording harness.
Extractor docs/UI text
internal/memory/extractor/events.go, web/src/routes/_app/settings/memory.tsx
Updates EventDropped comment and memory settings mode description to reflect post_message mode.

Skills Catalog Caching and Network Delivery Guidance Compaction

Layer / File(s) Summary
Skills catalog signature caching and unchanged mode
internal/daemon/prompt_skills.go, internal/skills/catalog.go
Replaces closure augmenter with skillsCatalogAugmenter that caches per-session SHA-256 catalog signatures, emits compact BuildCurrentCatalogUnchanged() when unchanged, and performs LRU eviction.
Skills catalog augmenter tests and helpers
internal/daemon/prompt_skills_test.go
Splits tests into parallel subtests covering unchanged-state compaction, registry changes, and ACPSessionID-based replay; adds test helpers newPromptSkillsAugmenterForTest and newPromptSkillsSession.
Network delivery per-session guidance state and compaction
internal/network/delivery.go
Tracks per-session guidance delivery state, selects verbose vs compact guidance per envelope, tokenizes inbound queues to guard updates under retries, and updates formatting to formatNetworkMessageWithGuidance.
Network delivery guidance tests & benchmarks
internal/network/delivery_test.go, internal/network/perf_bench_test.go
Adds tests validating guidance compaction across deliveries and session lifecycle; factors envelope helper and adds benchmark BenchmarkFormatNetworkMessageGuidanceModes.

Installation Script and Release Verification

Layer / File(s) Summary
Installer cosign resolution and verification
packages/site/public/install.sh
Introduces pinned COSIGN_VERSION="v2.2.4", platform artifact mapping and SHA256, verify_file_sha256, and resolve_cosign() to reuse local cosign or download/verify pinned verifier; uses $COSIGN_BIN verify-blob.
Public install contract tests
packages/site/lib/__tests__/public-install-contract.test.ts
Updates tests to simulate missing cosign, asserts pinned verifier download and "$COSIGN_BIN" verify-blob invocation, and adds fixture/server helpers and tarball fixtures.
Release docs and guards
.goreleaser.release-header.md.tmpl, .goreleaser.release-footer.md.tmpl, internal/config/release_config_test.go
Expands installer provenance steps in release docs and adds text-guard assertions expecting pinned cosign fallback behavior.

Token Performance Analysis and Cost Validation

Layer / File(s) Summary
Token baseline collection
.compozy/tasks/tokens-perf/analysis/collect_token_baseline.py
New script: reads Claude JSONL transcripts (dedupe, minute-bucket, classify sessions), queries agh.db read-only with dynamic column detection, scans crash-bundles, and emits combined JSON for requested UTC window.
Token cost validation
.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py
New script: extracts network prompt events, computes before/after byte/token estimates including savings from repeated catalogs and optional compact guidance, aggregates per-file and corpus totals and prints JSON report.

Web Frontend Type Refinements

Layer / File(s) Summary
Session adapter and thread repository typing
web/src/systems/session/lib/session-history-adapter.ts, web/src/systems/session/lib/session-thread-repository.ts, web/style-imports.d.ts
Adds typed MessageStorageEntry usage, explicit getId typing, internal ExportedThreadMessageItem alias, refactors mapping, and adds ambient *.css module declaration.

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • compozy/agh#29: Modifies network guidance/rendering logic similar to this PR’s delivery guidance compaction changes.
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 8.51% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'refactor: memory optimization' is vague and generic, using non-descriptive terms that don't clearly convey the specific changes in the changeset. Use a more specific title that highlights the primary change, such as 'refactor: add memory extractor backpressure and token baseline analysis' or similar.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch optimizations

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (2)
internal/network/delivery_test.go (1)

521-575: ⚡ Quick win

Add a drop-while-in-flight regression case.

This reset test drops the session only after the first delivery is finished, so it misses the in-flight race where guidance state can be reinserted after drop. Add a case that calls dropSession() before finishCall() and then verifies the next delivery is verbose.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/network/delivery_test.go` around lines 521 - 575, Add a new subtest
that reproduces the in-flight regression by calling
coordinator.dropSession("sess-guidance-drop") before finishing the first
prompter call: create a t.Run (e.g. "Should reset guidance state when dropped
while in-flight"), set up ctx/coordinator/prompter and accept the first Delivery
(as in the existing test), call prompter.waitForCalls(t, 1), then immediately
call coordinator.dropSession("sess-guidance-drop") before prompter.finishCall(0,
...); accept the second Delivery and assert (using prompter.waitForCalls and
inspecting prompter.call(1).message) that the verbose guidance snippets (the
same list of snippets used in the existing test) are present. Use the same
identifiers (newDeliveryCoordinator, acceptOne, dropSession,
prompter.finishCall, prompter.waitForCalls, prompter.call) so the new test
mirrors the original but drops while the first delivery is in flight.
.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py (1)

23-27: ⚡ Quick win

Derive the compact catalog size instead of hardcoding 301.

The savings math depends on this exact byte count. If the compact "catalog unchanged" marker changes, the script will keep reporting stale savings with no signal. Keep the compact form as text here and compute len(...encode()) from it instead of relying on a magic number.

Also applies to: 186-191

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py around
lines 23 - 27, Replace the hardcoded COMPACT_CATALOG_BYTES value with a derived
byte length computed from the compact text constant: compute
COMPACT_CATALOG_BYTES = len(COMPACT_GUIDANCE_TEXT.encode()) (or the appropriate
compact catalog text variable used elsewhere) so the byte count always reflects
the actual encoded size; also update the other occurrence that currently
hardcodes 301 (the second compact form referenced) to compute its length the
same way (use the exact compact-form string variable for len(...encode()) to
ensure consistency).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.compozy/tasks/tokens-perf/analysis/collect_token_baseline.py:
- Around line 146-167: The assistant usage counting block currently tallies
every transcript row regardless of the requested UTC window; update
collect_claude()/the loop that builds stats (the block using stats, raw_rows,
dedupe_key, TOKEN_FIELDS, minute_bucket, calls_by_minute, cache_read_by_minute,
session_starts_in_window_by_class, top_files_by_cache_read, etc.) to first
compute/parse the row timestamp and skip the row if timestamp < start or
timestamp >= end (the same window logic used for
session_starts_in_window_by_class); only after passing that time-window check
should you increment stats.raw_assistant_usage_rows, raw_rows, add to
seen/deduped counts, accumulate TOKEN_FIELDS into stats.usage and totals, update
calls_by_minute/cache_read_by_minute, and update
by_class/top_files_by_cache_read so all counters remain confined to the
requested UTC window.

In @.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py:
- Line 14: Replace the hardcoded ROOT pathlib.Path assignment with a
configurable approach: read the corpus path from an environment variable (e.g.,
CORPUS_ROOT) or a CLI argument (use argparse) and fall back to a sensible
default; update the ROOT variable and any code that references it (search for
ROOT in validate_task010_token_cost.py and the other occurrences around the
blocks mentioned: lines ~136-156 and ~232-233) to use the parsed/validated path,
and add a clear error message if the provided path does not exist or is not a
directory.

In `@internal/api/core/memory_services_test.go`:
- Around line 25-34: Add assertions to verify ActiveProviderSessions is
correctly serialized in the test: in memory_services_test.go locate the
assertions on statusPayload.Extractor and add checks that
statusPayload.Extractor.ActiveProviderSessions equals the expected fixture value
(the value set in the fixture, e.g., 1). Do this for both occurrences mentioned
(around the existing assertions at lines ~25-34 and ~61-66) so the test fails if
ActiveProviderSessions is not mapped/serialized correctly.

In `@internal/daemon/native_tools_test.go`:
- Around line 3885-3887: The test is missing an assertion for
ActiveProviderSessions; update the test that calls
requireNativeStructuredContains with extractorResult to also assert the field is
rendered by adding a check for `"active_provider_sessions":1` (i.e., add another
requireNativeStructuredContains(t, extractorResult,
[]byte(`"active_provider_sessions":1`))) alongside the existing assertions that
reference requireNativeStructuredContains and extractorResult.

In `@internal/daemon/prompt_skills.go`:
- Line 17: Replace the hardcoded constant promptSkillsCatalogCacheMaxSessions
with a configurable value: add a configuration option (e.g.,
PromptSkillsCatalogCacheMaxSessions) to the daemon/app config or an options
struct used to construct the component that uses the constant, load the value
from the config/env at startup, and update all references to use the injected
config field instead of the literal constant in internal/daemon/prompt_skills.go
(and any constructor like NewPromptSkillsCatalog or similar). Validate/normalize
the config value (provide a sensible default if not set) and update any tests to
provide the option via the constructor or test config.

In `@internal/memory/extractor/runtime_test.go`:
- Around line 936-947: The test's timeout is too short and causes flakes:
increase the deadline duration (replace time.After(time.Second)) to a longer
value (e.g., 5s or 10s) so the loop that polls runtime.Stats() with tick (ticker
:= time.NewTicker(10 * time.Millisecond) / tick variable) has more time to
observe matching stats before the select hits the deadline case that calls
t.Fatalf; update the deadline variable initialization accordingly and keep the
existing tick behavior and defer tick.Stop().

In `@internal/memory/extractor/runtime.go`:
- Around line 394-410: The off-by-one: change the drop condition so we reject a
new merge when adding it would reach the configured bound; replace the current
check "if state.queued.coalesceCount+1 > r.coalesceMax" with a comparison that
prevents accepting the new request when the resulting count would be >=
coalesceMax (i.e. use >= logic), keeping the rest of the logic around
state.queued, mergeRequests, r.coalescedTurns and r.droppedTurns unchanged.
- Around line 485-495: The Drain loop can race with Enqueue/time.AfterFunc which
call r.wg.Add concurrently with r.wg.Wait; fix by introducing a draining boolean
(e.g., r.draining) guarded by the same mutex used in Enqueue and
scheduleThrottleFlushLocked, set r.draining = true at the start of Drain before
entering the loop and cancel/stop any pending throttle timers so no flushSession
callback can run, and update Enqueue, scheduleThrottleFlushLocked, and
flushSession to check r.draining under that mutex and reject/skip adding new
work or calling r.wg.Add when draining; this ensures no concurrent Add occurs
while waitWorkers (which calls r.wg.Wait) runs and prevents new work from
slipping past Drain.

In `@internal/network/delivery.go`:
- Around line 482-497: markGuidanceDelivered can recreate guidance entries after
dropSession clears them; change markGuidanceDelivered (in deliveryCoordinator)
to first check whether c.guidance contains the target key and return if it does
not, so you only update existing guidance state (do not create a new state
entry). This prevents finishDeliveredMessage -> markGuidanceDelivered from
repopulating c.guidance after dropSession(); use the existing map lookup for
state (the same c.guidance[target] access) and only assign back after confirming
the entry existed.

---

Nitpick comments:
In @.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py:
- Around line 23-27: Replace the hardcoded COMPACT_CATALOG_BYTES value with a
derived byte length computed from the compact text constant: compute
COMPACT_CATALOG_BYTES = len(COMPACT_GUIDANCE_TEXT.encode()) (or the appropriate
compact catalog text variable used elsewhere) so the byte count always reflects
the actual encoded size; also update the other occurrence that currently
hardcodes 301 (the second compact form referenced) to compute its length the
same way (use the exact compact-form string variable for len(...encode()) to
ensure consistency).

In `@internal/network/delivery_test.go`:
- Around line 521-575: Add a new subtest that reproduces the in-flight
regression by calling coordinator.dropSession("sess-guidance-drop") before
finishing the first prompter call: create a t.Run (e.g. "Should reset guidance
state when dropped while in-flight"), set up ctx/coordinator/prompter and accept
the first Delivery (as in the existing test), call prompter.waitForCalls(t, 1),
then immediately call coordinator.dropSession("sess-guidance-drop") before
prompter.finishCall(0, ...); accept the second Delivery and assert (using
prompter.waitForCalls and inspecting prompter.call(1).message) that the verbose
guidance snippets (the same list of snippets used in the existing test) are
present. Use the same identifiers (newDeliveryCoordinator, acceptOne,
dropSession, prompter.finishCall, prompter.waitForCalls, prompter.call) so the
new test mirrors the original but drops while the first delivery is in flight.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ed2bd76a-a7dc-4abf-bc2e-441471680b80

📥 Commits

Reviewing files that changed from the base of the PR and between 2e87293 and c423564.

⛔ Files ignored due to path filters (15)
  • .agents/skills/autoreview/SKILL.md is excluded by !**/*.md, !.agents/**
  • .agents/skills/autoreview/scripts/autoreview is excluded by !.agents/**
  • .agents/skills/autoreview/scripts/test-review-harness is excluded by !.agents/**
  • .compozy/tasks/tokens-perf/analysis/screenshots/task-007-settings-memory-default.png is excluded by !**/*.png
  • .github/workflows/release.yml is excluded by !**/*.yml
  • openapi/agh.json is excluded by !**/*.json
  • packages/site/content/runtime/core/configuration/config-toml.mdx is excluded by !**/*.mdx
  • packages/site/content/runtime/core/getting-started/installation.mdx is excluded by !**/*.mdx
  • packages/site/content/runtime/core/memory/system.mdx is excluded by !**/*.mdx
  • packages/site/content/runtime/core/skills/index.mdx is excluded by !**/*.mdx
  • skills-lock.json is excluded by !**/*.json
  • skills/agh/references/memory.md is excluded by !**/*.md
  • skills/agh/references/network.md is excluded by !**/*.md
  • skills/agh/references/tools-and-skills.md is excluded by !**/*.md
  • web/src/generated/agh-openapi.d.ts is excluded by !**/generated/**
📒 Files selected for processing (29)
  • .compozy/tasks/tokens-perf/analysis/collect_token_baseline.py
  • .compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py
  • .goreleaser.release-footer.md.tmpl
  • .goreleaser.release-header.md.tmpl
  • internal/api/contract/memory.go
  • internal/api/core/memory_services_test.go
  • internal/config/config.go
  • internal/config/memory_v2_config_test.go
  • internal/config/release_config_test.go
  • internal/config/tool_surface.go
  • internal/daemon/memory_runtime.go
  • internal/daemon/memory_runtime_test.go
  • internal/daemon/native_tools_test.go
  • internal/daemon/prompt_skills.go
  • internal/daemon/prompt_skills_test.go
  • internal/memory/extractor/events.go
  • internal/memory/extractor/runtime.go
  • internal/memory/extractor/runtime_test.go
  • internal/network/delivery.go
  • internal/network/delivery_test.go
  • internal/network/perf_bench_test.go
  • internal/session/spawn.go
  • internal/skills/catalog.go
  • packages/site/lib/__tests__/public-install-contract.test.ts
  • packages/site/public/install.sh
  • web/src/routes/_app/settings/memory.tsx
  • web/src/routes/_app/settings/stories/-memory.stories.tsx
  • web/src/systems/session/lib/session-history-adapter.ts
  • web/src/systems/session/lib/session-thread-repository.ts

Comment thread .compozy/tasks/tokens-perf/analysis/collect_token_baseline.py
Comment thread .compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py Outdated
Comment thread internal/api/core/memory_services_test.go
Comment thread internal/daemon/native_tools_test.go
Comment thread internal/daemon/prompt_skills.go
Comment thread internal/memory/extractor/runtime_test.go Outdated
Comment thread internal/memory/extractor/runtime.go Outdated
Comment thread internal/memory/extractor/runtime.go
Comment thread internal/network/delivery.go Outdated
@pedronauck pedronauck merged commit d4a8c92 into main May 27, 2026
9 of 10 checks passed
@pedronauck pedronauck deleted the optimizations branch May 27, 2026 16:43
pedronauck added a commit that referenced this pull request May 27, 2026
## Release v0.0.3

This PR prepares the release of version v0.0.3.

### Changelog

## 0.0.3 - 2026-05-27



### ♻️ Refactoring

- Memory optimization (#215)



### 📦 Build System

- Sync web assets module (#210)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Updated web assets dependency to the latest version.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/compozy/agh/pull/216?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant