refactor: memory optimization#215
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
Caution Review failedPull request was closed or merged during review No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (11)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (8)
WalkthroughThis PR introduces extractor backpressure and throttling with new runtime metrics, implements skills-catalog caching/compact markers, compacts per-session network guidance, bootstraps a pinned cosign verifier in the installer (with tests), and adds two Python token-analysis scripts plus supporting tests and frontend/type tweaks. ChangesMemory Extractor Runtime Backpressure and Metrics
Skills Catalog Caching and Network Delivery Guidance Compaction
Installation Script and Release Verification
Token Performance Analysis and Cost Validation
Web Frontend Type Refinements
🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (2)
internal/network/delivery_test.go (1)
521-575: ⚡ Quick winAdd a drop-while-in-flight regression case.
This reset test drops the session only after the first delivery is finished, so it misses the in-flight race where guidance state can be reinserted after drop. Add a case that calls
dropSession()beforefinishCall()and then verifies the next delivery is verbose.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/network/delivery_test.go` around lines 521 - 575, Add a new subtest that reproduces the in-flight regression by calling coordinator.dropSession("sess-guidance-drop") before finishing the first prompter call: create a t.Run (e.g. "Should reset guidance state when dropped while in-flight"), set up ctx/coordinator/prompter and accept the first Delivery (as in the existing test), call prompter.waitForCalls(t, 1), then immediately call coordinator.dropSession("sess-guidance-drop") before prompter.finishCall(0, ...); accept the second Delivery and assert (using prompter.waitForCalls and inspecting prompter.call(1).message) that the verbose guidance snippets (the same list of snippets used in the existing test) are present. Use the same identifiers (newDeliveryCoordinator, acceptOne, dropSession, prompter.finishCall, prompter.waitForCalls, prompter.call) so the new test mirrors the original but drops while the first delivery is in flight..compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py (1)
23-27: ⚡ Quick winDerive the compact catalog size instead of hardcoding
301.The savings math depends on this exact byte count. If the compact "catalog unchanged" marker changes, the script will keep reporting stale savings with no signal. Keep the compact form as text here and compute
len(...encode())from it instead of relying on a magic number.Also applies to: 186-191
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py around lines 23 - 27, Replace the hardcoded COMPACT_CATALOG_BYTES value with a derived byte length computed from the compact text constant: compute COMPACT_CATALOG_BYTES = len(COMPACT_GUIDANCE_TEXT.encode()) (or the appropriate compact catalog text variable used elsewhere) so the byte count always reflects the actual encoded size; also update the other occurrence that currently hardcodes 301 (the second compact form referenced) to compute its length the same way (use the exact compact-form string variable for len(...encode()) to ensure consistency).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.compozy/tasks/tokens-perf/analysis/collect_token_baseline.py:
- Around line 146-167: The assistant usage counting block currently tallies
every transcript row regardless of the requested UTC window; update
collect_claude()/the loop that builds stats (the block using stats, raw_rows,
dedupe_key, TOKEN_FIELDS, minute_bucket, calls_by_minute, cache_read_by_minute,
session_starts_in_window_by_class, top_files_by_cache_read, etc.) to first
compute/parse the row timestamp and skip the row if timestamp < start or
timestamp >= end (the same window logic used for
session_starts_in_window_by_class); only after passing that time-window check
should you increment stats.raw_assistant_usage_rows, raw_rows, add to
seen/deduped counts, accumulate TOKEN_FIELDS into stats.usage and totals, update
calls_by_minute/cache_read_by_minute, and update
by_class/top_files_by_cache_read so all counters remain confined to the
requested UTC window.
In @.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py:
- Line 14: Replace the hardcoded ROOT pathlib.Path assignment with a
configurable approach: read the corpus path from an environment variable (e.g.,
CORPUS_ROOT) or a CLI argument (use argparse) and fall back to a sensible
default; update the ROOT variable and any code that references it (search for
ROOT in validate_task010_token_cost.py and the other occurrences around the
blocks mentioned: lines ~136-156 and ~232-233) to use the parsed/validated path,
and add a clear error message if the provided path does not exist or is not a
directory.
In `@internal/api/core/memory_services_test.go`:
- Around line 25-34: Add assertions to verify ActiveProviderSessions is
correctly serialized in the test: in memory_services_test.go locate the
assertions on statusPayload.Extractor and add checks that
statusPayload.Extractor.ActiveProviderSessions equals the expected fixture value
(the value set in the fixture, e.g., 1). Do this for both occurrences mentioned
(around the existing assertions at lines ~25-34 and ~61-66) so the test fails if
ActiveProviderSessions is not mapped/serialized correctly.
In `@internal/daemon/native_tools_test.go`:
- Around line 3885-3887: The test is missing an assertion for
ActiveProviderSessions; update the test that calls
requireNativeStructuredContains with extractorResult to also assert the field is
rendered by adding a check for `"active_provider_sessions":1` (i.e., add another
requireNativeStructuredContains(t, extractorResult,
[]byte(`"active_provider_sessions":1`))) alongside the existing assertions that
reference requireNativeStructuredContains and extractorResult.
In `@internal/daemon/prompt_skills.go`:
- Line 17: Replace the hardcoded constant promptSkillsCatalogCacheMaxSessions
with a configurable value: add a configuration option (e.g.,
PromptSkillsCatalogCacheMaxSessions) to the daemon/app config or an options
struct used to construct the component that uses the constant, load the value
from the config/env at startup, and update all references to use the injected
config field instead of the literal constant in internal/daemon/prompt_skills.go
(and any constructor like NewPromptSkillsCatalog or similar). Validate/normalize
the config value (provide a sensible default if not set) and update any tests to
provide the option via the constructor or test config.
In `@internal/memory/extractor/runtime_test.go`:
- Around line 936-947: The test's timeout is too short and causes flakes:
increase the deadline duration (replace time.After(time.Second)) to a longer
value (e.g., 5s or 10s) so the loop that polls runtime.Stats() with tick (ticker
:= time.NewTicker(10 * time.Millisecond) / tick variable) has more time to
observe matching stats before the select hits the deadline case that calls
t.Fatalf; update the deadline variable initialization accordingly and keep the
existing tick behavior and defer tick.Stop().
In `@internal/memory/extractor/runtime.go`:
- Around line 394-410: The off-by-one: change the drop condition so we reject a
new merge when adding it would reach the configured bound; replace the current
check "if state.queued.coalesceCount+1 > r.coalesceMax" with a comparison that
prevents accepting the new request when the resulting count would be >=
coalesceMax (i.e. use >= logic), keeping the rest of the logic around
state.queued, mergeRequests, r.coalescedTurns and r.droppedTurns unchanged.
- Around line 485-495: The Drain loop can race with Enqueue/time.AfterFunc which
call r.wg.Add concurrently with r.wg.Wait; fix by introducing a draining boolean
(e.g., r.draining) guarded by the same mutex used in Enqueue and
scheduleThrottleFlushLocked, set r.draining = true at the start of Drain before
entering the loop and cancel/stop any pending throttle timers so no flushSession
callback can run, and update Enqueue, scheduleThrottleFlushLocked, and
flushSession to check r.draining under that mutex and reject/skip adding new
work or calling r.wg.Add when draining; this ensures no concurrent Add occurs
while waitWorkers (which calls r.wg.Wait) runs and prevents new work from
slipping past Drain.
In `@internal/network/delivery.go`:
- Around line 482-497: markGuidanceDelivered can recreate guidance entries after
dropSession clears them; change markGuidanceDelivered (in deliveryCoordinator)
to first check whether c.guidance contains the target key and return if it does
not, so you only update existing guidance state (do not create a new state
entry). This prevents finishDeliveredMessage -> markGuidanceDelivered from
repopulating c.guidance after dropSession(); use the existing map lookup for
state (the same c.guidance[target] access) and only assign back after confirming
the entry existed.
---
Nitpick comments:
In @.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py:
- Around line 23-27: Replace the hardcoded COMPACT_CATALOG_BYTES value with a
derived byte length computed from the compact text constant: compute
COMPACT_CATALOG_BYTES = len(COMPACT_GUIDANCE_TEXT.encode()) (or the appropriate
compact catalog text variable used elsewhere) so the byte count always reflects
the actual encoded size; also update the other occurrence that currently
hardcodes 301 (the second compact form referenced) to compute its length the
same way (use the exact compact-form string variable for len(...encode()) to
ensure consistency).
In `@internal/network/delivery_test.go`:
- Around line 521-575: Add a new subtest that reproduces the in-flight
regression by calling coordinator.dropSession("sess-guidance-drop") before
finishing the first prompter call: create a t.Run (e.g. "Should reset guidance
state when dropped while in-flight"), set up ctx/coordinator/prompter and accept
the first Delivery (as in the existing test), call prompter.waitForCalls(t, 1),
then immediately call coordinator.dropSession("sess-guidance-drop") before
prompter.finishCall(0, ...); accept the second Delivery and assert (using
prompter.waitForCalls and inspecting prompter.call(1).message) that the verbose
guidance snippets (the same list of snippets used in the existing test) are
present. Use the same identifiers (newDeliveryCoordinator, acceptOne,
dropSession, prompter.finishCall, prompter.waitForCalls, prompter.call) so the
new test mirrors the original but drops while the first delivery is in flight.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ed2bd76a-a7dc-4abf-bc2e-441471680b80
⛔ Files ignored due to path filters (15)
.agents/skills/autoreview/SKILL.mdis excluded by!**/*.md,!.agents/**.agents/skills/autoreview/scripts/autoreviewis excluded by!.agents/**.agents/skills/autoreview/scripts/test-review-harnessis excluded by!.agents/**.compozy/tasks/tokens-perf/analysis/screenshots/task-007-settings-memory-default.pngis excluded by!**/*.png.github/workflows/release.ymlis excluded by!**/*.ymlopenapi/agh.jsonis excluded by!**/*.jsonpackages/site/content/runtime/core/configuration/config-toml.mdxis excluded by!**/*.mdxpackages/site/content/runtime/core/getting-started/installation.mdxis excluded by!**/*.mdxpackages/site/content/runtime/core/memory/system.mdxis excluded by!**/*.mdxpackages/site/content/runtime/core/skills/index.mdxis excluded by!**/*.mdxskills-lock.jsonis excluded by!**/*.jsonskills/agh/references/memory.mdis excluded by!**/*.mdskills/agh/references/network.mdis excluded by!**/*.mdskills/agh/references/tools-and-skills.mdis excluded by!**/*.mdweb/src/generated/agh-openapi.d.tsis excluded by!**/generated/**
📒 Files selected for processing (29)
.compozy/tasks/tokens-perf/analysis/collect_token_baseline.py.compozy/tasks/tokens-perf/analysis/validate_task010_token_cost.py.goreleaser.release-footer.md.tmpl.goreleaser.release-header.md.tmplinternal/api/contract/memory.gointernal/api/core/memory_services_test.gointernal/config/config.gointernal/config/memory_v2_config_test.gointernal/config/release_config_test.gointernal/config/tool_surface.gointernal/daemon/memory_runtime.gointernal/daemon/memory_runtime_test.gointernal/daemon/native_tools_test.gointernal/daemon/prompt_skills.gointernal/daemon/prompt_skills_test.gointernal/memory/extractor/events.gointernal/memory/extractor/runtime.gointernal/memory/extractor/runtime_test.gointernal/network/delivery.gointernal/network/delivery_test.gointernal/network/perf_bench_test.gointernal/session/spawn.gointernal/skills/catalog.gopackages/site/lib/__tests__/public-install-contract.test.tspackages/site/public/install.shweb/src/routes/_app/settings/memory.tsxweb/src/routes/_app/settings/stories/-memory.stories.tsxweb/src/systems/session/lib/session-history-adapter.tsweb/src/systems/session/lib/session-thread-repository.ts
## Release v0.0.3 This PR prepares the release of version v0.0.3. ### Changelog ## 0.0.3 - 2026-05-27 ### ♻️ Refactoring - Memory optimization (#215) ### 📦 Build System - Sync web assets module (#210) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Updated web assets dependency to the latest version. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/compozy/agh/pull/216?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary by CodeRabbit
New Features
Improvements