fix: api memory improvement#2762

Merged
baktun14 merged 6 commits into main from fix/api-memory-leaks
Feb 24, 2026

Conversation

@baktun14
Contributor

@baktun14 baktun14 commented Feb 17, 2026

Why

The in-memory cache (memory-cache) doesn't enforce any size limit, so it can grow without bound.

What

Replace it with a bounded LRUCache.

Summary by CodeRabbit

  • Improvements

    • Switched to bounded LRU caching across the API for better memory use and predictable eviction.
    • Adjusted health-check/readiness logic for clearer success tracking and stricter initial-failure behavior.
  • Chores

    • Removed legacy in-memory cache dependency and migrated caching helpers and usages to the new implementation.
  • Tests

    • Added comprehensive cache unit tests and updated test cleanup to use the new caching helpers.
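The core of the fix is swapping an unbounded store for a size-capped LRU. A minimal sketch of the eviction behavior, using a plain Map as a stand-in for the lru-cache package (the BoundedLru class name is hypothetical; Map's insertion-order iteration supplies the least-recently-used ordering):

```typescript
// Minimal bounded LRU sketch; a simplified stand-in for `lru-cache`.
// `max` caps the number of entries, so memory stays bounded under load.
class BoundedLru<V> {
  private map = new Map<string, V>();
  constructor(private max: number) {}

  get(key: string): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      // Refresh recency: re-insert so the key moves to the "newest" end.
      this.map.delete(key);
      this.map.set(key, value);
    }
    return value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.max) {
      // Map preserves insertion order, so the first key is least recently used.
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }

  get size(): number {
    return this.map.size;
  }
}

const cache = new BoundedLru<number>(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a"); // touch "a" so "b" becomes least recently used
cache.set("c", 3); // exceeds max, evicting "b"
```

The real lru-cache package adds per-entry TTLs, size calculation, and dispose hooks on top of this same eviction principle.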

@baktun14 baktun14 requested a review from a team as a code owner February 17, 2026 18:13
@baktun14 baktun14 changed the title from "Fix/api memory leaks" to "fix: api memory improvement" Feb 17, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Feb 17, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e635033 and 5da1d52.

📒 Files selected for processing (1)
  • apps/api/src/healthz/services/healthz/healthz.service.spec.ts

📝 Walkthrough

Walkthrough

Replaces the in-repo memory-cache usage with lru-cache across the API app: dependency removal, MemoryCacheEngine and helpers switched to LRUCache (with signature changes and a new clearByKey), service caches migrated to LRU, functional tests updated to use cache helper, and healthcheck logic updated to track prior success with a new hasSucceeded flag. (47 words)

Changes

Cohort / File(s) Summary
Dependency Update
apps/api/package.json
Removed memory-cache and @types/memory-cache from dependencies/devDependencies.
Cache Engine & Helpers
apps/api/src/caching/memoryCacheEngine.ts, apps/api/src/caching/helpers.ts
Replaced memory-cache with lru-cache; introduced CacheValue type and global LRU instances; getFromCache<T> now returns `T | undefined` instead of `T | false`, and a new clearByKey helper was added.
Service Cache Migration
apps/api/src/deployment/services/cached-balance/cached-balance.service.ts
Replaced internal Map<string, CachedBalance> with LRUCache<string, CachedBalance>({ max: 10000 }) (implementation change only).
Healthcheck Logic
apps/api/src/healthz/services/healthz/healthz.service.ts
Added private hasSucceeded flag and revised readiness logic: first-failure tolerance now depends on prior hasSucceeded instead of nullable isFailed; isFailed behavior adjusted accordingly.
Auth Cache Typing
apps/api/src/auth/services/user-auth-token/user-auth-token.service.ts
Updated cache typing to use CacheValue for kvStore.put calls (type refinement only).
Tests — Cache Cleanup
apps/api/test/functional/provider-deployments.spec.ts, apps/api/test/functional/provider-earnings.spec.ts, apps/api/test/functional/providers.spec.ts
Replaced direct memory-cache usage with cacheEngine helper and switched teardown mcache.clear() to cacheEngine.clearAllKeyInCache().
New Tests
apps/api/src/caching/memoryCacheEngine.spec.ts
Added comprehensive unit tests covering get/store/TTL/clear/clearByPrefix/getKeys behaviors for the MemoryCacheEngine backed by LRU.
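The helper behaviors covered by the new unit tests (getKeys, clearByPrefix) can be sketched over any key-iterable store; here a Map stands in for the LRU-backed engine, and the function names follow the walkthrough above:

```typescript
// Simplified sketch of prefix-based clearing over a key-iterable cache.
// A Map stands in for the LRU-backed MemoryCacheEngine store.
const store = new Map<string, unknown>();

function getKeys(): string[] {
  return [...store.keys()];
}

function clearByPrefix(prefix: string): void {
  // Snapshot keys first so deletion doesn't disturb iteration.
  for (const key of getKeys()) {
    if (key.startsWith(prefix)) store.delete(key);
  }
}

store.set("provider:1", { uptime: 0.99 });
store.set("provider:2", { uptime: 0.97 });
store.set("balance:1", { amount: 10 });
clearByPrefix("provider:"); // removes both provider entries
```

lru-cache exposes its keys via an iterator as well, so the same snapshot-then-delete pattern applies to the real engine.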

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰
I swapped the crumbs for tidy bins,
LRU hums where hoarding ends.
Keys pruned short, old maps set free,
A new flag watches first victory.
Hop, cache — neat memory, please and thanks. 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Title check — ❓ Inconclusive: the title "fix: api memory improvement" is vague and doesn't clearly describe the specific change (replacing memory-cache with LRUCache). Resolution: make the title more specific, e.g. "fix: replace unbounded memory-cache with LRUCache".

✅ Passed checks (1 passed)

  • Description check — ✅ Passed: the description addresses both "Why" and "What" with relevant information about the memory limitation issue and the solution.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Feb 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 53.96%. Comparing base (fd91ace) to head (5da1d52).
⚠️ Report is 21 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2762      +/-   ##
==========================================
+ Coverage   53.64%   53.96%   +0.32%     
==========================================
  Files        1019     1020       +1     
  Lines       23591    23608      +17     
  Branches     5754     5771      +17     
==========================================
+ Hits        12655    12741      +86     
+ Misses       9531     9476      -55     
+ Partials     1405     1391      -14     
Flag Coverage Δ
api 76.92% <100.00%> (-0.01%) ⬇️
deploy-web 36.38% <ø> (+0.45%) ⬆️
log-collector 75.35% <ø> (ø)
notifications 85.56% <ø> (ø)
provider-console 81.48% <ø> (ø)
provider-proxy 85.93% <ø> (+3.52%) ⬆️
tx-signer 79.29% <ø> (ø)
Files with missing lines Coverage Δ
...ervices/user-auth-token/user-auth-token.service.ts 82.60% <100.00%> (ø)
apps/api/src/caching/helpers.ts 100.00% <100.00%> (ø)
apps/api/src/caching/memoryCacheEngine.ts 100.00% <100.00%> (+45.00%) ⬆️
.../services/cached-balance/cached-balance.service.ts 100.00% <100.00%> (ø)
...pi/src/healthz/services/healthz/healthz.service.ts 100.00% <100.00%> (ø)

... and 34 files with indirect coverage changes


Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/api/src/caching/memoryCacheEngine.ts`:
- Around line 3-4: Replace the explicit any used in the LRUCache instantiation
with unknown and remove the eslint-disable comment: change the cache declaration
from LRUCache<string, any> to LRUCache<string, unknown> so the MemoryCacheEngine
wrapper’s generic get/set methods maintain type safety without suppressing the
`@typescript-eslint/no-explicit-any` rule (look for the cache variable and
MemoryCacheEngine usages to update).
- Around line 25-26: The TTL unit is mismatched: make storeInCache treat its
duration parameter as seconds (to match cacheResponse callers) and convert to
milliseconds before passing to lru-cache; inside storeInCache(key: string, data:
T, duration?: number) compute const ttl = duration ? duration * 1000 : undefined
and call cache.set(key, data, ttl ? { ttl } : undefined). Update any references
to storeInCache (and leave cacheResponse(seconds) calls unchanged) so callers
can continue passing seconds without breaking behavior.
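The suggested conversion can be sketched as a small helper (toTtlMs is a hypothetical name; the key fact is that lru-cache's ttl option is expressed in milliseconds while cacheResponse callers pass seconds):

```typescript
// Sketch of the suggested fix: callers keep passing seconds, and the
// engine converts to milliseconds, the unit lru-cache expects for `ttl`.
function toTtlMs(durationSeconds?: number): number | undefined {
  return durationSeconds ? durationSeconds * 1000 : undefined;
}

// Inside storeInCache<T>(key, data, duration), the set call would become:
//   const ttl = toTtlMs(duration);
//   cache.set(key, data, ttl ? { ttl } : undefined);
```

Keeping the seconds-based signature means no call site needs to change; only the boundary with lru-cache performs the conversion.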

@ygrishajev
Contributor

Although this is not introducing radical changes, I think the cache engine should be reworked as an interface utilising various storages. E.g. in local dev LRUCache is totally fine; in prod we should use a distributed storage like Redis.

@stalniy wdyt?

@stalniy
Contributor

stalniy commented Feb 18, 2026

Although this is not introducing radical changes, I think the cache engine should be reworked as an interface utilising various storages. E.g. in local dev LRUCache is totally fine; in prod we should use a distributed storage like Redis.

@stalniy wdyt?

I actually like this one because it's minus 1 dependency. And this is the thing which I wanted to do by myself for a long time :)

I think that we should make a discussion for this and consider other options like https://bentocache.dev/docs/introduction

baktun14 and others added 2 commits February 19, 2026 11:35
The healthz service was returning unhealthy immediately on the first
failed check after startup (when isFailed was null), bypassing the
failure tolerance logic. This could cause liveness probe failures
during pod startup when DB connections aren't yet established.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@baktun14 baktun14 force-pushed the fix/api-memory-leaks branch from f9d164b to 238392c Compare February 19, 2026 16:36
Replace `any` with `NonNullable<unknown>` cache value type, change
`getFromCache` sentinel from `false` to `undefined`, and convert
TTL duration from seconds to milliseconds for lru-cache compatibility.
@github-actions github-actions Bot added size: S and removed size: XS labels Feb 19, 2026
@github-actions github-actions Bot added size: M and removed size: S labels Feb 19, 2026
Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/api/src/caching/memoryCacheEngine.spec.ts`:
- Around line 169-173: The setup function currently has no parameters; change
its signature to accept a single destructured empty object with an inline type
to satisfy the test guideline (e.g., function setup({}: {}) { ... }), keep the
body as-is (new MemoryCacheEngine(), engine.clearAllKeyInCache(), return {
engine }) and ensure this setup remains at the bottom of the root describe
block; reference: setup, MemoryCacheEngine, engine.clearAllKeyInCache.

Track whether a healthcheck has succeeded at least once before
tolerating failures, preventing false-positive liveness on cold start.
Contributor

@coderabbitai coderabbitai Bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/api/src/healthz/services/healthz/healthz.service.ts (1)

62-99: ⚠️ Potential issue | 🟠 Major

Clear checkedAt on first-time failure to prevent false-positive liveness.

The check() method always updates checkedAt in its finally block (line 105), even when the ping fails. This means a failed first call still caches a timestamp; subsequent isHealthy() calls within cacheTTL skip the check and return true, yielding false-positive liveness during cold start. The hasSucceeded guard on line 95 correctly returns false on initial failure, but the cached timestamp bypasses this guard on subsequent calls within the TTL window.

🛠️ Minimal fix to avoid caching failures before first success
      const prevIsFailed = this.isFailed;
+      if (!this.hasSucceeded) {
+        this.checkedAt = null;
+        return false;
+      }
-      if (prevIsFailed || !this.hasSucceeded || options?.ignoreCache) return false;
+      if (prevIsFailed || options?.ignoreCache) return false;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/api/src/healthz/services/healthz/healthz.service.ts` around lines 62 -
99, The Healthcheck's checkedAt is being set even when check() fails, causing a
failed first ping to be cached and producing false-positive liveness; update the
logic so checkedAt is only recorded after a successful ping (or explicitly clear
checkedAt on first-time failure). Specifically, modify
Healthcheck.check()/isHealthy() behavior so that checkedAt is not updated in the
finally block when the ping throws, and if a ping fails while hasSucceeded is
false then set checkedAt = null (or move the checkedAt update into the success
path) so subsequent isHealthy() calls within cacheTTL will re-run check()
instead of returning a cached true.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@apps/api/src/healthz/services/healthz/healthz.service.ts`:
- Around line 62-99: The Healthcheck's checkedAt is being set even when check()
fails, causing a failed first ping to be cached and producing false-positive
liveness; update the logic so checkedAt is only recorded after a successful ping
(or explicitly clear checkedAt on first-time failure). Specifically, modify
Healthcheck.check()/isHealthy() behavior so that checkedAt is not updated in the
finally block when the ping throws, and if a ping fails while hasSucceeded is
false then set checkedAt = null (or move the checkedAt update into the success
path) so subsequent isHealthy() calls within cacheTTL will re-run check()
instead of returning a cached true.
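The interplay the bot describes — tolerate a single failure only after a prior success, and never cache a cold-start failure — can be sketched as follows (the HealthcheckSketch class and report method are hypothetical names; the real service wraps an async DB ping and a checkedAt TTL):

```typescript
// Sketch of the readiness logic: one failure is tolerated only once a
// check has succeeded before; failures prior to the first success report
// unhealthy immediately and clear the cached timestamp.
class HealthcheckSketch {
  private hasSucceeded = false;
  private isFailed = false;
  private checkedAt: number | null = null;

  report(pingOk: boolean): boolean {
    if (pingOk) {
      this.hasSucceeded = true;
      this.isFailed = false;
      this.checkedAt = Date.now();
      return true;
    }
    const prevIsFailed = this.isFailed;
    this.isFailed = true;
    if (!this.hasSucceeded) {
      // Never cache a failure before the first success (cold start).
      this.checkedAt = null;
      return false;
    }
    // Tolerate exactly one failure after a prior success.
    return !prevIsFailed;
  }
}
```

Under this sketch a pod that has never pinged its DB successfully reports unhealthy at once, while an established pod survives a single transient failure before flipping to unhealthy.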

@baktun14 baktun14 enabled auto-merge February 24, 2026 17:03
@baktun14 baktun14 added this pull request to the merge queue Feb 24, 2026
Merged via the queue into main with commit 7792ab8 Feb 24, 2026
54 of 55 checks passed
@baktun14 baktun14 deleted the fix/api-memory-leaks branch February 24, 2026 17:12

3 participants