Skip to content

Cache broadcast live query bitfields in memory#48047

Draft
juan-fdz-hawa wants to merge 1 commit into
42441-live-query-redis-re-architecture-set-on-small-liveqfrom
42441-live-query-redis-re-architecture-cache-bimap
Draft

Cache broadcast live query bitfields in memory#48047
juan-fdz-hawa wants to merge 1 commit into
42441-live-query-redis-re-architecture-set-on-small-liveqfrom
42441-live-query-redis-re-architecture-cache-bimap

Conversation

@juan-fdz-hawa

@juan-fdz-hawa juan-fdz-hawa commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Related issue: Relates to #42441

Cache each active broadcast query's targeting bitfield in the existing in-memory cache and test the host's bit locally on check-in, so a broadcast query no longer costs a Redis command per check-in. A query whose bitfield is not cached (e.g. it expired while the campaign is still active) falls back to a live GETBIT.

Checklist for submitter

If some of the following don't apply, delete the relevant line.

  • Changes file added for user-visible changes in changes/, orbit/changes/ or ee/fleetd-chrome/changes.
    See Changes files for more information.

Testing

  • Added/updated automated tests
  • QA'd all new/changed functionality manually

Summary by CodeRabbit

  • Bug Fixes

    • Fixed a concurrency-related performance issue where multiple live queries targeting the same hosts could significantly overload the system and delay host check-ins. The fix improves overall system responsiveness and stability.
  • Tests

    • Added comprehensive test coverage for concurrent query scenarios and cache reliability.

@juan-fdz-hawa

Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai

coderabbitai Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

The change reduces Redis load from concurrent live queries targeting small host sets by introducing an in-memory bitfield cache for broadcast query targeting. memCache gains a bitfields map keyed by campaign ID. A singleflight.Group on redisLiveQuery coalesces concurrent cache reloads. QueriesForHost now tests hosts against cached bitfields via a new testBit helper and only issues live Redis GETBIT calls for queries absent from the cache. loadCache is refactored to group target keys by Redis Cluster slot and pipeline GET calls for SQL and bitfields per slot through a new loadBatchIntoCache helper. Missing bitfields are deliberately left uncached to preserve the live GETBIT fallback path. Tests are added for bit correctness, cache eviction fallback, pipeline drain ordering, cache-served behavior without GETBIT, and concurrent reload coalescing. The change description entry is also updated to reflect the new behavior.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers the main objective and testing status, but is incomplete according to the template. Required sections like input validation, timeouts, database migrations, and full testing checklist are missing or incomplete. Complete the description by addressing all applicable template sections, particularly documenting any security considerations, timeout/retry logic, database impacts, and confirming manual QA completion before merge.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: implementing in-memory caching for broadcast live query targeting bitfields to reduce Redis load.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 42441-live-query-redis-re-architecture-cache-bimap

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/live_query/redis_live_query.go (1)

379-388: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Load the cache before checking out the reverse-query Redis connection.

Line 380 acquires a Redis connection before Line 387 can enter ensureCacheLoaded; if the cache is expired, waiters can hold pool connections while the singleflight leader needs additional Redis connections in loadCache, risking pool starvation during check-in bursts. Move the cache load ahead of ReadOnlyConn.

Proposed fix
 func (r *redisLiveQuery) collectReverseQueriesForHost(hostID uint, queriesByHost map[string]string) error {
-	conn := redis.ReadOnlyConn(r.pool, r.pool.Get())
-	defer conn.Close()
-
 	// Stale-entry filtering below relies on the SQL cache holding only active
 	// queries. Refresh it on expiry here so this path stays correct on its own,
 	// independent of any cache (re)load done by the caller or the bitfield path
 	// (which is skipped when every active query is small-target).
 	if err := r.ensureCacheLoaded(); err != nil {
 		return fmt.Errorf("load cache: %w", err)
 	}
+
+	conn := redis.ReadOnlyConn(r.pool, r.pool.Get())
+	defer conn.Close()
 
 	names, err := redigo.Strings(conn.Do("SMEMBERS", reverseHostKey(hostID)))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/live_query/redis_live_query.go` around lines 379 - 388, The Redis
connection is being acquired via redis.ReadOnlyConn before the cache is loaded
via ensureCacheLoaded in the collectReverseQueriesForHost function. This creates
a risk of pool starvation if the cache is expired and other waiters are holding
connections while the singleflight leader needs additional connections in
loadCache. Move the ensureCacheLoaded call to execute at the start of the
function, before the Redis connection acquisition, so the cache is guaranteed to
be loaded before any pool connections are checked out.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@server/live_query/redis_live_query.go`:
- Around line 379-388: The Redis connection is being acquired via
redis.ReadOnlyConn before the cache is loaded via ensureCacheLoaded in the
collectReverseQueriesForHost function. This creates a risk of pool starvation if
the cache is expired and other waiters are holding connections while the
singleflight leader needs additional connections in loadCache. Move the
ensureCacheLoaded call to execute at the start of the function, before the Redis
connection acquisition, so the cache is guaranteed to be loaded before any pool
connections are checked out.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 086ab503-f5d7-436a-8f7c-7dbbdb00f99f

📥 Commits

Reviewing files that changed from the base of the PR and between 59db864 and 311bb94.

📒 Files selected for processing (3)
  • changes/42441-live-query-redis-scaling
  • server/live_query/redis_live_query.go
  • server/live_query/redis_live_query_test.go

@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 69.76744% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.31%. Comparing base (59db864) to head (dd8e213).

Files with missing lines Patch % Lines
server/live_query/redis_live_query.go 69.76% 14 Missing and 12 partials ⚠️
Additional details and impacted files
@@                                    Coverage Diff                                     @@
##           42441-live-query-redis-re-architecture-set-on-small-liveq   #48047   +/-   ##
==========================================================================================
  Coverage                                                      67.30%   67.31%           
==========================================================================================
  Files                                                           3655     3655           
  Lines                                                         231336   231393   +57     
  Branches                                                       12075    12075           
==========================================================================================
+ Hits                                                          155711   155762   +51     
- Misses                                                         61642    61645    +3     
- Partials                                                       13983    13986    +3     
Flag Coverage Δ
backend 68.94% <69.76%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Relates to #42441

Cache each active broadcast query's targeting bitfield in the existing
in-memory cache and test the host's bit locally on check-in, so a
broadcast query no longer costs a Redis command per check-in. A query
whose bitfield is not cached (e.g. it expired while the campaign is still
active) falls back to a live GETBIT.
@juan-fdz-hawa juan-fdz-hawa force-pushed the 42441-live-query-redis-re-architecture-cache-bimap branch from 311bb94 to dd8e213 Compare June 22, 2026 20:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Live query Redis architecture causes unsustainable O(hosts × queries) command load at scale

1 participant