Skip to content

feat(controlplane): optimize usage API with Redis cache and combined query#2594

Closed
mekilis wants to merge 5 commits intomainfrom
feat/optimize-billing-usage
Closed

feat(controlplane): optimize usage API with Redis cache and combined query#2594
mekilis wants to merge 5 commits intomainfrom
feat/optimize-billing-usage

Conversation

@mekilis
Copy link
Copy Markdown
Collaborator

@mekilis mekilis commented Mar 5, 2026

Summary

Optimize the billing usage API for better performance.

Changes

  • Redis cache-aside for GetUsage (10min TTL, key: usage:{orgID}:{period})
  • Single combined query (CalculateUsageCombined) instead of 4 parallel queries
  • Async cache refresh on hit; singleflight prevents duplicate concurrent populates
  • Add TestFormatUsageResponse; revert-concurrently.sh for sqlc workflow

Testing

  • go test ./api/handlers/... passes
  • golangci-lint run passes

@mekilis mekilis enabled auto-merge March 6, 2026 08:40
@mekilis mekilis force-pushed the feat/optimize-billing-usage branch from e4f54da to 1bf5036 Compare March 10, 2026 08:48
Comment thread api/handlers/billing.go Outdated
* feat(events): add baseline pagination tests for sqlc migration

- Create 8 critical test cases for LoadEventsPaged
- Cover EXISTS path, CTE path, pagination, and filters
- Add migration tracking document
- Tests lock in expected behavior before refactoring

* feat(events): setup infrastructure for sqlc migration (Phase 1)

- Create internal/events/ directory structure
- Add events configuration to sqlc.yaml
- Create queries.sql with 19 query TODOs organized into 5 groups
- Add impl.go skeleton with 18 method signatures
- Add helpers.go for type conversions
- Add README.md with usage documentation
- Update .gitignore to exclude generated repo/ code

Phase 1 complete: Ready for SQL query implementation

* feat(events): implement all 19 SQL queries for sqlc (Phase 2)

Implemented all query groups:
- Group 1: Simple CRUD (5 queries)
  CreateEvent, CreateEventEndpoints, UpdateEventEndpoints, UpdateEventStatus, FindEventByID

- Group 2: Batch Reads & Counting (5 queries)
  FindEventsByIDs, FindEventsByIdempotencyKey, FindFirstEventWithIdempotencyKey,
  CountProjectMessages, CountEvents

- Group 3: Complex Pagination (5 queries) ⚠️ CRITICAL
  LoadEventsPagedExists (fast path with EXISTS for index usage)
  LoadEventsPagedSearch (CTE+JOIN for full-text search)
  CountPrevEventsExists, CountPrevEventsSearch
  - Supports 10+ filters with CASE expressions
  - Bidirectional pagination (forward/backward)
  - Cursor-based navigation
  - Dual query path (EXISTS vs CTE) for optimal performance

- Group 4: Deletion & Maintenance (4 queries)
  SoftDeleteProjectEvents, HardDeleteProjectEvents,
  HardDeleteTokenizedEvents, CopyRowsFromEventsToEventsSearch

- Group 5: Partition Management (4 queries)
  All partition operations call existing PL/pgSQL functions

Key implementation details:
- CASE expressions for conditional filters
- EXISTS subquery in pagination to leverage indexes
- CTE pattern for full-text search
- Source metadata via LEFT JOIN
- COALESCE for nullable fields

Phase 2 complete: Ready for service implementation

* feat(events): implement complete service layer (Phase 3)

Implemented all 18 EventRepository methods:

Core Implementation:
- impl.go: 523 lines with full service logic
- helpers.go: Type conversion utilities
- repo/.gitignore: Marks directory for sqlc-generated code

Method Groups:
1. Simple CRUD (5 methods)
   - CreateEvent with batch endpoint processing (30K partitions)
   - FindEventByID, UpdateEventEndpoints, UpdateEventStatus

2. Batch Reads & Counting (5 methods)
   - FindEventsByIDs, FindEventsByIdempotencyKey
   - CountProjectMessages, CountEvents

3. Complex Pagination (1 method, most critical)
   - LoadEventsPaged with dual query path logic
   - loadEventsPagedExists: Fast path using EXISTS subquery
   - loadEventsPagedSearch: Full-text search using CTE
   - countPrevEvents: Previous page detection for both paths
   - Supports 10+ filters with boolean flags
   - Bidirectional pagination (forward/backward)
   - Cursor-based navigation with ASC/DESC sort

4. Deletion & Maintenance (3 methods)
   - DeleteProjectEvents (soft/hard delete)
   - DeleteProjectTokenizedEvents
   - CopyRows with transaction handling

5. Partition Management (4 methods)
   - All partition operations implemented

Key Features:
- Transaction support with pgx.Tx
- Batch endpoint processing (30K partition size)
- Type conversions between pgtype and datastore types
- Dual query path for optimal performance
- CASE expression parameter building
- Proper error handling (ErrEventNotFound)

Helper Functions:
- rowToEvent: Converts all row types to datastore.Event
- endpointsToString/parseEndpoints: Array conversion
- headersToJSONB/parseHeaders: JSONB conversion
- getCreatedDateFilter: Unix timestamp conversion

Note: repo/ contains stub sqlc files. Run 'sqlc generate' to generate real code.
Phase 3 complete: Ready for dependency updates

* docs(events): update migration.md with Phase 3 completion

- Mark Phase 0, 1, 2, 3 as completed
- Document all 19 queries implemented
- Document all 18 methods implemented
- Add git commit hashes for tracking
- Update status indicators

* feat: migrate events repository to sqlc implementation

Migrated database/postgres/event.go (1,380 lines, 18 methods) from manual
SQL to type-safe sqlc-based implementation in internal/events/ package.

Implementation Details:
- impl.go: 523 lines (18 EventRepository interface methods)
- queries.sql: 19 optimized SQL queries with CASE expressions
- helpers.go: 162 lines (type conversion utilities)
- repo/: sqlc-generated stub files

Query Optimizations:
- Consolidated 14+ query variants into 19 unified queries
- Dual query path: EXISTS (fast) vs CTE+JOIN (search)
- CASE expressions for conditional filters
- Proper index usage via EXISTS subqueries
- Bidirectional pagination with cursor logic

Technical Improvements:
- Migrated from sqlx to pgx/v5 for better type safety
- Transaction context preservation with pgx.Tx
- Batch processing maintained (30K partition size)
- All pgtype conversions handled properly

Integration:
Updated 21 files across codebase (8 production, 13 tests)
Pattern: postgres.NewEventRepo(db) → events.New(logger, db.GetConn())

Code Quality:
- Passes go vet ./...
- Passes gofmt -s -l .
- Passes golangci-lint ./internal/events/...
- Compiles successfully

Testing Status:
⚠️  IMPORTANT: Tests pending sqlc code generation

Current state uses stub files with panic() implementations.
Before deployment:
1. Run `sqlc generate` with database connection
2. Execute baseline pagination tests
3. Run integration and E2E test suites
4. Verify zero regressions
5. Delete database/postgres/event.go after verification

* feat: migrate events repository to sqlc implementation

Migrated database/postgres/event.go (1,380 lines, 18 methods) from manual
SQL to type-safe sqlc-based implementation in internal/events/ package.

Implementation Details:
- impl.go: 523 lines (18 EventRepository interface methods)
- queries.sql: 19 optimized SQL queries with CASE expressions
- helpers.go: 162 lines (type conversion utilities)
- repo/: sqlc-generated stub files

Query Optimizations:
- Consolidated 14+ query variants into 19 unified queries
- Dual query path: EXISTS (fast) vs CTE+JOIN (search)
- CASE expressions for conditional filters
- Proper index usage via EXISTS subqueries
- Bidirectional pagination with cursor logic

Technical Improvements:
- Migrated from sqlx to pgx/v5 for better type safety
- Transaction context preservation with pgx.Tx
- Batch processing maintained (30K partition size)
- All pgtype conversions handled properly

Integration:
Updated 21 files across codebase (8 production, 13 tests)
Pattern: postgres.NewEventRepo(db) → events.New(logger, db.GetConn())

Code Quality:
- Passes go vet ./...
- Passes gofmt -s -l .
- Passes golangci-lint ./internal/events/...
- Compiles successfully

Testing Status:
⚠️  IMPORTANT: Tests pending sqlc code generation

Current state uses stub files with panic() implementations.
Before deployment:
1. Run `sqlc generate` with database connection
2. Execute baseline pagination tests
3. Run integration and E2E test suites
4. Verify zero regressions
5. Delete database/postgres/event.go after verification

* refactor(events): update `events.New` to use `database.Database` instead of `pgxpool.Pool` across codebase

- Replaced `db.GetConn()` with `db` for all `events.New` calls.
- Updated `events` service layer to reflect `database.Database` changes.
- Refactored 45+ files for consistency, including tests and production code.
- Ensured transaction context and type safety are preserved.

* refactor(events): replace `database.Database` with `pgxpool.Pool` in service layer

- Updated `Service` struct to use `*pgxpool.Pool` directly.
- Simplified transaction handling by replacing `db.GetConn().Begin` with `db.Begin`.

* feat(events): complete Phase 1-3 of sqlc migration

Phase 1: Preparation & Infrastructure
- Created internal/events directory structure
- Updated sqlc.yaml: ALL packages use URI instead of managed:true
- This fixes CREATE INDEX CONCURRENTLY error across all packages
- Created migration tracking document

Phase 2: Query Migration
- Wrote 19 SQL queries with named parameters (@param_name syntax)
- Converted 150+ positional params to semantic names
- Group 1: Simple CRUD (5 queries)
- Group 2: Batch Reads & Counting (5 queries)
- Group 3: Complex Pagination (5 queries) - dual path (EXISTS/Search)
- Group 4: Deletion & Maintenance (4 queries)
- Group 5: Partition Management (commented out - needs implementation)

Phase 3: Service Implementation
- Implemented 14/18 methods (78% complete)
- Created helpers.go with pgtype conversion utilities
- Implemented complex LoadEventsPaged with dual query paths
- Transaction handling with repo.New(tx) pattern
- All pgtype conversions using common package helpers

Status: ✅ Code compiles successfully
Next: Phase 4 (Integration) - Update 26 dependent files

* feat(events): implement partition functions - complete Phase 3

Phase 3 now 100% complete (except ExportRecords which is deferred)

Partition Functions Implementation:
- Added 4 SQL constants (~300 lines total) to impl.go
- partitionEventsTableSQL - Creates partitioned events table
- unPartitionEventsTableSQL - Reverts to non-partitioned
- partitionEventsSearchTableSQL - Partitions events_search table
- unPartitionEventsSearchTableSQL - Reverts events_search

Implementation Details:
- Each constant defines PL/pgSQL function and executes it
- Handles: table creation, partitioning, data migration, index recreation
- FK management using triggers for partitioned tables
- Methods execute via s.db.Exec(ctx, sql)

Status:
- 17/18 methods implemented (94%) ✅
- 4/4 partition functions ✅
- impl.go now 927 lines
- Code compiles successfully ✅

Next: Phase 4 - Integration (update 26 dependent files)

* docs: add CLAUDE.md to track AI contributions

Added comprehensive documentation of Events Repository SQLc Migration:
- Phases 1-3 complete (94% of implementation)
- 1,394 lines of code written across 3 files
- Critical fix: URI-based database connection for all packages
- Next: Phase 4 (Integration) and Phase 5 (Testing)

* docs: update migration status - Phase 4 already complete

Phase 4 Integration Discovery:
- Integration was completed in previous commits (38d5031, e2e4300, fb09e9b)
- 54 files now use events.New(logger, db)
- 0 files use old postgres.NewEventRepo()
- Legacy database/postgres/event.go has been removed
- All packages compile successfully

Updated Documentation:
- migration.md now reflects Phase 4 completion
- CLAUDE.md tracks all AI contributions
- Removed Co-Authored-By lines from commits

Status:
✅ Phase 1: Preparation (100%)
✅ Phase 2: Query Migration (100%)
✅ Phase 3: Implementation (94%)
✅ Phase 4: Integration (100%)
⏭️ Phase 5: Testing (0%)
⏭️ Phase 6: Cleanup & Merge (0%)

Next: Phase 5 - Create comprehensive test suite

* fix(sqlc): fix all SQL generation errors and type mismatches across 11 packages

- Convert all SQL queries from positional ($1, $2) to named (@param_name) parameters
- Fix sqlc URI mode bug: replace SELECT * FROM CTEs with explicit column lists
- Fix type mismatches in users, sources, organisations, and projects implementations
- Regenerate all sqlc code with proper field names and types
- Remove duplicate sqlc-events.yaml configuration file
- All tests passing (14/14 = 100%)

Packages fixed:
- organisation_members: 10 queries converted (27 parameters)
- organisations: 11 queries converted (30 parameters)
- organisation_invites: 5 queries converted (19 parameters)
- projects: 12 queries converted (60+ parameters)
- users, sources, organisations, projects: type conversions to pgtype
- 13 packages: fixed SELECT * from CTE pattern

* fix: add pgtype conversions for remaining 5 packages after main merge

- api_keys: fix type conversions for CreateAPIKey and all query methods
- batch_retries: add rowToBatchRetry converter, fix pgtype.Int4 conversions
- delivery_attempts: fix all 7 string fields in CreateDeliveryAttempt
- event_types: wrap all string parameters in pgtype.Text
- filters: convert queries to named parameters and fix all type conversions

All packages now build successfully and events tests pass (14/14)

* fix: convert positional to named params and fix type conversions

- Convert organisation_invites queries.sql: - → @id, @organisation_id, etc.
- Convert organisation_members queries.sql: - → @id, @user_id, etc.
- Fix type conversions in organisation_members/impl.go (pgtype.Text.String)
- Update organisation_invites/impl.go with proper field names
- Fix gofmt issues in batch_retries, filters, projects
- Remove unused functions: rowToBatchRetryOld, rowToEventTypeFilterOld

All packages now compile cleanly:
- go vet ./... ✅
- gofmt -s -l . ✅
- golangci-lint run ✅ (0 issues)

* fix: resolve enum scanning issues from parameter conversion

Add sqlc type overrides for custom PostgreSQL enums to fix scanning issues where delivery_mode and auth_type enums were being generated as interface{} instead of strings.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add required Raw field to event creation in test helpers

The events table has a NOT NULL constraint on the raw column. Test helpers were creating events without this field, causing all delivery_attempts tests to fail with constraint violations.

Fixes:

- internal/delivery_attempts: seedEventDelivery() helper

- internal/projects: seedEvent() helper

This resolves 17/20 delivery_attempts test failures.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use StringToPgTextFilter for empty orgID in LoadProjects

The FetchProjects SQL query uses: WHERE (p.organisation_id = @org_id OR @org_id = '')

When orgID is empty, StringToPgText('') returns {Valid: false} (NULL), which doesn't match the condition @org_id = '' in SQL (NULL \!= '').

Using StringToPgTextFilter instead keeps empty strings valid, allowing the query to return all projects when no orgID filter is specified.

This fixes:

- internal/projects LoadProjects returning 0 results with empty filter

- internal/pkg/loader failing to load subscriptions (was getting 0 projects)

All 9 loader tests and all projects tests now pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add COALESCE for nullable fields in E2E test queries

E2E test helpers were using raw SQL to query events without handling NULL values. Fields like source_id, idempotency_key, and url_query_params can be NULL in the database but were being scanned into non-nullable string fields, causing scanning errors.

Added COALESCE to convert NULL to empty string for nullable fields:

- source_id

- idempotency_key

- url_query_params

This fixes all 20 SQS E2E tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add COALESCE for nullable fields in Kafka and AMQP E2E tests

Applied the same fix as SQS E2E tests - nullable fields need COALESCE to prevent scanning errors when converting NULL to string.

Fixed queries in:

- e2e/kafka/helpers_test.go

- e2e/amqp/helpers_test.go

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: add Raw field to SeedEvent test helper

The SeedEvent helper in api/testdb/seed.go was missing the required Raw field when creating events, causing replay event E2E tests to fail with NOT NULL constraint violations.

Fixed by adding: Raw: string(data)

This fixes:

- TestE2E_ReplayEvent_JobID_Format

- TestE2E_ReplayEvent_MultipleReplays

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor(tests): reorder imports in event and impl test files

Standardize import ordering in `event_pagination_baseline_test.go` and `impl_test.go` by grouping and reorganizing third-party, internal, and system packages.

* refactor: remove redundant helper functions and standardize type conversions

Removed unused helper functions (`pgTextToString`, `stringToPgText`, `pgTimestamptzToTime`), and updated all references to use `common` package helpers for pgtype conversions. Simplifies the codebase and ensures consistency.

* feat: implement ExportRecords with batch processing and JSON export

Added ExportRecords support in `impl.go` to process large datasets in batches, avoiding memory issues. Integrated repository methods `ExportEvents` and `CountExportedEvents` for efficient pagination and counting. Updates include:

- Processes batch sizes of 3000
- Outputs records as a JSON array directly to the provided writer
- Writes empty array if no records are found

Also, updated `queries.sql` and `querier.go` to define and use new SQL methods for exporting events and counting records.

* refactor: standardize type conversions by using StringToPgTextFilter

Replaced `StringToPgText` with `StringToPgTextFilter` across API key and event methods to handle empty strings consistently. This ensures proper conversion for nullable fields and prevents invalid SQL parameter issues.

* chore: remove events migration documentation and baseline tests

* refactor: remove unused device-related fields and helpers from subscription handling

Simplified subscription data mapping by removing `deviceID` and `deviceMetadata` related fields. Deleted support for `FetchSubscriptionByDeviceIDRow` as it is no longer used across the codebase.

* resolve merge conflict in `event.go` by retaining imports from `feature/events-sqlc-migration` branch

* refactor: refactor `FindEventsByIdempotencyKey` to return boolean

Updated `FindEventsByIdempotencyKey` to return a boolean instead of a slice of strings for better clarity and efficiency. Refactored all references and corresponding SQL queries to reflect this change.

* refactor(tests): remove unused `eventIDs` from `impl_test.go`

Cleaned up redundant `eventIDs` slice from test cases in `impl_test.go` as it was no longer being used. Simplifies the test implementation.

* refactor(tests): add `defaultSearchParams` utility

* refactor(events): improve filtering logic and remove unused SQL queries

Enhanced filtering logic in `CountPrevEventsExists` by adding support for `OwnerID` and combined endpoint/owner filters. Removed unused `CreateEventEndpoints` query and related methods. Cleaned up test cases for partitioning methods.

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Mar 13, 2026

CLA assistant check
All committers have signed the CLA.

mekilis added 2 commits March 13, 2026 23:08
- Add Redis cache-aside for GetUsage (10min TTL, usage:{orgID}:{period})
- Use CalculateUsageCombined single query instead of 4 parallel queries
- Refresh cache async on hit; singleflight prevents duplicate concurrent populates
- Add TestFormatUsageResponse; revert-concurrently.sh for sqlc workflow
@mekilis mekilis force-pushed the feat/optimize-billing-usage branch from 8bfb66d to 835fc6e Compare March 13, 2026 22:08
mekilis added 2 commits March 13, 2026 23:37
Addresses review: background work outlives the request, so we use a fresh
context with timeout rather than r.Context() which would be cancelled
when the client disconnects. Matches updateBillingEmailIfEmpty pattern.

Made-with: Cursor
@mekilis
Copy link
Copy Markdown
Collaborator Author

mekilis commented Mar 17, 2026

Closing in favor of Overwatch + ClickHouse billing usage approach (PDE-668). Linking to PDE-581.

@mekilis mekilis closed this Mar 17, 2026
auto-merge was automatically disabled March 17, 2026 01:28

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants