ClickHouse usage analytics: events/gauges tables with daily MV #3
lohanidamodar wants to merge 106 commits into main from
Conversation
- Database adapter
- ClickHouse adapter
- Removed hardcoded column definitions in Usage class, replacing with dynamic schema derived from SQL adapter.
- Introduced new Query class for building ClickHouse queries with fluent interface.
- Added support for advanced query operations including find and count methods.
- Enhanced error handling and SQL injection prevention mechanisms.
- Created comprehensive usage guide for ClickHouse adapter.
- Added unit tests for Query class to ensure functionality and robustness.
- Maintained backward compatibility with existing methods while improving overall architecture.
…metric logging with deterministic IDs
…ed tags in ClickHouse and Database adapters
…pdate tests for new behavior
Add UsageQuery class extending Query with a custom groupByInterval method that enables time-bucketed aggregated queries. When present in the queries array, the ClickHouse adapter switches from raw row returns to aggregated results grouped by time bucket (SUM for events, argMax for gauges). Supported intervals: 1m, 5m, 15m, 1h, 1d, 1w, 1M.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
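A minimal sketch of the call shape this enables. Class and method names follow the commit text; the find() signature and result shape are assumptions:

```php
// Hypothetical caller-side usage; exact signatures are assumptions.
$results = $adapter->find([
    Query::equal('metric', ['requests']),
    UsageQuery::groupByInterval('time', '1h'), // switches adapter to aggregated mode
], Usage::TYPE_EVENT);

// Aggregated mode returns one row per (metric, bucket), e.g.:
// ['metric' => 'requests', 'value' => 1042.0, 'time' => '2024-01-01 13:00:00']
```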
Query::parse() uses static::isMethod() which allows UsageQuery
to extend the valid method list. Without this override, parsing
'groupByInterval("time","1h")' throws "Invalid query method".
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
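A sketch of the override this describes, assuming the base Query::isMethod() is the static allow-list check that Query::parse() consults (the base-class internals here are assumptions):

```php
class UsageQuery extends Query
{
    public const TYPE_GROUP_BY_INTERVAL = 'groupByInterval';

    // Extend the valid-method list so Query::parse(), which calls
    // static::isMethod(), accepts 'groupByInterval(...)' instead of
    // throwing "Invalid query method".
    public static function isMethod(string $value): bool
    {
        return $value === self::TYPE_GROUP_BY_INTERVAL || parent::isMethod($value);
    }
}
```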
- Add array shape type to findAggregatedFromTable $parsed parameter so PHPStan recognizes typed keys (filters, params, orderBy)
- Provide default for optional groupByInterval key access
- Split compound type check for $interval to satisfy PHPStan string narrowing in exception message interpolation
- Remove extra trailing blank line in UsageBase.php (PSR-12)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The Database adapter silently cast string tenant IDs to int via (int), which truncates non-numeric strings (e.g. UUIDs) to 0, effectively disabling tenant isolation. Now throws InvalidArgumentException for non-numeric tenants so the mismatch is caught immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
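A sketch of the guard this describes; the setter name and property are assumed from context:

```php
public function setTenant(int|string|null $tenant): self
{
    if (\is_string($tenant) && !\is_numeric($tenant)) {
        // Previously (int) 'a1b2c3…' === 0, silently merging tenants into tenant 0.
        throw new \InvalidArgumentException("Non-numeric tenant ID: {$tenant}");
    }
    $this->tenant = $tenant === null ? null : (int) $tenant;
    return $this;
}
```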
…on, getTotal ambiguity

- Buffer key now includes tag hash so events with same metric but different tags (e.g. different paths) stay as separate entries instead of silently discarding the second call's tags
- Daily table queries (findDaily, sumDaily, sumDailyBatch) now validate attributes against the daily schema (metric, value, time, tenant) instead of the full event schema. Querying path/method/status on the daily table now throws immediately instead of causing a ClickHouse "No such column" runtime error
- Changed (int) cast to (float) for agg_value in getTimeSeries to avoid truncating fractional gauge values or large event sums
- getTotal() now throws when a metric exists in both event and gauge tables instead of silently adding incompatible aggregations

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
```php
/**
 * Enable or disable shared tables mode (multi-tenant with tenant column).
 *
 * @param bool $sharedTables
 * @return self
 */
abstract public function setSharedTables(bool $sharedTables): self;
```
This seems specific to Database adapter, should we move it to Databases.php?
```php
 * - path: API endpoint path (events only)
 * - method: HTTP method (events only)
 * - status: HTTP status code (events only)
```
We're going to need a separate HttpLog.php type anyways, do we need these properties?
PHPStan level max flagged getTimeSeries() annotations as int while the ClickHouse adapter emits floats via agg_value cast. Updates the abstract, both adapters, the Usage facade, and zeroFillTimeSeries to float. Also throws on json_encode failure in Usage::collect so the md5() input is guaranteed string instead of string|false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```php
// Also check the other type's attributes for cross-table queries
$otherType = $type === 'event' ? 'gauge' : 'event';
foreach ($this->getAttributes($otherType) as $attribute) {
    if ($attribute['$id'] === $attributeName) {
        return true;
    }
}

throw new Exception("Invalid attribute name: {$attributeName}");
```
Cross-type attribute fallback passes invalid columns to gauge queries
validateAttributeName() falls back to checking the opposite type's attributes. When find($queries, null) (or count/sum with $type = null) is called with event-specific filters like Query::equal('path', '/v1/'), parseQueries forwards the same $queries to both tables. For the gauge call, validateAttributeName('path', 'gauge') passes because it finds path in the event schema, generating ``WHERE `path` = {param:String}`` against the gauges table, which has no such column, causing a ClickHouse `UNKNOWN_IDENTIFIER` server error at runtime.
The cross-type fallback should be removed so that an attribute is only valid if it exists in the target table's own schema:
```php
// Remove lines 1138-1144 (the "Also check the other type's attributes" block)
throw new Exception("Invalid attribute name: {$attributeName}");
```

Any cross-table query with type-specific filters should then require the caller to pass an explicit `$type`.
…ts-only sum

- extractGroupByInterval: match by method string, not instanceof (parsed queries are base Query)
- flush(): selectively clear buffer on per-batch success (retry preserved on failure)
- collect(): use TYPE_EVENT constant instead of string literal
- addBatch(): require explicit $type param (no default)
- sum(): events-only by default (summing gauges is meaningless)
- sumDaily*: document as events-only (daily MV has only events)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Push the count cap down into the DB layer so callers that only need a capped total (e.g. rendering "5000+") can stop ClickHouse early instead of scanning the full filtered set. ClickHouse wraps the count in a LIMIT-bounded subquery; Database delegates to utopia-php/database's existing $max arg.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
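A sketch of the LIMIT-bounded subquery shape on the ClickHouse side, reusing variable names from snippets quoted elsewhere in this thread (the exact assembly is an assumption):

```php
// The inner SELECT stops after $max matching rows, so the outer count()
// never scans past the cap.
$sql = "SELECT count() AS total FROM (
    SELECT 1 FROM {$escapedDatabaseAndTable}{$whereData['clause']}
    LIMIT {max:UInt64}
) FORMAT JSON";
$params = $whereData['params'] + ['max' => $max];
```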
```php
// Also check the other type's attributes for cross-table queries
$otherType = $type === 'event' ? 'gauge' : 'event';
foreach ($this->getAttributes($otherType) as $attribute) {
    if ($attribute['$id'] === $attributeName) {
        return true;
    }
}
```
Cross-type attribute fallback sends invalid columns to gauge queries
validateAttributeName falls back to checking the opposite type's schema (lines 1139–1144). When find/count/sum is called with $type = null, parseQueries passes the same queries to both tables. A caller passing Query::equal('path', '/v1/') passes validation for the gauge table because path is found in the event schema, generating ``WHERE `path` = {param:String}`` against the gauges table, which has no such column, and causing a ClickHouse `UNKNOWN_IDENTIFIER` server error at runtime.
Remove the fallback block so each attribute is only valid for its own table's schema:
```php
// Also check the other type's attributes for cross-table queries
$otherType = $type === 'event' ? 'gauge' : 'event';
foreach ($this->getAttributes($otherType) as $attribute) {
    if ($attribute['$id'] === $attributeName) {
        return true;
    }
}
throw new Exception("Invalid attribute name: {$attributeName}");
}
```
Adds keyset-pagination cursor support (cursorAfter / cursorBefore) to the ClickHouse adapter via parseQueries. Cursor values accept Metric/ArrayObject or plain associative arrays; an `id` tiebreaker is auto-appended to ORDER BY so pagination is deterministic on non-unique columns. cursorBefore flips direction at SQL build time and reverses results post-fetch. Rejects two unsafe combinations: cursor + groupByInterval (no stable identity on aggregated rows), and cursor + null type (paginating across events and gauges has no coherent ordering).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
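A caller-side sketch of what this enables, assuming Query::cursorAfter() accepts the last row of the previous page as described:

```php
$page1 = $adapter->find([
    Query::equal('metric', ['requests']),
    Query::orderDesc('time'),
    Query::limit(100),
], Usage::TYPE_EVENT);

$last = \end($page1); // Metric/ArrayObject or plain associative array

// The adapter auto-appends an `id` tiebreaker to ORDER BY, so the page
// boundary stays deterministic even when many rows share the same `time`.
$page2 = $adapter->find([
    Query::equal('metric', ['requests']),
    Query::orderDesc('time'),
    Query::cursorAfter($last),
    Query::limit(100),
], Usage::TYPE_EVENT);
```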
Three small follow-ups based on review feedback on the audit PR's twin implementation:

- Drop the always-true `!empty($orderAttributes)` guard inside the cursor branch. resolveCursorOrder() always appends an `id` tiebreaker, so the guard is dead code and was misleading.
- normalizeCursorRow now removes `$id` after copying it to `id`, so cursor state is no longer carrying both keys.
- Throw an explicit Exception when a cursor value is null. The previous path silently routed null `time` cursors through formatDateTime(null) which returns the current timestamp; a misconfigured cursor would filter on `time < now()` and produce wrong pages instead of failing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```php
return $this->countFromTable($queries, Usage::TYPE_EVENT, $max)
    + $this->countFromTable($queries, Usage::TYPE_GAUGE, $max);
```
count(null, $max) returns up to 2 × $max, violating the documented upper-bound contract
When $type is null, count() independently applies $max as a SQL LIMIT to each of the events and gauges tables, then adds the two results. With count([], null, 100) a caller receives up to 200 — not at most 100. The Adapter docblock is explicit: $max is an "upper bound for the count (inclusive)". Fix: query both tables without per-table limit and cap the sum at the end.
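One possible fix shape, a variation that keeps the per-table pushdown while restoring the contract (a sketch, not the PR's code):

```php
// Each per-table count is already capped at $max, and because
// min(e, max) + min(g, max) >= min(e + g, max), a final min() on the
// sum restores the documented inclusive upper bound exactly.
$total = $this->countFromTable($queries, Usage::TYPE_EVENT, $max)
    + $this->countFromTable($queries, Usage::TYPE_GAUGE, $max);

return $max !== null ? \min($total, $max) : $total;
```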
```php
    }
    $document['time'] = $parsedTime;
} elseif ($key === 'value') {
    $document[$key] = $value !== null ? (int) $value : null;
```
parseAggregatedResults truncates aggregate values to int
(int) $value silently truncates ClickHouse-computed aggregates. getTimeSeriesFromTable correctly uses (float) at line 2095; parseAggregatedResults and parseResults (line 2924) are inconsistent and will silently lose data if the schema is ever widened to Float64.
Adds notEqual, notContains, notBetween, isNull, isNotNull, startsWith, endsWith — keeping the supported Query method set in line with the audit ClickHouse adapter. startsWith / endsWith use ClickHouse's built-in functions of the same name; isNull / isNotNull emit `IS NULL` / `IS NOT NULL` (no value binding); the rest follow the existing param-bound pattern.

The shared parseQueries logic is now consistent across both adapters:

- getParamType() centralises the column → ClickHouse-type mapping (time → DateTime64(3), value → Int64, default → String). Previously each case had an inline `if ($attribute === 'time')` branch.
- formatTypedValue() routes DateTime-typed values through formatDateTime and everything else through formatParamValue, so each case has one code path.
- buildCursorWhere() uses the same dispatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
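A sketch of the centralised mapping described above (types taken from the commit text; method placement is an assumption):

```php
// Column -> ClickHouse query-parameter type, replacing the per-case
// inline `if ($attribute === 'time')` branches.
private function getParamType(string $attribute): string
{
    return match ($attribute) {
        'time' => 'DateTime64(3)',
        'value' => 'Int64',
        default => 'String',
    };
}
```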
```php
$orderByExpr = $this->sharedTables ? '(tenant, id)' : '(id)';

$createTableSql = "
    CREATE TABLE IF NOT EXISTS {$escapedDatabaseAndTable} (
        {$columnDefs}{$indexDefsStr}
    )
    ENGINE = MergeTree()
    ORDER BY {$orderByExpr}
```
MergeTree ORDER BY (id) eliminates time-range query efficiency
Both the events and gauges tables are created with ORDER BY (id) (or (tenant, id)), where id is a randomly generated UUID. ClickHouse stores rows physically in this sort order and uses it as the primary key for granule skipping. Because UUIDs are random, all time-range queries like WHERE time BETWEEN X AND Y and WHERE metric = 'foo' AND time BETWEEN ... will scan every granule — effectively a full table scan — at any scale.
The daily table correctly uses ORDER BY (metric, time) (line 1060), enabling efficient per-metric range scans. The raw events and gauges tables need the same treatment. A typical ORDER BY for this schema would be (metric, toStartOfDay(time), id) or (tenant, metric, toStartOfDay(time)) for shared tables, so ClickHouse can skip entire months/days worth of granules for a given metric.
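A sketch of that suggestion applied to the quoted code, using the orderings proposed above (the final key choice is a design decision for the PR author):

```php
// Sort raw tables so the primary index can skip granules outside the
// requested metric + time range, mirroring the daily table's
// ORDER BY (metric, time).
$orderByExpr = $this->sharedTables
    ? '(tenant, metric, toStartOfDay(time))'
    : '(metric, toStartOfDay(time), id)';
```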
```php
foreach ($typeResult as $metricName => $metricData) {
    if (!isset($output[$metricName])) {
        continue;
    }

    $output[$metricName]['total'] += $metricData['total'];
    $output[$metricName]['data'] = array_merge(
        $output[$metricName]['data'],
        $metricData['data']
    );
}
```
getTimeSeries with $type=null produces incoherent merged time series
When $type is null, results from both tables are fetched — events with SUM(value) per bucket, gauges with argMax(value, time) per bucket — and then array_merge'd into the same data array. If a metric name exists in both tables, zeroFillTimeSeries then sums the two incompatible values in the same bucket ($existing[$key] = ($existing[$key] ?? 0) + $point['value']), producing a meaningless total that mixes a cumulative sum with a point-in-time snapshot.
getTotal() guards against this (throws when $eventTotal > 0 && $gaugeTotal > 0), but getTimeSeries has no equivalent guard. Consider applying the same check before merging, or throwing when both result sets are non-empty for the same metric.
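A sketch of such a guard inside the quoted merge loop, mirroring getTotal()'s check (variable names taken from the quoted snippet):

```php
// Refuse to merge when the metric already has data from the other table:
// SUM-per-bucket plus argMax-per-bucket has no coherent meaning.
if (!empty($output[$metricName]['data']) && !empty($metricData['data'])) {
    throw new Exception(
        "Metric '{$metricName}' exists in both event and gauge tables; "
        . 'pass an explicit $type to getTimeSeries()'
    );
}
```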
```php
$events = $this->findFromTable($queries, Usage::TYPE_EVENT);
$gauges = $this->findFromTable($queries, Usage::TYPE_GAUGE);

return array_merge($events, $gauges);
```
find($queries, null) with a limit Query returns up to 2× the requested limit
When $type is null, findFromTable is called for both tables with the same parsed queries. The limit is extracted once by parseQueries and applied independently to each SQL query (line 1535: LIMIT {limit:UInt64}). A call to find([Query::limit(10)], null) can therefore return up to 20 results — 10 from events, 10 from gauges. This violates the documented semantics of Query::limit as an upper bound and mirrors the same off-by-factor-of-two bug already flagged for count() with $type=null.
The same issue applies to offset pagination: find([Query::offset(10), Query::limit(5)], null) skips the first 10 rows in each table independently, not the first 10 rows of the combined result set.
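One possible fix shape (a sketch under assumptions: that findFromTable() can take explicit limit/offset overrides, and that rows carry a comparable time key): over-fetch from each table, then apply offset and limit once to the merged set.

```php
$offset = $parsed['offset'] ?? 0;
$limit  = $parsed['limit'] ?? 25;

// Fetch offset+limit rows from each table at offset 0...
$events = $this->findFromTable($queries, Usage::TYPE_EVENT, $offset + $limit, 0);
$gauges = $this->findFromTable($queries, Usage::TYPE_GAUGE, $offset + $limit, 0);

// ...then sort and page the combined result exactly once.
$merged = \array_merge($events, $gauges);
\usort($merged, fn (array $a, array $b) => $b['time'] <=> $a['time']);

return \array_slice($merged, $offset, $limit);
```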
…e strictness

- ClickHouse purge() now also deletes from the daily aggregated table when purging events. Materialized views are forward-only, so purges on the source table left stale daily rows behind. Daily delete is skipped if any query references an event-only column (path/method/etc).
- ClickHouse getTotalBatch() now raises when a metric appears in both the event and gauge tables under $type=null, matching the existing safeguard in getTotal(). Mixing SUM (events) with argMax (gauges) silently produced meaningless totals.
- Usage::setNamespace/setTenant/setSharedTables now flush the buffer before changing adapter context. Buffered metrics carry no context, so changing it pre-flush would write them under the new context.
- Database adapter now stores a 'type' field per document and filters by it in find/count/purge/getTotal when $type is non-null. Previously the $type argument was accepted but ignored, returning rows of both kinds.
- composer.json: add 'test' script.
- .github/workflows: bump actions/checkout v3 -> v4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
```php
$sql = "SELECT {$selectColumns} FROM {$fromTable}{$whereData['clause']}{$orderClause}{$limitClause}{$offsetClause} FORMAT JSON";

return $this->parseResults($this->query($sql, $whereData['params']), Usage::TYPE_EVENT);
```
findDaily() queries SummingMergeTree without SUM + GROUP BY, returning stale unmerged rows
findDaily() issues a plain SELECT metric, value, time FROM <daily_table> WHERE .... SummingMergeTree only sums rows with the same ORDER BY key (metric, time) during background merges. Between an INSERT and the next merge, multiple rows can exist for the same (metric, time) with separate values. Callers receive raw, unmerged rows instead of the correct daily aggregate — billing queries would undercount or overcount depending on timing.
The SummingMergeTree contract requires reading with SUM(value) GROUP BY (metric, time):
$sql = "SELECT metric, SUM(value) as value, time FROM {$fromTable}{$whereData['clause']} GROUP BY metric, time {$orderClause}{$limitClause}{$offsetClause} FORMAT JSON";sumDaily() and sumDailyBatch() already use the correct SUM + GROUP BY pattern; findDaily() is the only outlier.
```php
// Use custom ORDER BY if specified, otherwise default to bucket ASC
$orderClause = ' ORDER BY bucket ASC';
if (!empty($parsed['orderBy'])) {
    $orderClause = ' ORDER BY ' . implode(', ', $parsed['orderBy']);
}
```
User-supplied Query::orderDesc('time') in aggregated mode references a non-existent column
findAggregatedFromTable() aliases the time bucket as bucket in the SELECT and GROUP BY, but the user-supplied ORDER BY clause is built from parseQueries() using the original attribute names (e.g., `time` DESC). If a caller passes both UsageQuery::groupByInterval('1h') and Query::orderDesc('time'), the generated SQL becomes:
```sql
SELECT metric, SUM(value) as value, toStartOfHour(time) as bucket
FROM …
GROUP BY metric, bucket
ORDER BY `time` DESC  -- `time` is not a grouped/aggregated column here
```

ClickHouse raises an error because `time` is neither in the GROUP BY list nor an aggregate function result after grouping. The fix is to remap `time` ORDER BY entries to `bucket` in the aggregated query.
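A sketch of that remap, assuming the orderBy entries are raw SQL fragments as built by parseQueries():

```php
// In aggregated mode `time` is only addressable via its alias `bucket`,
// so rewrite user-supplied ORDER BY fragments before assembling the SQL.
$parsed['orderBy'] = \array_map(
    fn (string $expr) => \str_replace('`time`', 'bucket', $expr),
    $parsed['orderBy']
);
```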
Mirrors the validator pattern in utopia-php/database
(Validator/Query/Filter.php): contains/notContains/equal/etc. queries
must have at least one value; an empty values array is rejected up front
with `<Method> queries require at least one value.` instead of silently
producing a "no filter applied" WHERE clause.
Without the guard, `Query::contains('metric', [])` would skip the IN
clause entirely and return all rows — exactly the opposite of the
intended IN () semantics, which should match nothing.
Applies the same VALUE_REQUIRED_METHODS allow-list and pre-switch check
that the audit adapter uses, so both libraries reject the same set of
empty-value filter methods consistently.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
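A sketch of the pre-switch guard this commit describes; the constant's exact contents and the surrounding method are assumptions:

```php
private const VALUE_REQUIRED_METHODS = [
    Query::TYPE_EQUAL,
    Query::TYPE_NOT_EQUAL,
    Query::TYPE_CONTAINS,
    Query::TYPE_NOT_CONTAINS,
    // ...the remaining filter methods that need at least one value
];

// Reject empty-value filters before the switch: Query::contains('metric', [])
// must match nothing, never skip the IN clause and match everything.
if (\in_array($method, self::VALUE_REQUIRED_METHODS, true) && empty($values)) {
    throw new Exception(\ucfirst($method) . ' queries require at least one value.');
}
```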
```php
    }

    foreach (array_chunk($documents, max(1, $batchSize)) as $chunk) {
        foreach ($chunk as $doc) {
            $this->db->createDocument($this->collection, $doc);
        }
    }
});

return true;
}

/**
 * Get time series data for metrics.
 *
 * Stub implementation for Database adapter.
 *
 * @param array<string> $metrics
 * @param string $interval
 * @param string $startDate
 * @param string $endDate
 * @param array<Query> $queries
 * @param bool $zeroFill
 * @param string|null $type
 * @return array<string, array{total: float, data: array<array{value: float, date: string}>}>
 */
public function getTimeSeries(array $metrics, string $interval, string $startDate, string $endDate, array $queries = [], bool $zeroFill = true, ?string $type = null): array
{
    // Stub: Database adapter time series not yet implemented
    $output = [];
    foreach ($metrics as $metric) {
        $output[$metric] = ['total' => 0, 'data' => []];
    }
    return $output;
}

/**
 * Get total value for a single metric.
```
Negative-value constraint absent in Database adapter
ClickHouse::validateMetricData() (line 1293) throws when $value < 0, and Usage::collect() enforces the same rule. However, Database::addBatch() has no corresponding guard. Calling $adapter->addBatch([['metric' => 'x', 'value' => -1]], Usage::TYPE_EVENT) on a Database adapter succeeds silently, while the identical call on a ClickHouse adapter throws. Code that passes the test suite with the Database adapter can therefore fail in production with the ClickHouse adapter. The abstract Adapter::addBatch() docblock does not document the sign constraint, so callers have no indication of the difference.
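A sketch of the missing guard in Database::addBatch(), mirroring the ClickHouse-side check this comment references:

```php
foreach ($documents as $document) {
    $value = $document['value'] ?? 0;
    if ($value < 0) {
        // Keep both adapters consistent with ClickHouse::validateMetricData().
        throw new Exception("Metric value must be non-negative, got {$value}");
    }
}
```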
… retry dedup, gauge order, cross-type validation, Database value check)
```php
if ($type === Usage::TYPE_EVENT || $type === null) {
    $typesToQuery[] = Usage::TYPE_EVENT;
}
if ($type === Usage::TYPE_GAUGE || $type === null) {
    $typesToQuery[] = Usage::TYPE_GAUGE;
}

foreach ($typesToQuery as $queryType) {
    $typeResult = $this->getTimeSeriesFromTable($metrics, $interval, $startDate, $endDate, $queries, $queryType);
```
getTimeSeries($queries, null) throws when event-specific filters are present
getTimeSeries with $type = null iterates over both TYPE_EVENT and TYPE_GAUGE. For each type it calls getTimeSeriesFromTable, which calls parseQueries($queries, $queryType). When $queryType is TYPE_GAUGE, validateAttributeName rejects any event-only column (path, method, status, etc.) and throws "Invalid attribute name: path". A caller that passes Query::equal('path', '/v1/') with $type = null will always get an exception on the gauge pass, even though the filter is perfectly valid for the event table. The same pattern was already flagged for find()/count()/sum() but getTimeSeries() has the identical code path and is not guarded.
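One possible guard shape (a sketch; queriesApplyToType() is a hypothetical helper): skip a table entirely when any filter references a column its schema lacks, since such a filter can never match that table's rows.

```php
foreach ($typesToQuery as $queryType) {
    // Hypothetical helper: false when any query attribute is absent from
    // this type's schema (e.g. 'path' against the gauges table).
    if (!$this->queriesApplyToType($queries, $queryType)) {
        continue;
    }
    $typeResult = $this->getTimeSeriesFromTable($metrics, $interval, $startDate, $endDate, $queries, $queryType);
    // ...merge $typeResult as before
}
```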
```php
$output[$metricName]['total'] += $metricData['total'];
$output[$metricName]['data'] = array_merge(
    $output[$metricName]['data'],
    $metricData['data']
);
```
getTimeSeries($type=null) accumulates incoherent gauge+event totals
When $type is null, both tables are queried: events with SUM(value) per bucket and gauges with argMax(value, time) per bucket. The results are merged into $output[$metricName]['total'] += $metricData['total'], and then zeroFillTimeSeries sums both contributions into the same bucket via $existing[$key] += $point['value']. If a metric name exists in both tables, the resulting total is the sum of a cumulative counter and a point-in-time snapshot — a meaningless number. getTotal() throws in the analogous situation; getTimeSeries() silently corrupts the result.
Summary
Complete rewrite of the usage analytics library with a two-table architecture optimized for both real-time analytics and billing.
Architecture
Key Changes
API
Write
Read
Billing (Daily MV)
Test Plan