Skip to content

Define a MetricsExporter interface#41

Open
JoshDreamland wants to merge 3 commits intoClickHouse:mainfrom
JoshDreamland:MetricsExporter
Open

Define a MetricsExporter interface#41
JoshDreamland wants to merge 3 commits intoClickHouse:mainfrom
JoshDreamland:MetricsExporter

Conversation

@JoshDreamland
Copy link

Extracts ClickHouse wire format (Block) assembly into a new interface.

This interface is compatible with OpenTelemetry-style logging and aggregation instrumentation.

This change is functionally a no-op. The ClickHouse output should remain identical. This change does not implement OpenTelemetry exporting; this will be done in a follow-up.

Critical Review Focus

Please scrutinize the classification of columns in src/export/stats_exporter.cc:205-271.

While ClickHouse treats all columns equally, OTel requires us to strictly distinguish between Dimensions, Measurements, and Logs to prevent memory explosions in the local agent. The columns are therefore categorized as follows:

  • Tag* (Dimensions):

    • Semantics: Attributes shared by all metrics in the row (e.g., db_name, user).
    • Risk: These form the "unique key" for aggregation. High cardinality here will cause OOMs. (e.g., DO NOT tag with query_id).
    • Impact: A row with 100 metrics and 10 tags generates 1,000 unique time-series data points.
  • Metric* (Measurements):

    • Semantics: Numeric values to be aggregated (sums, counts, histograms).
    • MetricFixedString: Special case. This counts occurrences of unique strings. Must be low cardinality (e.g., err_sqlstate, wait_event_type), or it will blow up memory.
  • Record* (Logs/Raw Data):

    • Semantics: High-cardinality data that is useful for debugging but distinct from metrics (e.g., query_text, error_message, query_id).
    • Behavior: These will be attached to OTel LogRecords but excluded from Metric aggregation to preserve memory.

This interface is compatible with OpenTelemetry-style logging and aggregation instrumentation.

Ideally, this change itself is a no-op.
Copilot AI review requested due to automatic review settings February 26, 2026 05:20
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the ClickHouse-specific export logic behind a new StatsExporter interface, aiming to separate column classification (tags/metrics/records) from the wire-format (ClickHouse Block) implementation while keeping ClickHouse output identical.

Changes:

  • Introduces a new StatsExporter interface for constructing/exporting batches with typed Tag/Metric/Record columns.
  • Refactors the existing ClickHouse exporter logic into a ClickHouseExporter implementation of StatsExporter.
  • Updates the bgworker to include/use the renamed statistics exporter header and adjusts build ignores for CMake artifacts.

Reviewed changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/worker/bgworker.cc Switches include to the new exporter header and updates timeout comment reference.
src/export/stats_exporter.h Renames/repurposes the public C API header as “statistics exporter”.
src/export/stats_exporter.cc Implements ClickHouseExporter + rewires batch export flow to use the interface and column classification.
src/export/exporter_interface.h Adds the new StatsExporter interface used by the refactor and future OTel-style exporters.
.gitignore Ignores in-source CMake artifacts (CMakeCache.txt, CMakeFiles/).
Comments suppressed due to low confidence (5)

src/export/stats_exporter.cc:472

  • PschExportBatch() returns exporter->NumExported() even when CommitBatch() fails. Because events are dequeued before the commit, this reports a successful export to the caller (and the bgworker loop) even though nothing was inserted, causing silent data loss and defeating the backoff logic. Return 0 (or otherwise signal failure) when CommitBatch() returns false, and only return NumExported() on success.
    src/export/stats_exporter.cc:132
  • In CommitBatch(), the "connection not established" path returns false without incrementing consecutive_failures or calling PschRecordExportFailure(). This means repeated connection failures won’t trigger exponential backoff (and won’t show up in failure stats) unless an exception is thrown. Update this path to record the failure and advance the retry state consistently with the exception path.
    src/export/stats_exporter.cc:218
  • Column classification looks inconsistent with how the data is used and with the stated OTel semantics: cmd_type is low-cardinality and used as a GROUP BY dimension in the ClickHouse schema, but it’s currently exported as a RecordString (log-only). Consider classifying cmd_type as a Tag/Dimension so it can participate in metric aggregation without blowing up cardinality.
    src/export/stats_exporter.cc:266
  • err_elevel is categorical (severity/level), but it’s currently modeled as MetricUInt8, which implies it will be treated like a numeric measurement/histogram in an OTel-style exporter. To avoid incorrect metric semantics, model this as a dimension (e.g., add a TagUInt8/TagString variant and export it as a tag) rather than a metric.
    src/export/stats_exporter.cc:271
  • app is currently exported as a RecordString, but the ClickHouse schema and docs use app as a primary filter / grouping dimension (e.g., GROUP BY app in MVs and example queries). If the intent is to preserve existing queryability when mapping to OTel metrics later, consider classifying app as a Tag/Dimension (or make this configurable) and keep only truly high-cardinality fields (like client_addr) as records.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@theory
Copy link
Contributor

theory commented Feb 26, 2026

This looks great.

Critical Review Focus

Please scrutinize the classification of columns in src/export/stats_exporter.cc:205-271.

At a glance these look sensible, but it might be easier to scan if the three types are grouped, rather than mixed. Easier to know where to put new stuff, too.

  • Metric* (Measurements):

    • Semantics: Numeric values to be aggregated (sums, counts, histograms).
    • MetricFixedString: Special case. This counts occurrences of unique strings. Must be low cardinality (e.g., err_sqlstate, wait_event_type), or it will blow up memory.

It looks like they're all simple measurements, no? They all appear to be named something like MetricInt64. I'd suggest including the type of metric in the name, e.g. MetricMeasureInt64, MetricSumInt64, MetricCountInt64, MetricHistogramInt64. But maybe I'm missing something.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants