
[4/7] Telemetry Event Emission and Aggregation #327

Open

samikshya-db wants to merge 90 commits into telemetry-3-client-management from telemetry-4-event-aggregation

Conversation

@samikshya-db
Contributor

Part 4 of 7-part Telemetry Implementation Stack

This layer adds event emission and per-statement aggregation with smart batching.

Summary

Implements TelemetryEventEmitter for event-driven telemetry and MetricsAggregator for efficient per-statement aggregation with smart flushing.

Components

TelemetryEventEmitter (lib/telemetry/TelemetryEventEmitter.ts)

Event-driven architecture using Node.js EventEmitter:

  • Type-safe emission methods for each event type
  • Respects telemetryEnabled configuration flag
  • All exceptions swallowed and logged at debug level only
  • Zero performance impact when disabled (early return)

Event Types:

  • connection.open - Successful connection establishment
  • statement.start - Statement execution begins
  • statement.complete - Statement execution completes
  • cloudfetch.chunk - CloudFetch chunk downloaded
  • error - Exception occurred with terminal classification
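
For illustration, a minimal sketch of this emission pattern in TypeScript (the class, logger shape, and method below are simplified stand-ins, not the driver's exact API):

```typescript
import { EventEmitter } from 'events';

// Hypothetical logger shape; the driver's real logger interface may differ.
interface DebugLogger {
  log(level: 'debug', message: string): void;
}

class EventEmitterSketch extends EventEmitter {
  constructor(private telemetryEnabled: boolean, private logger: DebugLogger) {
    super();
  }

  // Every emit method follows the same pattern: early return when disabled,
  // swallow all exceptions, and log failures at debug level only.
  emitStatementStart(sessionId: string, statementId: string): void {
    if (!this.telemetryEnabled) return; // zero-cost path when telemetry is off
    try {
      this.emit('statement.start', { sessionId, statementId, timestamp: Date.now() });
    } catch (e) {
      this.logger.log('debug', `Telemetry emit failed: ${e}`);
    }
  }
}
```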

MetricsAggregator (lib/telemetry/MetricsAggregator.ts)

Per-statement aggregation with smart batching:

Aggregation Strategy:

  • Connection events → emit immediately (no aggregation)
  • Statement events → buffer until completeStatement() called
  • Terminal errors → flush immediately (critical failures)
  • Retryable errors → buffer until statement complete (optimize batching)

Flush Triggers:

  • Batch size reached (default: 100 metrics)
  • Periodic timer fired (default: 5000ms)
  • Terminal exception occurred (immediate flush)
  • Manual flush() called
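
Putting the aggregation strategy and flush triggers together, a minimal sketch (illustrative names; the real MetricsAggregator API may differ):

```typescript
interface StatementMetric { statementId: string; payload: unknown }

class AggregatorSketch {
  private buffers = new Map<string, StatementMetric[]>(); // per-statement buffers
  private pending: StatementMetric[] = [];
  private timer: ReturnType<typeof setInterval>;

  constructor(
    private exportBatch: (batch: StatementMetric[]) => void,
    private batchSize = 100, // flush trigger 1: batch size
    flushIntervalMs = 5000,  // flush trigger 2: periodic timer
  ) {
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
  }

  addStatementEvent(metric: StatementMetric, terminalError = false): void {
    if (terminalError) {
      this.pending.push(metric);
      this.flush(); // flush trigger 3: terminal errors go out immediately
      return;
    }
    const buf = this.buffers.get(metric.statementId) ?? [];
    buf.push(metric); // statement events and retryable errors are buffered
    this.buffers.set(metric.statementId, buf);
  }

  completeStatement(statementId: string): void {
    const buf = this.buffers.get(statementId) ?? [];
    this.buffers.delete(statementId); // completed statements leave memory
    this.pending.push(...buf);
    if (this.pending.length >= this.batchSize) this.flush();
  }

  flush(): void { // flush trigger 4: manual flush()
    if (this.pending.length === 0) return;
    this.exportBatch(this.pending.splice(0));
  }

  close(): void {
    clearInterval(this.timer); // periodic timer cleanup
    this.flush();
  }
}
```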

Memory Management:

  • Bounded buffers prevent memory leaks
  • Completed statements removed from memory
  • Periodic timer cleanup

Smart Batching Benefits

  • Reduces HTTP overhead: Fewer export calls
  • Optimizes bandwidth: Batch multiple metrics
  • Critical error priority: Terminal errors flushed immediately
  • Efficient aggregation: Per-statement grouping reduces data size

Testing

  • 31 unit tests for TelemetryEventEmitter (100% function coverage)
  • 32 unit tests for MetricsAggregator (94% line, 82% branch coverage)
  • Tests verify exception swallowing (CRITICAL requirement)
  • Tests verify debug-only logging (CRITICAL requirement)
  • Tests verify batch size and timer triggers
  • Tests verify terminal vs retryable error handling

Next Steps

This PR is followed by:

  • [5/7] Export: DatabricksTelemetryExporter
  • [6/7] Integration: Wire into DBSQLClient
  • [7/7] Testing & Documentation

Dependencies

Depends on: [1/7] Types and Exception Classifier, [2/7] Telemetry Infrastructure, and [3/7] Client Management (the telemetry-3-client-management base branch).

@samikshya-db
Contributor Author

The emission format conforms to the telemetry proto; I've marked this ready for review.

samikshya-db force-pushed the telemetry-3-client-management branch from 87d1e85 to 32003e9 on January 29, 2026 at 20:21
samikshya-db and others added 16 commits January 29, 2026 20:21
This is part 2 of 7 in the telemetry implementation stack.

Components:
- CircuitBreaker: Per-host endpoint protection with state management
- FeatureFlagCache: Per-host feature flag caching with reference counting
- CircuitBreakerRegistry: Manages circuit breakers per host

Circuit Breaker:
- States: CLOSED (normal), OPEN (failing), HALF_OPEN (testing recovery)
- Default: 5 failures trigger OPEN, 60s timeout, 2 successes to CLOSE
- Per-host isolation prevents cascade failures
- All state transitions logged at debug level
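
A minimal state-machine sketch of that behavior (TypeScript; names are illustrative, not the driver's actual CircuitBreaker API):

```typescript
type CircuitState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreakerSketch {
  private state: CircuitState = 'CLOSED';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(
    private failureThreshold = 5, // 5 consecutive failures trigger OPEN
    private timeoutMs = 60_000,   // 60s before probing recovery
    private successThreshold = 2, // 2 successes in HALF_OPEN to CLOSE
  ) {}

  canExecute(): boolean {
    if (this.state === 'OPEN' && Date.now() - this.openedAt >= this.timeoutMs) {
      this.state = 'HALF_OPEN'; // allow a trial request through
    }
    return this.state !== 'OPEN';
  }

  recordSuccess(): void {
    if (this.state === 'HALF_OPEN' && ++this.successes >= this.successThreshold) {
      this.state = 'CLOSED';
      this.failures = 0;
      this.successes = 0;
    }
  }

  recordFailure(): void {
    // A failure while probing, or too many failures while closed, opens the circuit.
    if (this.state === 'HALF_OPEN' || ++this.failures >= this.failureThreshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
      this.successes = 0;
    }
  }
}
```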

Feature Flag Cache:
- Per-host caching with 15-minute TTL
- Reference counting for connection lifecycle management
- Automatic cache expiration and refetch
- Context removed when refCount reaches zero

Testing:
- 32 comprehensive unit tests for CircuitBreaker
- 29 comprehensive unit tests for FeatureFlagCache
- 100% function coverage, >80% line/branch coverage
- CircuitBreakerStub for testing other components

Dependencies:
- Builds on [1/7] Types and Exception Classifier
This is part 3 of 7 in the telemetry implementation stack.

Components:
- TelemetryClient: HTTP client for telemetry export per host
- TelemetryClientProvider: Manages per-host client lifecycle with reference counting

TelemetryClient:
- Placeholder HTTP client for telemetry export
- Per-host isolation for connection pooling
- Lifecycle management (open/close)
- Ready for future HTTP implementation

TelemetryClientProvider:
- Reference counting tracks connections per host
- Automatically creates clients on first connection
- Closes and removes clients when refCount reaches zero
- Thread-safe per-host management
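
The reference-counting lifecycle can be sketched roughly as follows (illustrative; the real provider and client APIs may differ):

```typescript
// Minimal client shape assumed for the sketch.
interface Closable { close(): void }

class ClientProviderSketch<C extends Closable> {
  private clients = new Map<string, { client: C; refCount: number }>();

  constructor(private createClient: (host: string) => C) {}

  // First connection to a host creates the client; later ones share it.
  acquire(host: string): C {
    const entry = this.clients.get(host);
    if (entry) {
      entry.refCount += 1;
      return entry.client;
    }
    const client = this.createClient(host);
    this.clients.set(host, { client, refCount: 1 });
    return client;
  }

  // When the last connection releases, the client is closed and removed.
  release(host: string): void {
    const entry = this.clients.get(host);
    if (!entry) return;
    if (--entry.refCount === 0) {
      entry.client.close();
      this.clients.delete(host);
    }
  }
}
```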

Design Pattern:
- Follows JDBC driver pattern for resource management
- One client per host, shared across connections
- Efficient resource utilization
- Clean lifecycle management

Testing:
- 31 comprehensive unit tests for TelemetryClient
- 31 comprehensive unit tests for TelemetryClientProvider
- 100% function coverage, >80% line/branch coverage
- Tests verify reference counting and lifecycle

Dependencies:
- Builds on [1/7] Types and [2/7] Infrastructure
This is part 4 of 7 in the telemetry implementation stack.

Components:
- TelemetryEventEmitter: Event-based telemetry emission using Node.js EventEmitter
- MetricsAggregator: Per-statement aggregation with batch processing

TelemetryEventEmitter:
- Event-driven architecture using Node.js EventEmitter
- Type-safe event emission methods
- Respects telemetryEnabled configuration flag
- All exceptions swallowed and logged at debug level
- Zero impact when disabled

Event Types:
- connection.open: On successful connection
- statement.start: On statement execution
- statement.complete: On statement finish
- cloudfetch.chunk: On chunk download
- error: On exception with terminal classification

MetricsAggregator:
- Per-statement aggregation by statement_id
- Connection events emitted immediately (no aggregation)
- Statement events buffered until completeStatement() called
- Terminal exceptions flushed immediately
- Retryable exceptions buffered until statement complete
- Batch size (default 100) triggers flush
- Periodic timer (default 5s) triggers flush

Batching Strategy:
- Optimizes export efficiency
- Reduces HTTP overhead
- Smart flushing based on error criticality
- Memory efficient with bounded buffers

Testing:
- 31 comprehensive unit tests for TelemetryEventEmitter
- 32 comprehensive unit tests for MetricsAggregator
- 100% function coverage, >90% line/branch coverage
- Tests verify exception swallowing
- Tests verify debug-only logging

Dependencies:
- Builds on [1/7] Types, [2/7] Infrastructure, [3/7] Client Management
Implements getAuthHeaders() method for authenticated REST API requests:
- Added getAuthHeaders() to IClientContext interface
- Implemented in DBSQLClient using authProvider.authenticate()
- Updated FeatureFlagCache to fetch from connector-service API with auth
- Added driver version support for version-specific feature flags
- Replaced placeholder implementation with actual REST API calls

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Change feature flag endpoint to use NODEJS client type
- Fix telemetry endpoints to /telemetry-ext and /telemetry-unauth
- Update payload to match proto with system_configuration
- Add shared buildUrl utility for protocol handling

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Change payload structure to match JDBC: uploadTime, items, protoLogs
- protoLogs contains JSON-stringified TelemetryFrontendLog objects
- Remove workspace_id (JDBC doesn't populate it)
- Remove debug logs added during testing
- Fix import order in FeatureFlagCache
- Replace require() with import for driverVersion
- Fix variable shadowing
- Disable prefer-default-export for urlUtils
This is part 5 of 7 in the telemetry implementation stack.

Components:
- DatabricksTelemetryExporter: HTTP export with retry logic and circuit breaker
- TelemetryExporterStub: Test stub for integration tests

DatabricksTelemetryExporter:
- Exports telemetry metrics to Databricks via HTTP POST
- Two endpoints: authenticated (/api/2.0/sql/telemetry-ext) and unauthenticated (/api/2.0/sql/telemetry-unauth)
- Integrates with CircuitBreaker for per-host endpoint protection
- Retry logic with exponential backoff and jitter
- Exception classification (terminal vs retryable)

Export Flow:
1. Check circuit breaker state (skip if OPEN)
2. Execute with circuit breaker protection
3. Retry on retryable errors with backoff
4. Circuit breaker tracks success/failure
5. All exceptions swallowed and logged at debug level

Retry Strategy:
- Max retries: 3 (default, configurable)
- Exponential backoff: 100ms * 2^attempt
- Jitter: Random 0-100ms to prevent thundering herd
- Terminal errors: No retry (401, 403, 404, 400)
- Retryable errors: Retry with backoff (429, 500, 502, 503, 504)
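
That strategy corresponds roughly to this loop (illustrative sketch; the send callback and names are assumptions):

```typescript
// Terminal HTTP statuses are never retried.
const TERMINAL_STATUSES = new Set([400, 401, 403, 404]);

async function exportWithRetry(
  send: () => Promise<{ status: number }>,
  maxRetries = 3,
): Promise<void> {
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    const { status } = await send();
    if (status < 400) return;                   // success
    if (TERMINAL_STATUSES.has(status)) return;  // terminal: give up immediately
    if (attempt === maxRetries) return;         // out of retries: swallow silently
    // Exponential backoff (100ms * 2^attempt) plus 0-100ms jitter
    const delay = 100 * 2 ** attempt + Math.random() * 100;
    await new Promise((resolve) => setTimeout(resolve, delay));
  }
}
```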

Circuit Breaker Integration:
- Success → Record success with circuit breaker
- Failure → Record failure with circuit breaker
- Circuit OPEN → Skip export, log at debug
- Automatic recovery via HALF_OPEN state

Critical Requirements:
- All exceptions swallowed (NEVER throws)
- All logging at LogLevel.debug ONLY
- No console logging
- Driver continues when telemetry fails

Testing:
- 24 comprehensive unit tests
- 96% statement coverage, 84% branch coverage
- Tests verify exception swallowing
- Tests verify retry logic
- Tests verify circuit breaker integration
- TelemetryExporterStub for integration tests

Dependencies:
- Builds on all previous layers [1/7] through [4/7]
- Use getAuthHeaders() method for authenticated endpoint requests
- Remove TODO comments about missing authentication
- Add auth headers when telemetryAuthenticatedExport is true

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Use NODEJS client type instead of OSS_NODEJS for feature flags
- Use /telemetry-ext and /telemetry-unauth (not /api/2.0/sql/...)
- Update payload to match proto: system_configuration with snake_case
- Add URL utility to handle protocol correctly

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
samikshya-db force-pushed the telemetry-4-event-aggregation branch from 31f847e to 29376a6 on January 29, 2026 at 20:21
This is part 6 of 7 in the telemetry implementation stack.

Integration Points:
- DBSQLClient: Telemetry lifecycle management and configuration
- DBSQLOperation: Statement event emissions
- DBSQLSession: Session ID propagation
- CloudFetchResultHandler: Chunk download events
- IDBSQLClient: ConnectionOptions override support

DBSQLClient Integration:
- initializeTelemetry(): Initialize all telemetry components
- Feature flag check via FeatureFlagCache
- Create TelemetryClientProvider, EventEmitter, MetricsAggregator, Exporter
- Wire event listeners between emitter and aggregator
- Cleanup on close(): Flush metrics, release clients, release feature flag context
- Override support via ConnectionOptions.telemetryEnabled
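
A rough sketch of the emitter-to-aggregator wiring (hypothetical method names; the actual initializeTelemetry() internals may differ):

```typescript
import { EventEmitter } from 'events';

// Hypothetical aggregator surface, for illustration only.
interface AggregatorLike {
  processConnectionEvent(e: unknown): void;
  processStatementEvent(e: unknown): void;
  completeStatement(statementId: string): void;
}

function wireTelemetry(emitter: EventEmitter, aggregator: AggregatorLike): void {
  emitter.on('connection.open', (e) => aggregator.processConnectionEvent(e));
  emitter.on('statement.start', (e) => aggregator.processStatementEvent(e));
  emitter.on('statement.complete', (e: { statementId: string }) => {
    aggregator.processStatementEvent(e);
    aggregator.completeStatement(e.statementId); // releases the buffered metrics
  });
}
```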

Event Emission Points:
- connection.open: On successful openSession() with driver config
- statement.start: In DBSQLOperation constructor
- statement.complete: In DBSQLOperation.close()
- cloudfetch.chunk: In CloudFetchResultHandler.downloadLink()
- error: In DBSQLOperation.emitErrorEvent() with terminal classification

Session ID Propagation:
- DBSQLSession passes sessionId to DBSQLOperation constructor
- All events include sessionId for correlation
- Statement events include both sessionId and statementId

Error Handling:
- All telemetry code wrapped in try-catch
- All exceptions logged at LogLevel.debug ONLY
- Driver NEVER throws due to telemetry failures
- Zero impact on driver operations

Configuration Override:
- ConnectionOptions.telemetryEnabled overrides config
- Per-connection control for testing
- Respects feature flag when override not specified

Testing:
- Integration test suite: 11 comprehensive E2E tests
- Tests verify full telemetry flow: connection → statement → export
- Tests verify feature flag behavior
- Tests verify driver works when telemetry fails
- Tests verify no exceptions propagate

Dependencies:
- Builds on all previous layers [1/7] through [5/7]
- Completes the telemetry data flow pipeline
samikshya-db and others added 13 commits January 30, 2026 08:14
- Create lib/telemetry/telemetryTypeMappers.ts
- Move mapOperationTypeToTelemetryType (renamed from mapOperationTypeToProto)
- Move mapResultFormatToTelemetryType (renamed from mapResultFormatToProto)
- Keep all telemetry-specific mapping functions in one place

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- http_path: API endpoint path
- socket_timeout: Connection timeout in milliseconds
- enable_arrow: Whether Arrow format is enabled
- enable_direct_results: Whether direct results are enabled
- enable_metric_view_metadata: Whether metric view metadata is enabled
- Only populate fields that are present

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add section 14 detailing implemented and missing proto fields
- List all fields from OssSqlDriverTelemetryLog that are implemented
- Document which fields are not implemented and why
- Explain that missing fields require additional instrumentation

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… in all telemetry logs

- Cache driver config in MetricsAggregator when connection event is processed
- Include cached driver config in all statement and error metrics
- Export system_configuration, driver_connection_params, and auth_type for every log
- Each telemetry log is now self-contained with full context

This ensures every telemetry event (connection, statement, error) includes
the driver configuration context, making logs independently analyzable.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implement CONNECTION_CLOSE telemetry event to track session lifecycle:
- Add CONNECTION_CLOSE event type to TelemetryEventType enum
- Add emitConnectionClose() method to TelemetryEventEmitter
- Add processConnectionCloseEvent() handler in MetricsAggregator
- Track session open time in DBSQLSession and emit close event with latency
- Remove unused TOperationType import from DBSQLOperation

This provides complete session telemetry: connection open, statement execution,
and connection close with latencies for each operation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Update test files to match new telemetry interface changes:
- Add latencyMs parameter to all emitConnectionOpen() test calls
- Add missing DriverConfiguration fields in test mocks (osArch,
  runtimeVendor, localeName, charSetEncoding, authType, processName)

This fixes TypeScript compilation errors introduced by the connection
close telemetry implementation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix missing event listener for CONNECTION_CLOSE events in DBSQLClient
telemetry initialization. Without this listener, connection close events
were being emitted but not routed to the aggregator for processing.

Now all 3 telemetry events are properly exported:
- CONNECTION_OPEN (connection latency)
- STATEMENT_COMPLETE (execution latency)
- CONNECTION_CLOSE (session duration)

Verified with e2e test showing 3 successful telemetry exports.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove verbose telemetry logs to minimize noise in customer logs.
Only log essential startup/shutdown messages and errors:

Kept (LogLevel.debug):
- "Telemetry: enabled" - on successful initialization
- "Telemetry: disabled" - when feature flag disables it
- "Telemetry: closed" - on graceful shutdown
- Error messages only when failures occur

Removed:
- Individual metric flushing logs
- Export operation logs ("Exporting N metrics")
- Success confirmations ("Successfully exported")
- Client lifecycle logs (creation, ref counting)
- All intermediate operational logs

Updated spec/telemetry-design.md to document the silent logging policy.

Telemetry still functions correctly - exports happen silently in the
background without cluttering customer logs.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Fix issue where statement_type was null in telemetry payloads.

Changes:
- mapOperationTypeToTelemetryType() now always returns a string,
  defaulting to 'TYPE_UNSPECIFIED' when operationType is undefined
- statement_type always included in sql_operation telemetry log

This ensures that even if the Thrift operationHandle doesn't have
operationType set, the telemetry will include 'TYPE_UNSPECIFIED'
instead of null.

Root cause: operationHandle.operationType from Thrift response can
be undefined, resulting in null statement_type in telemetry logs.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
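For clarity, the defaulting described above amounts to (sketch; the real signature and enum handling may differ):

```typescript
// Thrift's operationHandle.operationType can be undefined; never emit null.
function mapOperationTypeToTelemetryType(operationType?: string): string {
  return operationType ?? 'TYPE_UNSPECIFIED';
}
```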
Connection metrics now include operation type in sql_operation:
- CREATE_SESSION for connection open events
- DELETE_SESSION for connection close events

This matches the proto Operation.Type enum which includes session-level
operations in addition to statement-level operations.

Before:
  sql_operation: null

After:
  sql_operation: {
    statement_type: "CREATE_SESSION"  // or "DELETE_SESSION"
  }

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Correct issue where Operation.Type values were incorrectly placed in
statement_type field. Per proto definition:

- statement_type expects Statement.Type (QUERY, SQL, UPDATE, METADATA, VOLUME)
- operation_type goes in operation_detail.operation_type and uses Operation.Type

Changes:
- Connection metrics: Set sql_operation.operation_detail.operation_type to
  CREATE_SESSION or DELETE_SESSION
- Statement metrics: Set both statement_type (QUERY or METADATA based on
  operation) and operation_detail.operation_type (EXECUTE_STATEMENT, etc.)
- Added mapOperationToStatementType() to convert Operation.Type to Statement.Type

This ensures telemetry payloads match the OssSqlDriverTelemetryLog proto
structure correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added operation_detail field to DatabricksTelemetryLog interface
- Enhanced telemetry-local.test.ts to capture and display actual payloads
- Verified all three telemetry events (CONNECTION_OPEN, STATEMENT_COMPLETE, CONNECTION_CLOSE)
- Confirmed statement_type and operation_detail.operation_type are properly populated

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added test for invalid query execution (TABLE_OR_VIEW_NOT_FOUND)
- Confirms SQL execution errors are handled as failed statements
- Verified telemetry payloads still correctly formatted during errors
- Note: Driver-level errors (connection/timeout) would need emitErrorEvent wiring

Test output shows correct behavior:
- CONNECTION_OPEN with CREATE_SESSION
- STATEMENT_COMPLETE with QUERY + EXECUTE_STATEMENT (even on error)
- CONNECTION_CLOSE with DELETE_SESSION

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
samikshya-db and others added 16 commits February 5, 2026 11:35
Three fixes addressing review feedback:

1. Fix documentation typo (sreekanth-db comment)
   - DatabricksTelemetryExporter.ts:94
   - Changed "TelemetryFrontendLog" to "DatabricksTelemetryLog"

2. Add proxy support (jadewang-db comment)
   - DatabricksTelemetryExporter.ts:exportInternal()
   - Get HTTP agent from connection provider
   - Pass agent to fetch for proxy support
   - Follows same pattern as CloudFetchResultHandler and DBSQLSession
   - Supports http/https/socks proxies with authentication

3. Fix flush timer to prevent rate limiting (sreekanth-db comment)
   - MetricsAggregator.ts:flush()
   - Reset timer after manual flushes (batch size, terminal errors)
   - Ensures consistent 30s spacing between exports
   - Prevents rapid successive flushes (e.g., batch at 25s, timer at 30s)

All changes follow existing driver patterns and maintain backward compatibility.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
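A minimal sketch of the timer-reset behavior from item 3 (assumed names; the real MetricsAggregator wiring differs):

```typescript
class FlushTimerSketch {
  private timer?: ReturnType<typeof setInterval>;

  constructor(private onFlush: () => void, private intervalMs = 30_000) {
    this.armTimer();
  }

  private armTimer(): void {
    if (this.timer) clearInterval(this.timer);
    this.timer = setInterval(() => this.onFlush(), this.intervalMs);
  }

  // Any manual flush (batch size, terminal error) re-arms the periodic timer,
  // ensuring consistent spacing between exports instead of rapid back-to-back flushes.
  flush(): void {
    this.onFlush();
    this.armTimer();
  }
}
```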
Feature flag fetching was missing proxy support, just as the telemetry
exporter was. Applied the same fix:

- Get HTTP agent from connection provider
- Pass agent to fetch call for proxy support
- Follows same pattern as CloudFetchResultHandler and DBSQLSession
- Supports http/https/socks proxies with authentication

This completes proxy support for all HTTP operations in the telemetry
system (both telemetry export and feature flag fetching).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Refactored FeatureFlagCache to support querying any feature flag,
not just the telemetry flag:

**Changes:**
- Store all flags from server in Map<string, string>
- Add generic isFeatureEnabled(host, flagName) method
- Keep isTelemetryEnabled() as convenience method
- fetchFeatureFlags() now stores all flags for future use

**Benefits:**
- Extensible to any safe feature flag
- No code changes needed to add new flags
- Single fetch stores all flags from response
- Backward compatible (isTelemetryEnabled still works)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
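A sketch of the extensible lookup described above, assuming a fetch callback and string-valued flags (the flag key and signatures are hypothetical):

```typescript
class FeatureFlagCacheSketch {
  private byHost = new Map<string, { flags: Map<string, string>; fetchedAt: number }>();

  constructor(
    private fetchFlags: (host: string) => Promise<Map<string, string>>,
    private ttlMs = 15 * 60 * 1000, // 15-minute TTL
  ) {}

  async isFeatureEnabled(host: string, flagName: string): Promise<boolean> {
    let entry = this.byHost.get(host);
    if (!entry || Date.now() - entry.fetchedAt > this.ttlMs) {
      entry = { flags: await this.fetchFlags(host), fetchedAt: Date.now() };
      this.byHost.set(host, entry); // one fetch stores every flag in the response
    }
    return entry.flags.get(flagName) === 'true';
  }

  // Convenience wrapper kept for backward compatibility.
  isTelemetryEnabled(host: string): Promise<boolean> {
    return this.isFeatureEnabled(host, 'telemetry.enabled' /* hypothetical key */);
  }
}
```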
Feature flags now use the same circuit breaker protection as telemetry
for resilience against endpoint failures.

**Changes:**
- FeatureFlagCache now accepts optional CircuitBreakerRegistry
- Feature flag fetches wrapped in circuit breaker execution
- Shared circuit breaker registry between feature flags and telemetry
- Per-host circuit breaker isolation maintained
- Falls back to cached values when circuit is OPEN

**Benefits:**
- Protects against repeated failures to feature flag endpoint
- Fails fast when endpoint is down (circuit OPEN)
- Auto-recovery after timeout (60s default)
- Same resilience patterns as telemetry export

**Configuration:**
- Failure threshold: 5 consecutive failures
- Timeout: 60 seconds
- Per-host isolation (failures on one host don't affect others)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ad format

- Update FeatureFlagCache tests to use new extensible flags Map
- Fix DatabricksTelemetryExporter tests to use protoLogs format
- Verify telemetry endpoints use correct paths (/telemetry-ext, /telemetry-unauth)
- 213 passing, 13 logging assertion tests need investigation