Conversation

Force-pushed a5da6dd to 16cc7d8, then 16cc7d8 to 37daf75.
```go
func runStart(ctx context.Context, rt runtime.Runtime) error {
	tel := telemetry.New()
	defer tel.Flush()
```
TODO: create one telemetry client per run/command.
For this we need to re-work our dependency injection (coming in a subsequent PR).
Then we can easily flush after a command finishes, in PersistentPostRun.
However we need to investigate if/how to still send the event when calling os.Exit(1) (which we do on error)
since it kills the process before any post-run hooks fire.
Once we have this, we would also have a single sessionID per command which all events for that command would send.
No actionable comments were generated in the recent review. 🎉
ℹ️ Recent review info
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough
Adds a telemetry system: a new telemetry client (async Track/Flush), machine ID persistence, Env fields for the analytics endpoint and disable flag, README env docs, instrumentation in CLI start, and unit/integration tests for telemetry behavior.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant CLI as CLI (runStart)
    participant TEL as Telemetry Client
    participant FS as File System
    participant NET as Analytics Server
    CLI->>TEL: New()
    TEL->>FS: LoadOrCreateMachineID()
    FS-->>TEL: machine_id
    CLI->>TEL: Track("cli_cmd", payload)
    TEL->>TEL: Enrich payload (version, OS/arch, CI, session_id, machine_id)
    TEL->>NET: POST /analytics (async)
    NET-->>TEL: Response / timeout
    CLI->>TEL: Flush()
    TEL-->>CLI: All events flushed or timed out
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed
Actionable comments posted: 4
🧹 Nitpick comments (1)
README.md (1)
21-24: Normalize env var naming in the table and document the endpoint override. Use only variable names in the first column (line 24), and add `LSTK_ANALYTICS_ENDPOINT` to reflect the new config surface. Proposed README tweak:

```diff
 | Variable | Description |
 |---|---|
 | `LOCALSTACK_AUTH_TOKEN` | Auth token; for CI only |
-| `LOCALSTACK_DISABLE_EVENTS=1` | Disables telemetry event reporting |
+| `LOCALSTACK_DISABLE_EVENTS` | Set to `1` to disable telemetry event reporting |
+| `LSTK_ANALYTICS_ENDPOINT` | Override telemetry ingestion endpoint (advanced/testing) |
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@README.md` around lines 21 - 24, Update the environment variable table so the first column contains only variable names (e.g., replace `LOCALSTACK_DISABLE_EVENTS=1` with `LOCALSTACK_DISABLE_EVENTS`), keep `LOCALSTACK_AUTH_TOKEN` described as "Auth token; for CI only", and add a new row for `LSTK_ANALYTICS_ENDPOINT` with a short description like "Overrides analytics endpoint" to document the endpoint override surface.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/telemetry/client_test.go`:
- Around line 41-43: The test mutates the global env.Vars (setting
env.Vars.DisableEvents and env.Vars.AnalyticsEndpoint) without restoring it,
causing config leakage; capture the original env.Vars value at the start of the
test (e.g. orig := env.Vars) and defer restoring it (defer func() { env.Vars =
orig }()), then proceed to set env.Vars.DisableEvents and
env.Vars.AnalyticsEndpoint for the test and allow the deferred restore to reset
global state after the test finishes.
- Line 48: The test currently does a blocking receive "got := <-ch" which can
hang if telemetry isn't sent; change the receive in the test (in the test
function using variable ch in internal/telemetry/client_test.go) to use a select
with a timeout (e.g., time.After) so the test fails fast on timeout and reports
an error instead of hanging; ensure you import time and call t.Fatalf or t.Error
when the timeout case fires to make the failure explicit.
- Around line 26-34: The httptest server handler (the anonymous http.HandlerFunc
created for srv := httptest.NewServer(...)) currently calls require.* (e.g.,
require.NoError and require.Len) inside the handler goroutine which calls
FailNow from a non-test goroutine; instead capture errors and validation results
in local variables (or send them on a channel) within the handler (e.g., store
body read error, json.Unmarshal error, and len of req.Events into variables or
an error value), close the request body, and return normally; then, in the test
goroutine after making the request (and after srv.Close()), assert those
captured values using require.* (referencing the same variables used in the
handler such as bodyErr, unmarshalErr, eventsLen or a returned handlerErr) so
all require.* calls execute in the test goroutine.
In `@internal/telemetry/client.go`:
- Around line 103-107: The code calls c.httpClient.Do(req) and then discards the
error from resp.Body.Close() with "_ = resp.Body.Close()"; change this to check
and handle the Close error instead of ignoring it—e.g. capture closeErr :=
resp.Body.Close(); if closeErr != nil { return closeErr } or log the error using
the client's logger depending on the surrounding function's error flow—update
the handling in the function containing c.httpClient.Do(req) so
resp.Body.Close() is not silently ignored.
---
Nitpick comments:
In `@README.md`:
- Around line 21-24: Update the environment variable table so the first column
contains only variable names (e.g., replace `LOCALSTACK_DISABLE_EVENTS=1` with
`LOCALSTACK_DISABLE_EVENTS`), keep `LOCALSTACK_AUTH_TOKEN` described as "Auth
token; for CI only", and add a new row for `LSTK_ANALYTICS_ENDPOINT` with a
short description like "Overrides analytics endpoint" to document the endpoint
override surface.
ℹ️ Review info
Configuration used: Repository UI (base), Organization UI (inherited)
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (10)
README.md, cmd/root.go, go.mod, internal/env/env.go, internal/telemetry/client.go, internal/telemetry/client_test.go, internal/telemetry/machine_id.go, test/integration/env/env.go, test/integration/go.mod, test/integration/telemetry_test.go
Thanks for taking a first stab at this @carole-lavillonniere! Looks like we got the event schema down, so from an analytics-backend side this should work no problem (thanks for validating!).
It makes sense that we're filling the sessionId with a newly generated UUID. As we've spoken about, this is not used for anything else but localstack sessions at the moment, but at least we have clear semantics for the CLI for now (one command = new session id). I don't have much to add there.
My main ask is around the machine ID, but I think it would be great if we could rework this using a producer/consumer pattern for reasons I explain inline. Happy to merge this as a first iteration though so we can get going.
```go
	return
}
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-Client", clientHeader)
```
not sure what we would use this for at the moment. i like the idea of setting the user agent header though.
for example, we could build a typical user agent header:
```
localstack lstk/<version> <os>/<kernel>/<arch>
^^^^^^^^^^ ^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^
namespace  software       system info
```
and send that. this looks future-proof, standards-conformant, and i can see that being super useful in the analytics backend at some point :)
i would love to introduce this in the emulator also
```go
	SessionID string `json:"session_id"`
}

func (c *Client) Track(name string, payload map[string]any) {
```
nit on naming: i would prefer something like Log or Emit as method name here. to me "Track" implies some sort of long-living process, a lifecycle, or a correlation of some sort. whereas this is really just a fire-and-forget logging method, and Log or Emit matches that mental model more.
```go
	SessionID string `json:"session_id"`
}

func (c *Client) Track(name string, payload map[string]any) {
```
i think we should pass a context object here, and hand it down to the http request, so we have lifecycle control. though not necessarily needed if we go with a producer/consumer pattern.
```go
}

c.wg.Add(1)
go func() {
```
i understand that we don't want Track to block, to avoid making callers manage concurrency, but i think there are better ways to solve this than creating a new go routine for each call and tracking their invocations using a WaitGroup.
i see a few drawbacks here:
- there's unbounded go routine creation. if we use this client to emit events in a loop, we may end up creating lots of go routines without any backpressure. nothing is slowing requests down, so this may end up creating resource spikes if used incorrectly.
- if memory serves, calling `WaitGroup.Add` concurrently with `WaitGroup.Wait` is not safe
- there's no cancellation or lifecycle control
my preference would be creating a producer/worker pattern:
- use an (optionally bounded) channel of events
- use a single go routine that reads from the channel and sends the data in a loop
- a `Close` or `Shutdown` that stops acceptance of new events and drains the channel
now, Track becomes just a write into the channel. it gives us controlled concurrency, predictable resource usage, simpler shutdown mechanics (👀 Flush), option to create drop policies when queue is full, better encapsulation between concerns (marshalling data vs loading data)
```go
	return ""
}

id := uuid.NewString()
```
it would be great if we could attempt to use a salted-hash version of the machine id, or the docker daemon id if available, so we have a predictable machine id. here's a reference in localstack: https://github.com/localstack/localstack/blob/ada377c2c63ac89f968cb31627a0d1a1dabb8284/localstack-core/localstack/utils/analytics/metadata.py#L172-L200
if we get it in the same way, we could ideally link the generated machine IDs to the ones generated in localstack.
```go
// before process exit to avoid dropping telemetry events. It returns quickly
// when no events are pending, and is bounded by the HTTP client's timeout in
// the worst case.
func (c *Client) Flush() {
```
nit on naming: as a caller, i would expect Flush to force buffered items to be emptied to the underlying IO writer. this method looks more like a WaitForShutdown? though we could avoid this if we go with the producer/consumer pattern.
cc @vittoriopolverino in case you are interested
Adds a telemetry client that fires a single `cli_cmd` event when `lstk start` runs (this is merely to not have dead code; defining telemetry is still WIP, PRO-210).
- payload includes `version`, `os`, `arch`, `is_ci`, `machine_id` (persistent uuid stored in the lstk config dir), and a per-session UUID (ultimately one session ID per command)
- sends an `X-Client` header with value `lstk/v2`
- can be disabled with `LOCALSTACK_DISABLE_EVENTS=1`

✔️ Tested
Ran analytics-backend locally and observed:
{"timestamp": "2026-03-03T12:08:38.280", "level": "INFO", "threadName": "Dummy-8", "process": 1312488, "name": "analytics.core", "message": "record_events: Event(name='cli_cmd', metadata=EventMetadata(session_id='6ec424b8-0196-4815-83b7-2ae45e2d6146', client_time='2026-03-03 11:08:28.019701'), payload={'arch': 'amd64', 'cmd': 'lstk start', 'is_ci': False, 'machine_id': '5df95f6a-6dc0-4449-9ea8-f0f1b23b8f7c', 'os': 'linux', 'params': [], 'version': 'dev'})"}