git ai usage #1434

Open
clarete wants to merge 91 commits into git-ai-project:main from clarete:lincoln/git-ai-usage
@clarete clarete commented May 24, 2026

git-ai usage — local AI usage statistics

This PR adds a new git-ai usage subcommand that aggregates locally recorded metric events into a human-readable (or JSON) summary of AI coding activity, without requiring a network connection or account.

Usage

git-ai usage [--period 1d|3d|7d|30d] [--repo <substring>] [--json]

How it works

The telemetry worker already flushes metric events to a local SQLite DB (metric_events). This PR adds a second table, local_events, that stores a filtered copy of the three event types relevant to usage stats (Committed, Checkpoint, SessionEvent) as they are flushed. A 30-day retention window is enforced with a once-per-day prune pass.

git-ai usage reads directly from local_events — no DB writes, no network calls.
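
The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `EventKind` enum, its `Other` variant, and both function names are hypothetical; only the three retained event types come from the description.

```rust
/// Event kinds seen by the flush path. `Other` is a hypothetical stand-in
/// for every kind that usage stats ignore.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EventKind {
    Committed,
    Checkpoint,
    SessionEvent,
    Other,
}

/// True for the three kinds the description says are copied to local_events.
fn is_usage_event(kind: EventKind) -> bool {
    matches!(
        kind,
        EventKind::Committed | EventKind::Checkpoint | EventKind::SessionEvent
    )
}

/// Returns the subset of a flushed batch that would be stored locally.
fn select_local_events(batch: &[EventKind]) -> Vec<EventKind> {
    batch.iter().copied().filter(|k| is_usage_event(*k)).collect()
}
```

Keeping the filter as a pure function over the batch means it can run inside the existing flush loop without a second pass over the events.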

Global View

[Screenshot: global view, 2026-05-24 7:32:05 PM]

Repository View

[Screenshot: repository view, 2026-05-24 7:33:04 PM]

CLAassistant commented May 24, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ svarlamov
❌ clarete

devin-ai-integration[bot]

This comment was marked as resolved.

@clarete clarete force-pushed the lincoln/git-ai-usage branch 2 times, most recently from 84412f5 to 88ce909 Compare May 27, 2026 13:50
clarete and others added 26 commits May 27, 2026 09:55
Adds a new `git-ai activity` CLI command that shows aggregated stats
from locally persisted metric events (Committed, Checkpoint, SessionEvent).

- Schema migration 2→3: adds local_events table (never cleared, unlike
  the upload queue) with indexes on event_id and ts
- store_local_events() in telemetry_worker fires before upload attempts,
  persisting interesting events regardless of network status
- local_stats module aggregates events in memory: AI/human line splits,
  per-tool breakdowns, unique session counts, files touched
- git-ai activity [--period 7d|30d|all] [--json] (default: 30d)
- Excluded from daemon-required commands; reads local DB standalone

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tted

Commits with no AI lines were dropping their human_additions because the
early-return guard fired before the accumulation. Moved the accumulation
above the guard.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
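
The ordering fix in the commit above can be illustrated with a small sketch. The `Totals` and `Committed` types and their field names are assumptions made for the example; the point is only that the human-line accumulation runs before the guard.

```rust
#[derive(Debug, Default, PartialEq)]
struct Totals {
    ai_lines: u32,
    human_lines: u32,
}

/// Hypothetical shape of a committed event; field names are illustrative.
struct Committed {
    ai_additions: u32,
    human_additions: u32,
}

fn accumulate(totals: &mut Totals, ev: &Committed) {
    // Count human lines first so commits without AI lines still contribute.
    totals.human_lines += ev.human_additions;

    // The guard now skips only the AI-specific bookkeeping.
    if ev.ai_additions == 0 {
        return;
    }
    totals.ai_lines += ev.ai_additions;
}
```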
Displays AI acceptance rate (committed AI lines / checkpoint AI lines)
below the per-tool breakdown. Only shown when checkpoint history is
sufficient to produce a meaningful ratio (<= 100%).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t-ai activity

- Activity over time: bar chart bucketed by day (<=7d), week (30d), or month (all)
  with empty buckets filled in so the chart is always contiguous
- AI coding velocity: each bucket shows AI lines + commit count so output
  doubles as a velocity trend across the period
- Time of day: 24-slot sparkline (local time) showing when AI-assisted
  commits land, using Unicode block chars for density

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
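
A 24-slot sparkline like the one described above could be built along these lines. This is a sketch under assumptions: the scaling rule, the choice to render zero as a space, and the `sparkline` helper are illustrative; `spark_char` is named in a later commit but its real signature is not shown in this PR page.

```rust
/// Unicode block characters from lowest to highest density.
const BLOCKS: [char; 8] = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];

/// Maps `value` (relative to `max`) to a block character; zero renders as a space.
fn spark_char(value: u32, max: u32) -> char {
    if value == 0 || max == 0 {
        return ' ';
    }
    // Widen to u64 before multiplying so large counts cannot overflow u32.
    let idx = (u64::from(value) * 8 / u64::from(max)).min(7) as usize;
    BLOCKS[idx]
}

/// Renders one character per hour of the day.
fn sparkline(hourly: &[u32; 24]) -> String {
    let max = hourly.iter().copied().max().unwrap_or(0);
    hourly.iter().map(|&v| spark_char(v, max)).collect()
}
```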
Weekly buckets now display "May 18 – May 24" instead of just "May 18"
so users can see their work falls within the bucket rather than thinking
it refers to a specific day.

Activity over time section moved below Sessions and above Time of day.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace Commits/Checkpoints/Sessions headers with a top-level AI vs Human
percentage bar followed by stacked AI and Human sections. Surfaces the
same data with clearer framing — no internal "checkpoint" terminology.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Preserve the model from tool_model_pairs instead of stripping it, so the
breakdown distinguishes sonnet/opus/etc. Trims the redundant tool prefix
from the model name (claude::claude-sonnet-4-6 -> "claude · sonnet-4-6").

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
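
The prefix trimming from the commit above might look like this. The function name `display_tool_model` is hypothetical, and the rule (drop a leading `<tool>-` from the model) is inferred from the single example in the commit message.

```rust
/// Formats a `tool::model` pair for display, dropping a model prefix
/// that merely repeats the tool name.
fn display_tool_model(pair: &str) -> String {
    match pair.split_once("::") {
        Some((tool, model)) => {
            let prefix = format!("{tool}-");
            let short = model.strip_prefix(prefix.as_str()).unwrap_or(model);
            format!("{tool} · {short}")
        }
        // No separator: show the value unchanged.
        None => pair.to_string(),
    }
}
```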
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Surface how much committed code was confidently attributed to AI or
known-human vs left "untracked". Computed as (ai + human) / total git
diff additions. Answers the "can I trust these numbers" question by
exposing the size of the unattributed holes in the data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
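
As a rough sketch of the coverage formula above, (ai + human) / total diff additions: the function name, the `None` result for empty diffs, and the clamp at 100% are assumptions for illustration.

```rust
/// Share of added lines attributed to AI or known-human, in percent.
/// Returns None when the diff has no additions, to avoid dividing by zero.
fn attribution_coverage_pct(ai: u32, human: u32, total_additions: u32) -> Option<u32> {
    if total_additions == 0 {
        return None;
    }
    // Sum and multiply in u64 so neither step can overflow.
    let attributed = u64::from(ai) + u64::from(human);
    let pct = attributed * 100 / u64::from(total_additions);
    // Assumed clamp: attributed lines should not exceed the diff total.
    Some(pct.min(100) as u32)
}
```

With 600 AI lines and 250 human lines out of 1000 additions, coverage is 85%, leaving 15% untracked.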
Replace the single summarized "Attributed" bar with per-bucket coverage
in the Activity over time chart, so the trend (improving/degrading) is
visible rather than one aggregate number. Buckets now accumulate diff
additions and attributed lines across all commits; AI-lines bar and
commit count remain AI-only for consistency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Extract per-message token usage (input/output/cache read/cache write)
from SessionEvent transcript JSON, grouped by model, with an estimated
USD cost from a built-in Anthropic pricing table (labeled estimate).

Dedups by assistant message id keeping field-wise max, since the
incremental transcript reader re-emits the same message and streaming
partials report lower token counts than the final message.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
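
The dedup rule above (one entry per assistant message id, keeping the field-wise maximum) can be sketched as follows. `TokenUsage`, its field names, and `dedup_by_message_id` are hypothetical; transcript parsing is left out.

```rust
use std::collections::HashMap;

#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct TokenUsage {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_write: u64,
}

impl TokenUsage {
    /// Keeps the larger value of each field, so a streaming partial
    /// never lowers a count already seen for the same message.
    fn merge_max(&mut self, other: &TokenUsage) {
        self.input = self.input.max(other.input);
        self.output = self.output.max(other.output);
        self.cache_read = self.cache_read.max(other.cache_read);
        self.cache_write = self.cache_write.max(other.cache_write);
    }
}

/// Collapses repeated emissions of the same message id into one record.
fn dedup_by_message_id(events: &[(&str, TokenUsage)]) -> HashMap<String, TokenUsage> {
    let mut by_id: HashMap<String, TokenUsage> = HashMap::new();
    for (id, usage) in events {
        by_id.entry((*id).to_string()).or_default().merge_max(usage);
    }
    by_id
}
```

Taking the maximum rather than the last value makes the result independent of the order in which partial and final messages are re-read.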
Codex transcripts use a different schema than Claude: cumulative
per-session totals on token_count events (payload.info.total_token_usage)
rather than per-message usage. Parse them via a session-keyed accumulator
that keeps the running max, mapping codex field semantics onto ours
(input_tokens includes cached, so non-cached input is the difference;
cached -> cache_read; no cache-creation concept). Model name is captured
from the separate payload.model event. Adds GPT-5 family pricing estimate.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
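
The field mapping described above could be expressed as below. `TokenAccum` is named in a later commit, but its fields, the `CodexTotals` struct, and the `from_codex` method name are assumptions made for this sketch.

```rust
/// Subset of the cumulative totals on a Codex token_count event
/// (field names are illustrative).
struct CodexTotals {
    input_tokens: u64,
    cached_input_tokens: u64,
    output_tokens: u64,
}

#[derive(Debug, PartialEq, Eq)]
struct TokenAccum {
    input: u64,
    output: u64,
    cache_read: u64,
    cache_creation: u64,
}

impl TokenAccum {
    /// Codex input_tokens includes cached tokens, so the non-cached
    /// input is the difference; Codex has no cache-creation concept.
    fn from_codex(t: &CodexTotals) -> Self {
        TokenAccum {
            input: t.input_tokens.saturating_sub(t.cached_input_tokens),
            output: t.output_tokens,
            cache_read: t.cached_input_tokens,
            cache_creation: 0,
        }
    }
}
```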
For each session, check whether a commit landed within 4 hours of the
session's last observed event. Surfaces shipped / abandoned / yield%
inline with the Sessions count in the AI section.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
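
A minimal sketch of the shipped/abandoned classification above. It assumes Unix timestamps in seconds and that only commits at or after the session's last event count; the commit message does not say which direction the 4-hour window runs, and the function names are hypothetical.

```rust
/// Window after a session's last event in which a commit counts as "shipped".
const SHIP_WINDOW_SECS: i64 = 4 * 60 * 60;

/// True when some commit lands within the window after the session's last event.
fn is_shipped(session_last_event: i64, commit_ts: &[i64]) -> bool {
    commit_ts
        .iter()
        .any(|&c| c >= session_last_event && c - session_last_event <= SHIP_WINDOW_SECS)
}

/// Percentage of sessions classified as shipped; None when there are no sessions.
fn yield_pct(session_last_events: &[i64], commit_ts: &[i64]) -> Option<u32> {
    if session_last_events.is_empty() {
        return None;
    }
    let shipped = session_last_events
        .iter()
        .filter(|&&s| is_shipped(s, commit_ts))
        .count() as u64;
    Some((shipped * 100 / session_last_events.len() as u64) as u32)
}
```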
Accumulates AI lines committed per weekday (Mon-Sun) from committed
event timestamps and renders a sparkline heatmap below the hourly
time-of-day chart.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Track the timestamp of each token-usage event so the cost can be split
into this-week vs last-week halves. Renders as a gray annotation under
Est. cost: 'This week ~$X.XX · Last week ~$Y.YY  ↑/↓ N% vs last week'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Compute cache_read / (cache_read + cache_creation) per model and display
it inline with the per-model token line, e.g. 'cache 97% hit'. A low
ratio means context is being discarded and re-warmed frequently.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
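
The cache hit ratio above, cache_read / (cache_read + cache_creation), reduces to a few lines. The function name and the `None` result for models with no cache traffic are assumptions.

```rust
/// Cache hit rate in percent, or None when there was no cache traffic
/// (or the sum would overflow u64).
fn cache_hit_pct(cache_read: u64, cache_creation: u64) -> Option<u32> {
    let total = cache_read.checked_add(cache_creation)?;
    if total == 0 {
        return None;
    }
    // Multiply in u128 so very large token counts cannot overflow.
    Some((u128::from(cache_read) * 100 / u128::from(total)) as u32)
}
```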
Track checkpoint AI lines per tool alongside committed AI lines, then
compute acceptance rate (committed / checkpoint) per tool. Displayed
inline with the per-tool commit breakdown, e.g. 'claude · sonnet-4-6:
912  (claude 72% accept)'. The global acceptance rate is kept.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
clarete and others added 24 commits May 27, 2026 09:55
A user with checkpoint events but no commits or sessions in the window
was falsely getting the no-data error even though checkpoint stats would
render in the output. Now checks checkpoints.ai_lines_added and
checkpoints.human_lines_added before declaring the result empty.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pers

The DB lock-acquire + get pattern was duplicated verbatim in
compute_activity, compute_session_list, and compute_repo_summaries.
Extracted into two free functions so the lock-poison error string and
the acquire pattern live in one place.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Was defined identically inside both compute_activity and
compute_session_list. Now a single module-level const with a doc
comment explaining its purpose.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The two-call trim_start_matches chain appeared at four separate sites.
Now a single strip_protocol(&str) -> &str function handles it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
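
A helper with the signature named above might look like this. Which prefixes the real function strips is not stated in the commit; `https://` and `http://` are assumed here.

```rust
/// Removes a leading "https://" or "http://" from a URL-like string.
/// The two prefixes are an assumption; the commit only mentions a
/// two-call trim_start_matches chain.
fn strip_protocol(url: &str) -> &str {
    url.trim_start_matches("https://")
        .trim_start_matches("http://")
}
```

Returning a borrowed `&str` avoids allocating, since the result is always a suffix of the input.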
The dashboard CTA line was written in both branches of the
repo_filter if/else. Move it out so it prints unconditionally,
with only the --repo tip gated on the filter being absent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The LocalEventRecord construction closure was copy-pasted across all
three query branches (NULL filter, LIKE filter, no filter). Now defined
once as map_row and shared by all three.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
model_tokens is now keyed by shorten_model() at insertion, so the
model string in the iteration variable is already shortened. The
shorten_model call at the by_model push site was a no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The codex→TokenAccum field mapping (subtract cached from input, zero
cache_creation) was inlined in both build_token_summary and
compute_session_list. Now lives once as an impl method with a doc
comment explaining the field semantics.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three sites in print_terminal independently computed filled/empty block
counts and formatted them. Now handled by ratio_bar(value, max, width);
bar(pct, width) becomes a thin wrapper over it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
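
The helper pair above could be sketched as follows. The function names and the wrapper relationship come from the commit message; the parameter types, the block glyphs, and the clamping behavior are assumptions.

```rust
/// Renders a fixed-width bar with `value / max` of its cells filled.
fn ratio_bar(value: u64, max: u64, width: usize) -> String {
    let filled = if max == 0 {
        0
    } else {
        // Clamp to max, and multiply in u128 to rule out overflow.
        (u128::from(value.min(max)) * width as u128 / u128::from(max)) as usize
    };
    format!("{}{}", "█".repeat(filled), "░".repeat(width - filled))
}

/// Thin wrapper: a percentage is a ratio out of 100.
fn bar(pct: u32, width: usize) -> String {
    ratio_bar(u64::from(pct), 100, width)
}
```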
…rings

bucket_key and fill_buckets both independently produced daily/weekly/
monthly label strings. A change to any format required two edits. Now
handled by bucket_label(date, granularity) called from both sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Repositories block: replace block-bar + dot-separated stats with
  aligned tabular columns; singular/plural for commit/session labels
- Model breakdown lines now include "lines" unit after the count
- Acceptance rate shows a range (e.g. 56–81%) when multiple tools
  have valid data, falling back to the overall rate for single-tool
- WoW block: drop "↑ new this week" label when last week had no spend
  (redundant); fix "$-0.00" display by formatting near-zero as "$0"
- Add format_cost() helper: rounds to whole dollars for amounts ≥ $10,
  keeps cents below that threshold; applied to all cost display sites

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
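
The rounding rules listed for `format_cost()` above can be sketched like this. The half-cent threshold used for "near-zero" and the `f64` parameter are assumptions; the commit message does not give them.

```rust
/// Formats a USD estimate: "$0" for near-zero values, whole dollars at
/// $10 and above, and cents below that.
fn format_cost(usd: f64) -> String {
    if usd.abs() < 0.005 {
        // Avoids output such as "$-0.00" for tiny negative rounding noise.
        "$0".to_string()
    } else if usd >= 10.0 {
        format!("${:.0}", usd)
    } else {
        format!("${:.2}", usd)
    }
}
```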
Pre-compute max widths for tool names, line counts, model names, token
counts, and costs so each column lines up across rows.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously each repo triggered a separate SQL scan via compute_activity,
giving O(n × repos) fetches. Now one fetch covers all repos and grouping
is done in memory via compute_activity_from_records.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove get_distinct_repo_urls, is_backfilled, mark_backfilled, and
  get_existing_commit_shas -- all had no callers
- Fix days_ago() u32 truncation: clamp before casting so values above
  u32::MAX do not silently wrap
- Inline store_local_events into flush_metrics chunk loop so events are
  walked once; interesting events are serialized once per flush instead
  of in a separate pre-pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
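
The `days_ago()` clamp mentioned above can be illustrated as follows. The signature and the use of Unix seconds are assumptions; the technique shown is clamping in the wide type before the narrowing cast.

```rust
const SECS_PER_DAY: u64 = 86_400;

/// Whole days between `ts_secs` and `now_secs`, saturating at u32::MAX.
fn days_ago(now_secs: u64, ts_secs: u64) -> u32 {
    let days = now_secs.saturating_sub(ts_secs) / SECS_PER_DAY;
    // Clamp in u64 first; a bare `as u32` would keep only the low 32 bits.
    days.min(u64::from(u32::MAX)) as u32
}
```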
Adds opportunistic pruning to insert_local_events: at most once per day
a DELETE removes rows with ts older than 30 days, keeping the table
bounded to roughly the window used by git-ai usage (max 30d period).
Last-prune timestamp is persisted in schema_metadata.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
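
The once-per-day decision above can be sketched without a database. The function name is hypothetical and timestamps are assumed to be Unix seconds; in the real code the caller would run the DELETE with the returned cutoff and then persist the new last-prune timestamp.

```rust
const PRUNE_INTERVAL_SECS: i64 = 24 * 60 * 60;
const RETENTION_SECS: i64 = 30 * 24 * 60 * 60;

/// Returns the cutoff below which rows should be deleted, or None if a
/// prune already ran within the last day.
fn prune_cutoff_if_due(now: i64, last_prune: Option<i64>) -> Option<i64> {
    match last_prune {
        Some(last) if now - last < PRUNE_INTERVAL_SECS => None,
        // Never pruned, or the last prune is at least a day old.
        _ => Some(now - RETENTION_SECS),
    }
}
```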
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Events predating the repo_url column (migration 2->3) have no repo
identity and were mapping to repo_url="", inflating repos.len() and
incorrectly triggering the multi-repo display path for single-repo users.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…1d/3d)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Previously a failed DELETE left the timestamp unwritten, causing
prune_local_events_if_due to retry on every subsequent flush. Wrapping
both writes in a transaction makes them atomic: either both succeed or
neither does, so a failure advances the timestamp and avoids a retry loop.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
committed * 100 could overflow u32 (wraps in release, panics in debug)
for tools with >42.9M committed AI lines. Widening to u64 before
multiplication eliminates the risk.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
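
The overflow threshold above follows from u32::MAX being 4,294,967,295: multiplying by 100 overflows once the count exceeds 42,949,672. A minimal sketch of the widened computation, with a hypothetical function name:

```rust
/// Acceptance rate in percent (committed / checkpoint), computed in u64.
/// Returns None when there are no checkpoint lines to divide by.
fn acceptance_pct(committed: u32, checkpoint: u32) -> Option<u64> {
    if checkpoint == 0 {
        return None;
    }
    // Widen before multiplying: committed * 100 always fits in u64.
    Some(u64::from(committed) * 100 / u64::from(checkpoint))
}
```

`checked_mul` on the `u32` values makes the boundary visible: `42_949_672u32.checked_mul(100)` succeeds, while `42_949_673u32.checked_mul(100)` returns `None`.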
SessionRecord, compute_session_list, extract_claude_user_text,
extract_codex_user_text, and normalize_title had no call sites.
Removing them eliminates ~300 lines and a diverging copy of the
token-accumulation and yield-classification logic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…usage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@clarete clarete force-pushed the lincoln/git-ai-usage branch from 88ce909 to 5870c0d Compare May 27, 2026 13:56
- Use u64 arithmetic for all percentage multiplications in usage.rs to
  prevent u32 overflow on large line counts (same fix pattern as the
  prior acceptance-rate fix)
- Fix spark_char overflow: value*8 now computed in u64 before casting
- Emit u32::MAX instead of silently dropping tools with no checkpoint
  events in the query window, so the ">100% (incomplete data)" signal
  is surfaced as intended
- Consolidate the two separate DB fetches in handle_usage into a single
  compute_all() call so overall stats and per-repo breakdown always
  reflect the same event snapshot

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@svarlamov svarlamov closed this May 29, 2026
@svarlamov svarlamov reopened this May 29, 2026
@svarlamov svarlamov force-pushed the lincoln/git-ai-usage branch from 5870c0d to 3a78249 Compare May 29, 2026 14:02
If two processes race on the schema migration, the second ALTER TABLE
attempt would fail with "duplicate column name". Catch that specific
error so both processes proceed normally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
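
The tolerant migration step above can be sketched independently of any SQLite binding. The function name and the `Result<(), String>` error type are stand-ins; the real code presumably inspects its database library's error type instead of a plain string.

```rust
/// Treats SQLite's "duplicate column name" failure as success, since it
/// means another process already applied the ALTER TABLE.
fn tolerate_duplicate_column(result: Result<(), String>) -> Result<(), String> {
    match result {
        Err(msg) if msg.contains("duplicate column name") => Ok(()),
        // Any other outcome, success or failure, passes through unchanged.
        other => other,
    }
}
```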