feat(git): persistent histogram of clone/fetch traffic per repository #292

Draft
worstell wants to merge 1 commit into main from feat-git-repo-clone-histogram

@worstell
Contributor

@worstell worstell commented May 5, 2026

Tracks per-repository pack-fetch counts in the metadata DB so callers can identify the most frequently cloned repos served by the proxy. Exposes a RepoCounts API; no HTTP surface.

  • Storage: metadatadb.IntMap[string] keyed by <upstream-url>|<YYYY-MM-DD>. Daily buckets keep windowed queries trivial; a daily reaper drops entries older than 90 days, short-circuits on an empty namespace, returns the deleted count, and only logs when something was actually pruned.
  • Counted events: protocol v1 POST /git-upload-pack and protocol v2 command=fetch. Excluded: GET /info/refs (every fetch's discovery probe, ls-remote, and the proxy's own staleness check) and v2 command=ls-refs (the v2 equivalent of info/refs). RequestCountsAsFetch buffers the body, decodes gzip when present, and replays it for downstream handlers.
  • Validation order: the increment runs after cloneManager.GetOrCreate accepts the upstream URL, so unauthenticated callers cannot bloat the keyspace with arbitrary URLs.
  • Wiring: new metadatadb.NamespaceProvider, git.New/Register accept it (nil-safe), config.Load gains a setMetadataStore callback, and cmd/cachewd holds the store in an atomic.Pointer so the provider closure can resolve the "git" namespace at strategy-construction time.

@worstell
Contributor Author

worstell commented May 5, 2026

@codex review

@worstell worstell force-pushed the feat-git-repo-clone-histogram branch from 4e73b62 to 1005703 Compare May 5, 2026 22:20

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 12e84a01ec

Comment thread internal/strategy/git/git.go Outdated
Comment on lines +330 to +331
if r.Method == http.MethodPost && strings.HasSuffix(pathValue, "/git-upload-pack") {
s.repoCounts.IncrementClone(upstreamURL)
P1: Exclude non-fetch upload-pack POSTs from clone counters

Only checking POST /git-upload-pack overcounts traffic because protocol v2 uses the same endpoint for ls-refs discovery as well as fetch; this repo already models that distinction in SpoolKeyForRequest tests (command=ls-refs vs command=fetch). With the current condition, git ls-remote and the discovery phase of fetches are counted as clone/fetch events, so the new /admin/git/top-repos histogram is materially inaccurate for its stated purpose. Gate increments on the upload-pack command type (count fetch/v1 negotiation, skip ls-refs).


@worstell worstell force-pushed the feat-git-repo-clone-histogram branch from 1005703 to a3d932b Compare May 5, 2026 22:41
Track per-repository pack-fetch counts in the metadata DB so callers can
identify the most frequently cloned repos served by the proxy.

The new RepoCounts type wraps a metadatadb.IntMap[string] keyed by
"<upstream-url>|<YYYY-MM-DD>". Each real fetch increments the bucket
for today (UTC). Daily bucketing makes time-windowed queries trivial
("top repos last 7 days") while a periodic reaper keeps the namespace
bounded by deleting entries older than 90 days.

Event classification:
- Counted: POST /git-upload-pack carrying a protocol v1 payload or a v2
  command=fetch.
- Excluded: GET /info/refs (every fetch's discovery probe, ls-remote,
  and the proxy's own staleness check) and v2 command=ls-refs (the v2
  equivalent of info/refs).

RequestCountsAsFetch buffers the body and replays it for downstream
handlers; a gzip Content-Encoding is decoded for inspection.

The increment runs after cloneManager.GetOrCreate has accepted the
upstream URL, so unauthenticated callers cannot bloat the keyspace
with arbitrary URLs.

The reaper short-circuits when the namespace is empty, returns the
count of deleted entries, and logs only when something was actually
pruned.

Wiring:
- internal/metadatadb: new NamespaceProvider type for lazy resolution.
- internal/strategy/git: New/Register accept a NamespaceProvider; nil-safe.
- internal/config: Load takes a setMetadataStore callback so callers can
  obtain the constructed Store before strategies are built.
- cmd/cachewd: declares an atomic.Pointer[metadatadb.Store] populated by
  Load and read by the git strategy's namespace provider closure.

No external surface is added; the histogram is exposed through the
RepoCounts API for in-process consumers.
@worstell worstell force-pushed the feat-git-repo-clone-histogram branch from a3d932b to ef5f4ef Compare May 5, 2026 22:53