feat(git): persistent histogram of clone/fetch traffic per repository#292
@codex review
Force-pushed from 4e73b62 to 1005703.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 12e84a01ec
if r.Method == http.MethodPost && strings.HasSuffix(pathValue, "/git-upload-pack") {
    s.repoCounts.IncrementClone(upstreamURL)
Exclude non-fetch upload-pack POSTs from clone counters
Only checking POST /git-upload-pack overcounts traffic because protocol v2 uses the same endpoint for ls-refs discovery as well as fetch; this repo already models that distinction in SpoolKeyForRequest tests (command=ls-refs vs command=fetch). With the current condition, git ls-remote and the discovery phase of fetches are counted as clone/fetch events, so the new /admin/git/top-repos histogram is materially inaccurate for its stated purpose. Gate increments on the upload-pack command type (count fetch/v1 negotiation, skip ls-refs).
Force-pushed from 1005703 to a3d932b.
Track per-repository pack-fetch counts in the metadata DB so callers can
identify the most frequently cloned repos served by the proxy.
The new RepoCounts type wraps a metadatadb.IntMap[string] keyed by
"<upstream-url>|<YYYY-MM-DD>". Each real fetch increments the bucket
for today (UTC). Daily bucketing makes time-windowed queries trivial
("top repos last 7 days") while a periodic reaper keeps the namespace
bounded by deleting entries older than 90 days.
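
The key layout and the "top repos last 7 days" query described above can be sketched as follows. This is a minimal illustration, not the PR's code: a plain Go map stands in for metadatadb.IntMap[string], and bucketKey, topRepos, RepoTotal, and utcDay are hypothetical names introduced only for this example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

// utcDay is a small helper for the example: midnight UTC on the given date.
func utcDay(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// bucketKey builds the "<upstream-url>|<YYYY-MM-DD>" key for t's UTC day.
func bucketKey(upstreamURL string, t time.Time) string {
	return upstreamURL + "|" + t.UTC().Format("2006-01-02")
}

// RepoTotal is a hypothetical result row for a windowed query.
type RepoTotal struct {
	URL   string
	Count int64
}

// topRepos sums the daily buckets from the last `days` days (today included)
// and returns the repos ordered by descending count, then by URL.
func topRepos(counts map[string]int64, now time.Time, days int) []RepoTotal {
	// Oldest day still inside the window. Fixed-width ISO dates compare
	// correctly as plain strings.
	cutoff := now.UTC().AddDate(0, 0, -(days - 1)).Format("2006-01-02")
	totals := map[string]int64{}
	for key, n := range counts {
		// Split on the last "|" so a URL containing "|" still parses.
		i := strings.LastIndex(key, "|")
		if i < 0 {
			continue
		}
		if url, day := key[:i], key[i+1:]; day >= cutoff {
			totals[url] += n
		}
	}
	out := make([]RepoTotal, 0, len(totals))
	for url, n := range totals {
		out = append(out, RepoTotal{URL: url, Count: n})
	}
	sort.Slice(out, func(a, b int) bool {
		if out[a].Count != out[b].Count {
			return out[a].Count > out[b].Count
		}
		return out[a].URL < out[b].URL
	})
	return out
}

func main() {
	now := utcDay(2024, 5, 10)
	counts := map[string]int64{} // stand-in for the metadata DB namespace
	counts[bucketKey("https://example.com/a.git", now)] += 3
	counts[bucketKey("https://example.com/b.git", now.AddDate(0, 0, -2))] += 5
	counts[bucketKey("https://example.com/a.git", now.AddDate(0, 0, -30))] += 100
	for _, r := range topRepos(counts, now, 7) {
		fmt.Println(r.URL, r.Count)
	}
}
```

In this example the 30-day-old bucket falls outside the 7-day window, so b.git (5) ranks ahead of a.git (3).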
Counted events:
- POST /git-upload-pack containing a protocol v1 payload, or v2
command=fetch.
- Excluded: GET /info/refs (every fetch's discovery probe, ls-remote,
and the proxy's own staleness check) and v2 command=ls-refs (the v2
equivalent of info/refs). RequestCountsAsFetch buffers and replays
the body; gzip Content-Encoding is decoded for inspection.
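
A rough sketch of the classification described above, under stated assumptions: countsAsFetch is a hypothetical stand-in for RequestCountsAsFetch (whose real signature is not shown here), and pktLine, firstPktLine, and newUploadPackRequest are helpers invented for the example. It relies on the protocol v2 rule that the first pkt-line of a request is "command=<name>"; treating a body with no leading data packet as "not a fetch" is an assumption of this sketch.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"net/http"
	"strconv"
	"strings"
)

// pktLine frames s as a git pkt-line: 4 hex digits of total length, then s.
func pktLine(s string) string { return fmt.Sprintf("%04x%s", len(s)+4, s) }

// firstPktLine returns the payload of the first pkt-line in data, or "" when
// data starts with a flush/delim packet, an empty packet, or is truncated.
func firstPktLine(data []byte) string {
	if len(data) < 4 {
		return ""
	}
	n, err := strconv.ParseUint(string(data[:4]), 16, 16)
	if err != nil || n <= 4 || int(n) > len(data) {
		return ""
	}
	return strings.TrimSuffix(string(data[4:n]), "\n")
}

// countsAsFetch (hypothetical) buffers the body, restores it for downstream
// handlers, and reports whether the request is a v1 negotiation or v2 fetch.
func countsAsFetch(r *http.Request) (bool, error) {
	raw, err := io.ReadAll(r.Body)
	if err != nil {
		return false, err
	}
	r.Body.Close()
	// Replay the original (possibly compressed) bytes untouched.
	r.Body = io.NopCloser(bytes.NewReader(raw))

	payload := raw
	if strings.EqualFold(r.Header.Get("Content-Encoding"), "gzip") {
		zr, zerr := gzip.NewReader(bytes.NewReader(raw))
		if zerr != nil {
			return false, zerr
		}
		defer zr.Close()
		if payload, err = io.ReadAll(zr); err != nil {
			return false, err
		}
	}

	line := firstPktLine(payload)
	switch {
	case line == "":
		return false, nil // assumption: nothing to negotiate, do not count
	case strings.HasPrefix(line, "command="):
		return line == "command=fetch", nil // v2: skip ls-refs and others
	default:
		return true, nil // no v2 command line: treat as v1 negotiation
	}
}

// newUploadPackRequest builds an example POST, optionally gzip-compressed.
func newUploadPackRequest(payload string, compress bool) *http.Request {
	var buf bytes.Buffer
	if compress {
		zw := gzip.NewWriter(&buf)
		zw.Write([]byte(payload))
		zw.Close()
	} else {
		buf.WriteString(payload)
	}
	r, _ := http.NewRequest(http.MethodPost,
		"http://proxy.invalid/example.git/git-upload-pack", &buf)
	if compress {
		r.Header.Set("Content-Encoding", "gzip")
	}
	return r
}

func main() {
	for _, first := range []string{
		"command=ls-refs\n",
		"command=fetch\n",
		"want 0123456789abcdef0123456789abcdef01234567\n",
	} {
		counted, err := countsAsFetch(newUploadPackRequest(pktLine(first)+"0000", true))
		fmt.Printf("%q counted=%v err=%v\n", first, counted, err)
	}
}
```

Because the original bytes are put back on r.Body, the same request can be classified and then forwarded without the downstream handler noticing the inspection.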
The increment runs after cloneManager.GetOrCreate has accepted the
upstream URL, so unauthenticated callers cannot bloat the keyspace
with arbitrary URLs.
The reaper short-circuits when the namespace is empty, returns the
count of deleted entries, and logs only when something was actually
pruned.
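
The reaper behaviour can be illustrated with a small sketch. It assumes a plain map in place of the metadata DB namespace; reap, retentionDays, and utcDay are hypothetical names, and whether a bucket exactly 90 days old is kept is a choice made here, not something the commit message specifies.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// retentionDays mirrors the 90-day window from the description above.
const retentionDays = 90

// utcDay is a small helper for the example: midnight UTC on the given date.
func utcDay(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// reap deletes buckets whose day is more than retentionDays before now and
// returns how many entries were removed.
func reap(counts map[string]int64, now time.Time) int {
	if len(counts) == 0 {
		return 0 // empty namespace: nothing to scan
	}
	cutoff := now.UTC().AddDate(0, 0, -retentionDays).Format("2006-01-02")
	deleted := 0
	for key := range counts {
		i := strings.LastIndex(key, "|")
		if i < 0 {
			continue // not a bucket key; leave it alone
		}
		// Fixed-width ISO dates order correctly as strings.
		if key[i+1:] < cutoff {
			delete(counts, key) // deleting during range is safe in Go
			deleted++
		}
	}
	return deleted
}

func main() {
	counts := map[string]int64{
		"https://example.com/a.git|2024-01-01": 7,
		"https://example.com/a.git|2024-05-10": 2,
	}
	// Log only when something was actually pruned.
	if n := reap(counts, utcDay(2024, 5, 10)); n > 0 {
		fmt.Printf("pruned %d stale bucket(s), %d remain\n", n, len(counts))
	}
}
```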
Wiring:
- internal/metadatadb: new NamespaceProvider type for lazy resolution.
- internal/strategy/git: New/Register accept a NamespaceProvider; nil-safe.
- internal/config: Load takes a setMetadataStore callback so callers can
obtain the constructed Store before strategies are built.
- cmd/cachewd: declares an atomic.Pointer[metadatadb.Store] populated by
Load and read by the git strategy's namespace provider closure.
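
The wiring can be pictured roughly as below. Every type and function in this sketch is a hypothetical stand-in (Store, Namespace, NamespaceProvider, load, newGitStrategy, wire); the real metadatadb, config, and git signatures are not shown in this PR text and may differ. The point is only the ordering: the provider closure reads an atomic.Pointer that the Load callback fills in before strategies are built.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Store and Namespace are hypothetical stand-ins for the metadatadb types.
type Store struct{ name string }
type Namespace struct{ path string }

// Namespace returns a handle scoped to the given name.
func (s *Store) Namespace(name string) *Namespace {
	return &Namespace{path: s.name + "/" + name}
}

// NamespaceProvider resolves a namespace lazily. In this sketch a nil
// result means that no store has been configured.
type NamespaceProvider func() *Namespace

// load mimics config.Load: it constructs the store and hands it to the
// callback before any strategy is built.
func load(setMetadataStore func(*Store)) {
	setMetadataStore(&Store{name: "metadata"})
}

// newGitStrategy mimics a nil-safe git.New: without a provider, or without
// a resolvable namespace, counting is simply disabled.
func newGitStrategy(p NamespaceProvider) string {
	if p == nil {
		return "counts disabled"
	}
	ns := p()
	if ns == nil {
		return "counts disabled"
	}
	return "counts in " + ns.path
}

// wire returns the provider closure and the setter that both share one
// atomic pointer, as cmd/cachewd is described as doing.
func wire() (NamespaceProvider, func(*Store)) {
	var store atomic.Pointer[Store]
	provider := func() *Namespace {
		s := store.Load()
		if s == nil {
			return nil
		}
		return s.Namespace("git")
	}
	return provider, func(s *Store) { store.Store(s) }
}

func main() {
	provider, setStore := wire()
	fmt.Println(newGitStrategy(provider)) // store not set yet
	load(setStore)
	fmt.Println(newGitStrategy(provider)) // store populated by load
}
```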
No external surface is added — the histogram is exposed through the
RepoCounts API for in-process consumers.
Force-pushed from a3d932b to ef5f4ef.
Tracks per-repository pack-fetch counts in the metadata DB so callers can identify the most frequently cloned repos served by the proxy. Exposes a RepoCounts API; no HTTP surface.
- Storage: metadatadb.IntMap[string] keyed by <upstream-url>|<YYYY-MM-DD>. Daily buckets keep windowed queries trivial; a daily reaper drops entries older than 90 days, short-circuits on an empty namespace, returns the deleted count, and logs only when something was actually pruned.
- Counted: protocol v1 POST /git-upload-pack and protocol v2 command=fetch. Excluded: GET /info/refs (every fetch's discovery probe, ls-remote, and the proxy's own staleness check) and v2 command=ls-refs (the v2 equivalent of info/refs). RequestCountsAsFetch buffers the body, decodes gzip when present, and replays it for downstream handlers.
- Ordering: the increment runs after cloneManager.GetOrCreate accepts the upstream URL, so unauthenticated callers cannot bloat the keyspace with arbitrary URLs.
- Wiring: new metadatadb.NamespaceProvider; git.New/Register accept it (nil-safe); config.Load gains a setMetadataStore callback; and cmd/cachewd holds the store in an atomic.Pointer so the provider closure can resolve the "git" namespace at strategy-construction time.