Skip to content

feat: add support for GIT LFS checkout#3909

Open
lizrabuya wants to merge 7 commits into
mainfrom
feat/sup-6529/lfs-checkout
Open

feat: add support for GIT LFS checkout#3909
lizrabuya wants to merge 7 commits into
mainfrom
feat/sup-6529/lfs-checkout

Conversation

@lizrabuya
Copy link
Copy Markdown
Contributor

@lizrabuya lizrabuya commented May 11, 2026

Description

Add configuration and environment variables when the backend sets BUILDKITE_GIT_LFS_ENABLED=true in the job environment. When BUILDKITE_GIT_LFS_ENABLED=true is set, the agent will check if the git-lfs binary is installed and proceed to perform a git lfs fetch + git lfs checkout.

Changes

Configuration & CLI (clicommand/bootstrap.go, internal/job/config.go)

  • Added a new GitLFSEnabled bool field to ExecutorConfig, backed by env var BUILDKITE_GIT_LFS_ENABLED
  • Exposed it as a --git-lfs-enabled CLI flag on the bootstrap command with usage description
  • Wired the flag through from BootstrapConfig into ExecutorConfig

Checkout behaviour (internal/job/checkout.go)

  • Added a fail-fast binary check: if GitLFSEnabled is true and git-lfs is not on PATH, the checkout fails immediately with a descriptive error before any fetch/checkout work
  • Added git lfs install --local after clone/clean but before fetch, so the LFS filter is registered in the local repo config
  • Added a call to gitLFSFetchCheckout after the main git checkout (and after submodule updates) to explicitly pull down and materialise LFS objects

LFS fetch helper (internal/job/git.go)

  • Added gitLFSFetchCheckout, which runs git lfs fetch then git lfs checkout, wrapping each failure with a distinct error prefix ("git lfs fetch: ..." / "git lfs checkout: ...") so the failing step is identifiable from logs alone
  • gitLFSFetchCheckout supports opt-in retry: 5 attempts with subsecond exponential backoff + jitter, returning a *gitError{Type: gitErrorLFS, WasRetried: true} on exhaustion so the outer checkout retrier doesn't double-retry the same failure

Smudge bypass (internal/job/executor.go)

  • Set GIT_LFS_SKIP_SMUDGE=1 in setUp so LFS objects are never downloaded implicitly during git checkout; the explicit gitLFSFetchCheckout call handles download instead

Tests (internal/job/checkout_test.go)

  • Added TestDefaultCheckoutPhase_GitLFS covering four scenarios: LFS disabled, LFS enabled with binary present, LFS enabled with binary missing (PATH restricted via t.Setenv), and LFS enabled with a failing git lfs command (fake git-lfs script that exits 1)

Windows test caveats

Two of the LFS error-path subtests (LFS enabled git lfs command fails and LFS enabled git lfs fetch fails) are skipped on Windows. They cannot run there for a fundamental reason, not flakiness:

  • These tests work by overriding PATH to point at a fake git-lfs script that exits non-zero, so we can assert our error-wrapping behaviour.
  • Git for Windows ships its own git-lfs.exe inside GIT_EXEC_PATH (the Git installation's libexec/git-core directory). When git resolves a subcommand like git lfs ..., it looks in GIT_EXEC_PATH before falling back to PATH.
  • That means our PATH override is bypassed entirely on Windows — the real bundled git-lfs.exe runs, succeeds, and the test fails because we expected our fake to return an error.
  • The Go code being exercised (error wrapping in gitLFSFetchCheckout and git lfs install --local failure handling) is platform-agnostic, so the Linux runs give us the real coverage. Skipping on Windows is a test-harness limitation, not a behavioural gap.

Additionally, the two happy-path subtests (LFS disabled, LFS enabled binary present) use os.MkdirTemp + a best-effort RemoveAll in t.Cleanup rather than t.TempDir(). On Windows, git's child processes (credential helpers, the git-lfs filter-process) can keep file handles open past their parent's exit. t.TempDir()'s cleanup is strict and fails the test on unlinkat ... being used by another process errors; the best-effort variant lets the test pass while still cleaning up what it can.

Testing

  • Tests have run locally (with go test ./...). Buildkite employees may check this if the pipeline has run automatically.
  • Code is formatted (with go tool gofumpt -extra -w .)

Disclosures / Credits

The main function changes were implemented by me based on the recommendation from the research done for this function. I used Claude to review the code and to assist with writing the tests.

@lizrabuya lizrabuya changed the title Enable git-lfs checkout Add Git LFS Config and Environment Variables May 11, 2026
@lizrabuya lizrabuya changed the base branch from main to feat/git-checkout-features May 11, 2026 06:57
@lizrabuya lizrabuya changed the title Add Git LFS Config and Environment Variables Add Git LFS support May 11, 2026
@lizrabuya lizrabuya force-pushed the feat/sup-6529/lfs-checkout branch from f9e4186 to 255aa11 Compare May 11, 2026 07:10
@lizrabuya lizrabuya changed the title Add Git LFS support Add support for GIT LFS checkout May 11, 2026
@lizrabuya lizrabuya marked this pull request as ready for review May 22, 2026 04:48
@lizrabuya lizrabuya requested review from a team as code owners May 22, 2026 04:48
Comment thread internal/job/git.go
Comment thread internal/job/checkout_test.go
@ozdenyilmaz
Copy link
Copy Markdown

@lizrabuya could you ask for a review in #ask-agents or in our project channel tagging Josh/Ming once you address Dahtey's comments?

@lizrabuya lizrabuya requested a review from dahtey-bk June 5, 2026 03:23
@lizrabuya lizrabuya changed the base branch from feat/git-checkout-features to main June 5, 2026 04:33
@lizrabuya lizrabuya force-pushed the feat/sup-6529/lfs-checkout branch 2 times, most recently from 242cb29 to 9a649dd Compare June 5, 2026 06:07
@lizrabuya lizrabuya changed the title Add support for GIT LFS checkout feat: add support for GIT LFS checkout Jun 5, 2026
@lizrabuya lizrabuya added the feature New user-facing feature! label Jun 5, 2026
@quinn-diesel
Copy link
Copy Markdown
Contributor

going to tag and throw @buildsworth-bk in here for a review

Copy link
Copy Markdown

@buildsworth-bk-app buildsworth-bk-app Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds opt-in Git LFS support to the default checkout: a --git-lfs-enabled flag, GIT_LFS_SKIP_SMUDGE=1 so checkout writes pointer files rather than downloading inline, and an explicit git lfs fetch / git lfs checkout after checkout with its own bounded retrier. The double-retry concern from the earlier round looks resolved — exhausting the inner retrier returns WasRetried: true, which breaks the outer checkout retrier, so a persistent LFS failure fails once rather than being re-attempted ~30 times.

Two things to confirm and one small cleanup, all inline. The one I'd most want an answer on is the env var name: the SUP-6529 issue specifies BUILDKITE_GIT_LFS, but this reads BUILDKITE_GIT_LFS_ENABLED — if those don't match what the backend emits, the feature silently never activates.

The PR has no risk: label, so I'm leaving this as a comment rather than an approval.

Want to dig deeper? The full session log is attached to this Buildkite build. Download the session file and open a new pi session with it:

Download the buildsworth logs from build 518, then answer my questions about the findings.

Comment thread clicommand/bootstrap.go
cli.BoolFlag{
Name: "git-lfs-enabled",
Usage: "Enable Git LFS object download during checkout (default: false)",
EnvVar: "BUILDKITE_GIT_LFS_ENABLED",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: The SUP-6529 issue specifies the env var as BUILDKITE_GIT_LFS ("the backend sets BUILDKITE_GIT_LFS=true"), but this reads BUILDKITE_GIT_LFS_ENABLED. The agent only enables LFS when this name matches exactly what the backend writes into the job environment, so if the backend still emits BUILDKITE_GIT_LFS the feature would silently never run. Has the backend side been confirmed to send BUILDKITE_GIT_LFS_ENABLED?

Comment thread internal/job/checkout.go
Comment on lines +943 to +948
// Fail fast before any git work if git-lfs is required but missing.
if e.GitLFSEnabled {
if _, err := exec.LookPath("git-lfs"); err != nil {
return fmt.Errorf("BUILDKITE_GIT_LFS_ENABLED=true but git-lfs binary is not found on PATH: %w", err)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: Two things about this fail-fast check. First, the comment says "before any git work", but the mirror update and clone earlier in defaultCheckoutPhase have already run by the time we reach here. Second — and more impactful — the returned error is a plain fmt.Errorf, not a *gitError, so the outer checkout retrier in checkout() doesn't recognise it: errors.As(err, &errGit) fails and it falls through to the default: branch, which removes the checkout dir and retries. A missing git-lfs binary is deterministic, so this re-clones the repo on every attempt (up to ~6×) before finally failing — the opposite of fail-fast. Moving the check above the mirror/clone block and returning something that breaks the outer retrier (e.g. a *gitError{..., WasRetried: true}, the way gitLFSFetchCheckout does on exhaustion) would let it fail once, immediately.

Comment thread internal/job/checkout.go
// disk rather than downloading objects inline. Intentionally not
// restored — git lfs checkout materialises files from cache without
// triggering the smudge filter.
e.shell.Env.Set("GIT_LFS_SKIP_SMUDGE", "1")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: GIT_LFS_SKIP_SMUDGE=1 is already set unconditionally in setUp() (executor.go:921), which runs before the checkout phase and is never unset, so this re-set looks redundant — and the "Intentionally not restored" note reads a little oddly since there's no earlier value being restored here. Is there a path where the setUp() setting wouldn't already be in effect at this point? If not, this block could just be the git lfs install --local call.

Copy link
Copy Markdown

@buildsworth-bk-app buildsworth-bk-app Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interesting part here is the retry interaction. gitLFSFetchCheckout runs its own 5-attempt budget and tags an exhausted failure with WasRetried + gitErrorLFS, so the outer checkout retrier breaks (and gitErrorLFS falls through to return err without wiping the checkout dir) rather than multiplying this into ~30 attempts. I traced that path and it holds up. The fail-fast git-lfs lookup and the split fetch/checkout with distinct error prefixes are sensible.

Three non-blocking notes inline: a redundant GIT_LFS_SKIP_SMUDGE set, an env-var naming question worth confirming against the backend, and an error-path test that pays the full retry backoff. The PR also has no risk: label, so it falls outside AI-only approval.

Want to dig deeper? The full session log is attached to this Buildkite build. Download the session file and open a new pi session with it:

Download the buildsworth logs from build 519, then answer my questions about the findings.

Comment thread internal/job/config.go
GitSubmodules bool `env:"BUILDKITE_GIT_SUBMODULES"`

// Whether to enable Git LFS operations during checkout
GitLFSEnabled bool `env:"BUILDKITE_GIT_LFS_ENABLED"`
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: the SUP-6529 spec uses BUILDKITE_GIT_LFS throughout, but this field (and the CLI flag / backend, per the PR description) uses BUILDKITE_GIT_LFS_ENABLED. Since the backend is what sets this var in the job env, a name mismatch would silently no-op the feature rather than erroring. Has the rename been confirmed end-to-end with the backend, and the spec updated to match?

Comment thread internal/job/checkout.go
// disk rather than downloading objects inline. Intentionally not
// restored — git lfs checkout materialises files from cache without
// triggering the smudge filter.
e.shell.Env.Set("GIT_LFS_SKIP_SMUDGE", "1")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: GIT_LFS_SKIP_SMUDGE=1 is already set unconditionally in setUp() (executor.go:921), which always runs before the checkout phase, so this set is redundant. The comment here reads as though it's doing the work, which could lead a future reader to think the setUp() one is removable (or vice versa). Probably cleaner to drop this line and rely on the setUp() set, or note explicitly that it's belt-and-braces.

wantErr: "installing git lfs filter",
},
{
name: "LFS enabled git lfs fetch fails",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking: this subtest reaches gitLFSFetchCheckout through the production path with Retry: true, so it burns the full 5-attempt backoff (~16s per the comment in git.go) on every run. Since the assertion is really about error wrapping ("git lfs fetch"), you could call gitLFSFetchCheckout directly with Retry: false to get the same coverage without the wait, and keep any retried-path assertions to a dedicated test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature New user-facing feature!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants