RFC: subtree-sync skills from databricks-agent-skills/experimental#530

Draft
jamesbroadhead wants to merge 3 commits into
databricks-solutions:main from
jamesbroadhead:sync-skills-from-das

Summary (RFC / draft)

Proposes a git subtree-based back-link from this repo's databricks-skills/
to databricks/databricks-agent-skills/experimental/.

Paired with databricks/databricks-agent-skills#73
which establishes the experimental/ directory on the d-a-s side.

What changes

  • databricks-skills/imported/ — new directory, added as a git subtree
    of the experimental-only branch of databricks-agent-skills. Contains the
    25 imported skills (the 26 a-d-k skills minus the dropped databricks-model-serving).
  • .github/workflows/sync-skills-from-das.yml — weekly cron + manual
    dispatch. Runs git subtree pull, opens an auto-PR if there is drift.
  • databricks-skills/SYNC.md — operator runbook + mechanism explainer +
    trade-offs vs submodule / rsync / fork.
  • databricks-skills/README.md — banner block pointing at imported/ and
    SYNC.md, with explicit "do not edit imported/ here" note.

How it works

git subtree can't pull a subdirectory of a remote repo directly — the remote
needs to publish a branch whose root tree is what you want. So:

  • On d-a-s: a workflow runs git subtree split --prefix=experimental --branch=experimental-only
    after each push to main and force-pushes the result. The branch's root is
    the contents of experimental/.
  • On this repo: git subtree pull --prefix=databricks-skills/imported <d-a-s-url> experimental-only --squash
    brings drift in as a squash commit plus a merge commit; the squash
    commit's message records the upstream SHA.
    git log --grep "Squashed 'databricks-skills/imported/'" shows the full
    sync history; git blame on an imported skill lands on the squash commit
    that brought the line in, which in turn names the upstream commit.
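The two sides described above can be sketched as a pair of shell functions. This is an illustration under assumptions, not the actual workflow contents: the `run` wrapper, the `DRY_RUN` switch, the function names, and the `origin` remote name are all hypothetical, and the upstream URL is passed in as an argument because the description only gives it as `<d-a-s-url>`.

```shell
#!/bin/sh
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Upstream (d-a-s) side: rewrite experimental/ into a branch whose root
# tree is the contents of that directory, then publish it.
# Assumes the remote is named "origin".
publish_split() {
  run git subtree split --prefix=experimental --branch=experimental-only
  run git push --force origin experimental-only
}

# Downstream side: merge the published branch under the imported/ prefix.
# $1 is the upstream clone URL.
pull_subtree() {
  run git subtree pull --prefix=databricks-skills/imported \
    "$1" experimental-only --squash
}
```

With `DRY_RUN=1`, both functions print the git commands instead of running them, which is a cheap way to review what a workflow would execute before pointing it at a real remote.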

Status

Draft / RFC. The subtree was added from a one-shot preview branch
(experimental-only-preview)
pushed manually to d-a-s for this demo. Before merge:

  1. d-a-s side: land #73 first so experimental/ exists on main.
  2. d-a-s side: a separate follow-up PR there adds the
    experimental-only auto-publish workflow.
  3. Here: swap the workflow + SYNC.md to reference experimental-only
    (not -preview), re-run git subtree add against the stable branch, and
    force-push so the preview-based squash commits in this PR are replaced.

Open questions

  1. What about the 26 legacy top-level skills? This RFC adds imported/ as
    a new directory; the existing skills at databricks-skills/<name>/ stay
    put. Long-term we probably want all skills to live under imported/ and
    for install_skills.sh to read from one place — but that migration is
    out of scope here.
  2. Conflict policy. If someone edits a file under imported/ locally in
    a-d-k, the next git subtree pull can conflict. SYNC.md says "don't edit"
    but nothing enforces it. Should we add a pre-commit hook that rejects
    edits under imported/?
  3. Sync cadence. Weekly + manual dispatch. Daily? Or trigger on upstream
    push via repository_dispatch?
  4. Auto-merge. Auto-PR is currently for-review-only. Would we want it
    to auto-merge if CI is green?
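For open question 2, one possible shape of the guard is sketched below. It is an assumption about how such a hook could work, not something this PR ships: the function name `reject_imported_edits` and the `IMPORTED_PREFIX` variable are hypothetical. The function reads changed paths on stdin so it can be fed from `git diff --cached --name-only` in a hook or from a CI diff.

```shell
#!/bin/sh
# Hypothetical guard: fail if any path on stdin lives under the subtree prefix.
IMPORTED_PREFIX="databricks-skills/imported/"

reject_imported_edits() {
  status=0
  while IFS= read -r path; do
    case "$path" in
      "$IMPORTED_PREFIX"*)
        echo "refusing edit under $IMPORTED_PREFIX: $path" >&2
        status=1
        ;;
    esac
  done
  return "$status"
}

# In .git/hooks/pre-commit this might be wired up as:
#   git diff --cached --name-only | reject_imported_edits || exit 1
```

A client-side hook is opt-in per clone, so the same check would likely also need to run in CI, with an exemption for the sync workflow's own PRs.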

Alternatives considered

  • rsync from a fresh clone — simpler workflow, no experimental-only
    branch needed. Trade-off: loses the squashed-merge audit trail and
    git log/blame provenance.
  • git submodule — can't reference a subdirectory of the target repo,
    and end users would need git submodule update to see skill content.
  • Hard fork — diverges silently; defeats the back-link goal.

SYNC.md captures these trade-offs in more detail.

This pull request and its description were written by Claude.

git-subtree-dir: databricks-skills/imported
git-subtree-split: b8781b713f0c80e7e827288e10b3e5db692f6084

Proposes a subtree-based back-link from databricks-solutions/ai-dev-kit
to databricks/databricks-agent-skills/experimental/, replacing the
current ad-hoc copy.

- `databricks-skills/imported/` is added as a `git subtree` of the
  `experimental-only` branch of databricks-agent-skills (a branch whose
  root tree mirrors experimental/, produced via `git subtree split` on
  that side after every push to main).
- `.github/workflows/sync-skills-from-das.yml` runs weekly (and on
  manual dispatch), pulls the subtree, and opens an auto-PR on drift.
- `databricks-skills/SYNC.md` documents the mechanism, manual-sync
  command, and trade-offs vs submodule / rsync / fork.
- `databricks-skills/README.md` gets a banner pointing at imported/.

Paired with databricks/databricks-agent-skills#73 (which establishes
the experimental/ directory on d-a-s side) and a follow-up PR there to
add the split-publish workflow that maintains experimental-only.

For this RFC the subtree was added from a one-shot preview branch
(experimental-only-preview) pushed manually to d-a-s. The follow-up
will replace that with the auto-maintained experimental-only branch.