
docs(#1458): SparkAdapter Codec Protocol spec + explainer#188

Merged: dimitri-yatsenko merged 2 commits into main from feat/1458-renderable-spec on Jul 1, 2026



@dimitri-yatsenko

Member

Summary

Spec-first pair for the Renderable Codec Protocol landing in DataJoint 2.3 (#1458). Brought into 2.3 scope per user direction on 2026-06-23 (it was previously deferred to 2.4).

| File | Role |
|---|---|
| `src/reference/specs/renderable.md` (new) | Normative spec: Protocol signature, return-value shape constraints (Spark-native: primitives / lists / dicts), why it's a Protocol vs. abstract method, eligibility detection via `isinstance`, out-of-scope items, worked codec examples. |
| `src/explanation/renderable-codecs.md` (new) | Explainer: Bronze/Silver layer model, why `<blob@>` is bronze-only, design rationale (smaller OSS surface, cleaner opt-in, no plugin churn, structural typing), decision guide for choosing codecs in a new pipeline. |
| `mkdocs.yaml` | Nav entries under Reference → Specifications → Type System, and Concepts → Storage. |
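For reference, a minimal sketch of why the spec pairs `Protocol` with `@runtime_checkable`: only the decorated form supports the `isinstance` eligibility check. It uses the post-rename names (`SparkAdapter`, `to_spark`) adopted later in this PR; the `to_spark(self, value)` argument list and the `FloatArrayCodec` body are assumptions for illustration, not the shipped code.

```python
from typing import Any, Protocol, runtime_checkable


class PlainAdapter(Protocol):
    # Without @runtime_checkable, a Protocol only serves static type checkers.
    def to_spark(self, value: Any) -> Any: ...


@runtime_checkable
class SparkAdapter(Protocol):
    # With the decorator, isinstance() can test for the method at runtime.
    # The argument list here is an assumption; the spec is normative.
    def to_spark(self, value: Any) -> Any: ...


class FloatArrayCodec:
    """Hypothetical typed codec: opts in just by defining to_spark."""

    def to_spark(self, value: Any) -> list[float]:
        return [float(x) for x in value]


codec = FloatArrayCodec()
print(isinstance(codec, SparkAdapter))  # True

try:
    isinstance(codec, PlainAdapter)
except TypeError as exc:
    # typing refuses isinstance() against a protocol that is not runtime-checkable.
    print(f"TypeError: {exc}")
```

No registration or base class is involved: defining the method is the whole opt-in.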

Why spec-first

The Protocol itself is tiny (~10 lines: a `@runtime_checkable Protocol` declaration). The design conversation in the issue body settled the shape after #1457 (the earlier abstract-method-on-Codec framing) was rejected. Locking the spec now gives:

  • The downstream consumer (`datajoint-databricks`) a stable contract to build the silver-layer publish pipeline against.
  • Codec plugin authors (current and future) a clear opt-in target.
  • The implementation PR a small, well-scoped diff to land against.

Marked draft

Stays in draft until the matching implementation PR opens in `datajoint-python`, following the same pattern as the provenance trinity spec (#183) before #1471 landed against it.

Sequencing

Independent of the provenance trinity (no code overlap). Can land in parallel with T2.2 implementation work.

  1. This PR (spec): review while the implementation is drafted.
  2. datajoint-python implementation PR: adds `src/datajoint/rendering.py` with the Protocol, re-exports it as `dj.Renderable`, and adds unit tests.
  3. Flip this PR from draft to ready alongside the implementation PR.

Test plan

  • `mkdocs serve` renders both pages under the new nav groups
  • Cross-links resolve (codec-api.md, custom-codecs.md explainer, the issue, #1457)
  • Examples use core DataJoint types per project convention
  • Reviewers can sketch a plugin codec from the worked examples without ambiguity

Spec-first pair for the Renderable Protocol landing in DataJoint 2.3
(per user direction 2026-06-23, bringing T3.2 back into 2.3 scope).

New files:

- src/reference/specs/renderable.md — normative spec for the Renderable
  Protocol. Covers signature, return-value shape constraints (primitives /
  lists / dicts mapping to Spark ArrayType / StructType / MapType), why
  the contract is a Protocol rather than an abstract method on Codec,
  eligibility detection via isinstance, out-of-scope items, and three
  worked example codec implementations (FloatArrayCodec, Image2DCodec,
  PointWithLabelCodec).

- src/explanation/renderable-codecs.md — explainer. Covers the
  Bronze/Silver layer model (CDC mirror vs typed silver layer), why
  <blob@> is bronze-only, what typed renderable codecs are, the design
  rationale for the Protocol pattern (smaller OSS surface, cleaner
  opt-in, no churn for existing plugins, structural typing), what's
  out of scope, and a decision guide for choosing codecs in a new
  pipeline.
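The return-value constraint (primitives / lists / dicts) can be sketched as a recursive check. `is_spark_native` is a hypothetical helper, not part of DataJoint. This sketch assumes "primitives" means `None`, `bool`, `int`, `float`, and `str`, and that dict keys are strings. The `PointWithLabelCodec` body is likewise a guess at the shape of the spec's worked example.

```python
from typing import Any

# Assumption: the allowed leaf types; tuples, bytes, and arbitrary objects are excluded.
_PRIMITIVES = (bool, int, float, str)


def is_spark_native(value: Any) -> bool:
    """Return True if value is built only from primitives, lists, and str-keyed dicts."""
    if value is None or isinstance(value, _PRIMITIVES):
        return True
    if isinstance(value, list):
        # Lists map onto Spark ArrayType; every element must itself be native.
        return all(is_spark_native(item) for item in value)
    if isinstance(value, dict):
        # Dicts map onto StructType / MapType; keys are restricted to str here.
        return all(
            isinstance(key, str) and is_spark_native(item)
            for key, item in value.items()
        )
    return False


class PointWithLabelCodec:
    """Hypothetical typed codec whose decoded value is an (x, y, label) tuple."""

    def to_spark(self, value: tuple[float, float, str]) -> dict[str, Any]:
        x, y, label = value
        # Convert the tuple into a dict of primitives, which is Spark-native.
        return {"x": float(x), "y": float(y), "label": str(label)}


row = PointWithLabelCodec().to_spark((1, 2.5, "soma"))
print(row)                                      # {'x': 1.0, 'y': 2.5, 'label': 'soma'}
print(is_spark_native(row))                     # True
print(is_spark_native([[0.0, 1.0], [2.0, 3.0]]))  # True: nested lists, as for a 2-D image
print(is_spark_native((1.0, 2.0)))              # False: tuples are not allowed in this sketch
print(is_spark_native(b"\x00"))                 # False: raw bytes stay in bronze
```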

Nav entries added:
- Reference > Specifications > Type System > Renderable Codec Protocol
- Concepts > Storage > Renderable Codecs

Implementation (against this spec) follows in datajoint-python; the
addition is small (~10 lines: a runtime_checkable Protocol declaration
in src/datajoint/rendering.py, re-exported as dj.Renderable).

Examples use core DataJoint types (float64, int32) per project convention.
Cross-links to codec-api.md (the base Codec interface that renderable
codecs extend by composition, not inheritance).
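The following sketch combines the composition point with the Bronze/Silver split. The typed codec inherits only from a codec base class and opts into the silver layer by adding `to_spark`. In this sketch, `Codec` is a hypothetical stand-in for the interface in codec-api.md, and `split_layers` and the attribute names are illustrative assumptions.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparkAdapter(Protocol):
    # Assumed signature; the spec file is normative.
    def to_spark(self, value: Any) -> Any: ...


class Codec:
    """Hypothetical stand-in for the base Codec interface (codec-api.md)."""

    name: str = ""


class OpaqueBlobCodec(Codec):
    """Stand-in for <blob@>: serialized bytes with no typed view (bronze-only)."""

    name = "blob"


class Image2DCodec(Codec):
    """Typed codec: inherits Codec only, and adds to_spark by composition."""

    name = "image2d"

    def to_spark(self, value: Any) -> list[list[float]]:
        # Nested lists of floats map onto a Spark array of arrays.
        return [[float(px) for px in row] for row in value]


def split_layers(codecs: dict[str, Codec]) -> tuple[list[str], list[str]]:
    """Partition attribute names into silver-eligible and bronze-only."""
    silver = [a for a, c in codecs.items() if isinstance(c, SparkAdapter)]
    bronze_only = [a for a, c in codecs.items() if not isinstance(c, SparkAdapter)]
    return silver, bronze_only


# Hypothetical table heading: attribute name -> codec instance.
attributes = {"raw_movie": OpaqueBlobCodec(), "mean_frame": Image2DCodec()}
silver, bronze_only = split_layers(attributes)
print(silver)       # ['mean_frame']
print(bronze_only)  # ['raw_movie']
print(SparkAdapter in Image2DCodec.__mro__)  # False: the opt-in is structural
```

Every attribute is still mirrored in bronze. The split only decides which attributes also get a typed silver column.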
The name Renderable conflicts with the broader notion of graphically renderable
field types, and it is too generic for an interface aimed specifically at
Spark / Lakehouse Sync. Rename for clarity:

- Class: Renderable → SparkAdapter (parallels StorageAdapter)
- Method: render_spark → to_spark (matches pandas/Arrow conventions like
  to_pandas, to_arrow, __dataframe__)
- Spec file: renderable.md → spark-adapter.md
- Explainer: renderable-codecs.md → spark-adapters.md
- Nav entries updated in mkdocs.yaml
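Because eligibility is structural, the rename is entirely a matter of method names: a class that still defines the draft `render_spark` simply stops matching. The following is a minimal sketch with hypothetical codec classes and an assumed `to_spark` argument list.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparkAdapter(Protocol):
    # Post-rename method name; the argument list is an assumption.
    def to_spark(self, value: Any) -> Any: ...


class PreRenameCodec:
    """Hypothetical plugin written against the earlier Renderable draft."""

    def render_spark(self, value: Any) -> Any:
        return value


class PostRenameCodec:
    """The same hypothetical plugin after adopting the new method name."""

    def to_spark(self, value: Any) -> Any:
        return value


# Structural matching keys on the method name, so the old spelling silently
# stops qualifying rather than raising an error.
print(isinstance(PreRenameCodec(), SparkAdapter))   # False
print(isinstance(PostRenameCodec(), SparkAdapter))  # True
```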
@dimitri-yatsenko changed the title from "docs(#1458): Renderable Codec Protocol spec + explainer" to "docs(#1458): SparkAdapter Codec Protocol spec + explainer" on Jun 26, 2026

@MilagrosMarin left a comment

Collaborator


Verified against the shipped code in #1472. The Protocol signature, module path, dj.SparkAdapter re-export, and blob/hash non-eligibility all match the implementation exactly, and the line-87 caveat about @runtime_checkable checking method existence but not signature is exactly the observation I flagged independently on #1472 — good that this spec surfaces it. The design rationale for Protocol vs. abstract method reads well, and the out-of-scope enumeration is precise.

One thing worth a sanity check before merge: the FloatArrayCodec.get_dtype example on line 112 returns "<hash@>" when is_store=True, but the built-in BlobCodec.get_dtype on master returns "<hash>" (no @ — the @ modifier gets composed by the framework at declaration time, not returned by the codec). If the intent is to mirror built-in conventions, plugin authors copy-pasting the example may see divergent behavior. Someone deeper in the codec-chain semantics should confirm.

Two cosmetic residuals from the Renderable → SparkAdapter rename: the PR summary table still lists renderable.md / renderable-codecs.md (files are spark-adapter.md / spark-adapters.md), and the "Stays draft until the matching implementation PR opens" framing is out of date since #1472 is now open and approved. Nav placement, cross-links, and the explainer's Bronze/Silver framing are all clean.

Approving.
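The review's `@runtime_checkable` caveat is easy to reproduce. `isinstance` checks only that a `to_spark` attribute exists, so an incompatible signature surfaces at call time instead. The following sketch uses a deliberately wrong, hypothetical codec and the assumed `to_spark(self, value)` shape.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparkAdapter(Protocol):
    # Assumed signature: takes the decoded value.
    def to_spark(self, value: Any) -> Any: ...


class WrongArity:
    """Hypothetical codec whose to_spark takes no value argument."""

    def to_spark(self) -> None:
        return None


codec = WrongArity()
print(isinstance(codec, SparkAdapter))  # True: only the attribute's presence is checked

try:
    codec.to_spark([1.0, 2.0])
except TypeError as exc:
    # The mismatch is caught only when the adapter is actually called.
    print(f"caught at call time: {exc}")
```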

@dimitri-yatsenko merged commit a63a00f into main on Jul 1, 2026
3 checks passed
dimitri-yatsenko added a commit that referenced this pull request Jul 1, 2026
…@>) (#192)

The get_dtype example returned "<hash@>" for the store branch, but codecs
return the base dtype "<hash>" — the framework composes the @ store modifier
at declaration time. Matches built-in BlobCodec.get_dtype. Flagged by
@MilagrosMarin in review of #188.
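The following sketch shows the corrected example and assumes the `get_dtype(self, is_store)` shape described in the review. The non-store branch returning `"bytes"` is an assumption about the built-in convention and should be checked against codec-api.md. The class is hypothetical.

```python
class FloatArrayCodec:
    """Hypothetical plugin codec following the built-in BlobCodec convention."""

    name = "float_array"

    def get_dtype(self, is_store: bool) -> str:
        # Return the *base* dtype. For the store branch that is "<hash>", not
        # "<hash@>": the framework composes the @ store modifier at declaration time.
        # Assumption: the in-table branch is the "bytes" core type.
        return "<hash>" if is_store else "bytes"


codec = FloatArrayCodec()
print(codec.get_dtype(is_store=True))   # <hash>
print(codec.get_dtype(is_store=False))  # bytes
```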
@dimitri-yatsenko deleted the feat/1458-renderable-spec branch on July 2, 2026 21:25


3 participants