docs(#1458): SparkAdapter Codec Protocol spec + explainer#188
Spec-first pair for the Renderable Protocol landing in DataJoint 2.3 (per user direction 2026-06-23, bringing T3.2 back into 2.3 scope).

New files:
- src/reference/specs/renderable.md: normative spec for the Renderable Protocol. Covers signature, return-value shape constraints (primitives / lists / dicts mapping to Spark ArrayType / StructType / MapType), why the contract is a Protocol rather than an abstract method on Codec, eligibility detection via isinstance, out-of-scope items, and three worked example codec implementations (FloatArrayCodec, Image2DCodec, PointWithLabelCodec).
- src/explanation/renderable-codecs.md: explainer. Covers the Bronze/Silver layer model (CDC mirror vs typed silver layer), why <blob@> is bronze-only, what typed renderable codecs are, the design rationale for the Protocol pattern (smaller OSS surface, cleaner opt-in, no churn for existing plugins, structural typing), what's out of scope, and a decision guide for choosing codecs in a new pipeline.

Nav entries added:
- Reference > Specifications > Type System > Renderable Codec Protocol
- Concepts > Storage > Renderable Codecs

Implementation (against this spec) follows in datajoint-python; the addition is small (~10 lines: a runtime_checkable Protocol declaration in src/datajoint/rendering.py, re-exported as dj.Renderable).

Examples use core DataJoint types (float64, int32) per project convention. Cross-links to codec-api.md (the base Codec interface that renderable codecs extend by composition, not inheritance).
Renderable conflicts with the broader notion of graphically renderable field types and is too generic for an interface targeted specifically at Spark / Lakehouse Sync. Rename for clarity:
- Class: Renderable → SparkAdapter (parallels StorageAdapter)
- Method: render_spark → to_spark (matches pandas/Arrow conventions like to_pandas, to_arrow, __dataframe__)
- Spec file: renderable.md → spark-adapter.md
- Explainer: renderable-codecs.md → spark-adapters.md
- Nav entries updated in mkdocs.yaml
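For orientation, a minimal sketch of what a Protocol-based opt-in with isinstance eligibility detection could look like under the post-rename names. The `to_spark` parameter and return annotations, the two codec classes, and the `is_spark_eligible` helper are all assumptions made for illustration; the normative signature is the one in spark-adapter.md.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparkAdapter(Protocol):
    """Structural contract: any object with a to_spark method qualifies."""

    def to_spark(self, value: Any) -> Any:
        # Assumed signature, for illustration only.
        ...


class FloatArrayCodec:
    """Hypothetical codec that opts in simply by defining to_spark."""

    def to_spark(self, value):
        # Return a plain list of floats (a shape that maps to a Spark ArrayType).
        return [float(x) for x in value]


class OpaqueBlobCodec:
    """Hypothetical codec without to_spark, so it is not eligible."""


def is_spark_eligible(codec) -> bool:
    # Eligibility detection via isinstance against the runtime-checkable Protocol.
    return isinstance(codec, SparkAdapter)
```

Neither codec inherits from `SparkAdapter`; the check is purely structural, which is why existing plugins would need no changes unless they choose to opt in.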
MilagrosMarin
left a comment
Verified against the shipped code in #1472. The Protocol signature, module path, dj.SparkAdapter re-export, and blob/hash non-eligibility all match the implementation, and the line-87 caveat about @runtime_checkable checking method existence but not signature is the same observation I flagged independently on #1472. Good that this spec surfaces it. The design rationale for Protocol vs. abstract method reads well, and the out-of-scope enumeration is precise.
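A small sketch of the @runtime_checkable caveat, using standard `typing` behavior. The Protocol signature and the `WrongArityCodec` class are hypothetical stand-ins, not the shipped code.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SparkAdapter(Protocol):
    def to_spark(self, value: Any) -> Any:  # assumed signature
        ...


class WrongArityCodec:
    """Hypothetical codec whose to_spark omits the value parameter."""

    def to_spark(self):
        return []


codec = WrongArityCodec()

# isinstance only checks that an attribute named to_spark exists.
passes_isinstance = isinstance(codec, SparkAdapter)

# The signature mismatch only surfaces when the method is actually called.
try:
    codec.to_spark([1.0])
    call_failed = False
except TypeError:
    call_failed = True
```

Here the isinstance check succeeds while the call raises TypeError, so signature conformance has to come from a static type checker or from tests.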
One thing worth a sanity check before merge: the FloatArrayCodec.get_dtype example on line 112 returns "<hash@>" when is_store=True, but the built-in BlobCodec.get_dtype on master returns "<hash>" (no @ — the @ modifier gets composed by the framework at declaration time, not returned by the codec). If the intent is to mirror built-in conventions, plugin authors copy-pasting the example may see divergent behavior. Someone deeper in the codec-chain semantics should confirm.
Two cosmetic residuals from the Renderable → SparkAdapter rename: the PR summary table still lists renderable.md / renderable-codecs.md (files are spark-adapter.md / spark-adapters.md), and the "Stays draft until the matching implementation PR opens" framing is out of date since #1472 is now open and approved. Nav placement, cross-links, and the explainer's Bronze/Silver framing are all clean.
Approving.
…@>) (#192)
The get_dtype example returned "<hash@>" for the store branch, but codecs return the base dtype "<hash>"; the framework composes the @ store modifier at declaration time. Matches built-in BlobCodec.get_dtype. Flagged by @MilagrosMarin in review of #188.
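A minimal sketch of the corrected store branch. The class below is a standalone stand-in rather than a real dj.Codec subclass, and the non-store return value `"bytes"` is an assumption chosen for illustration; only the `"<hash>"` store-branch value comes from the fix described above.

```python
class FloatArrayCodec:
    """Hypothetical plugin codec; only get_dtype is sketched here."""

    def get_dtype(self, is_store: bool) -> str:
        # Store branch: return the base dtype "<hash>", never "<hash@>".
        # The @ store modifier is composed by the framework, not by the codec.
        # Non-store branch: "bytes" is an assumed placeholder.
        return "<hash>" if is_store else "bytes"


codec = FloatArrayCodec()
store_dtype = codec.get_dtype(is_store=True)
```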
Summary
Spec-first pair for the Renderable Codec Protocol landing in DataJoint 2.3 (#1458). Brought into 2.3 scope per user direction 2026-06-23 (was previously deferred to 2.4).
Why spec-first
The Protocol itself is tiny (~10 lines: a `@runtime_checkable Protocol` declaration). The design conversation in the issue body settled the shape after #1457 (the earlier abstract-method-on-Codec framing) was rejected. Locking the spec now gives:
Marked draft
Stays draft until the matching implementation PR opens in `datajoint-python` — same pattern as the provenance trinity spec (#183) before #1471 landed against it.
Sequencing
Independent of the provenance trinity (no code overlap). Can land in parallel with T2.2 implementation work.
Test plan