Add SEP: Contract Verification Attestations #1951
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds a new draft SEP describing Soroban contract build claims, content-addressed distribution, and signed third-party verification attestations to enable reproducible verification and publishable verdicts.
Changes:
- Introduces a canonical “build claim” concept and claim-hash identifier derived from XDR.
- Defines a signed verification record format and signature domain separation.
- Proposes a (currently tentative) canonical source-archive construction for reproducible `sourceHash`.
|
|
||
| `sourceHash` is the SHA-256 of a *canonical source archive*: a single file produced from the source tree in a way any verifier can reproduce byte-for-byte. Because the hash is load-bearing, the archive construction MUST be deterministic; an archive that varies by who built it, when, or with which tool version cannot serve as a content address. | ||
|
|
||
| > **This section is not yet finalized.** The construction below is a starting point. The ecosystem needs to converge on one exact, tested recipe — and ideally pin the archiving tool — before this is normative. |
| Given a fixed commit, `git archive` is a reasonable starting point because file contents and per-entry mtimes are determined by the commit object rather than by wall-clock time. Several details still break reproducibility and MUST be controlled: | ||
|
|
||
| - **gzip non-determinism.** gzip embeds a timestamp and the original filename in its header, so the same tar compresses to different bytes each run. Either produce the gzip with `--no-name`/`-n` (`git archive --format=tar <commit> | gzip -n`), or — preferably — define `sourceHash` over the *uncompressed* tar and treat compression as transport only, sidestepping gzip entirely. | ||
| - **git version drift.** `git archive` output (pax/extended headers, the embedded commit-id comment, default permissions and ordering) can differ across git versions. The recipe SHOULD pin a git version, or the archiving tool SHOULD be part of the pinned build image so the same tool always produces the archive. | ||
| - **attribute and line-ending filters.** `.gitattributes` directives (`text=auto`, `export-subst`, `export-ignore`) rewrite or drop content during archiving. Verifiers MUST apply identical attribute handling; setting `core.autocrlf=false` and fixing or disabling export filters keeps the bytes stable. |
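A minimal Python sketch of the gzip point above, using only the standard library. It assumes the archive bytes are already in hand (a stand-in byte string here, not real `git archive` output), and `source_hash` is a hypothetical helper name that the draft does not define. It illustrates that the gzip header timestamp changes the compressed bytes, while a hash taken over the uncompressed tar stays stable.

```python
import gzip
import hashlib


def source_hash(tar_bytes: bytes) -> str:
    """SHA-256 over the *uncompressed* archive bytes (hypothetical helper)."""
    return hashlib.sha256(tar_bytes).hexdigest()


# Stand-in for the output of `git archive --format=tar <commit>`.
tar_bytes = b"stand-in tar content" * 64

# Same input, different header timestamps: the compressed bytes differ.
gz_a = gzip.compress(tar_bytes, mtime=1)
gz_b = gzip.compress(tar_bytes, mtime=2)

# Pinning the timestamp to 0 mirrors the timestamp part of `gzip -n`.
gz_pinned_1 = gzip.compress(tar_bytes, mtime=0)
gz_pinned_2 = gzip.compress(tar_bytes, mtime=0)

# Hashing the decompressed payload is independent of the gzip header.
hash_a = source_hash(gzip.decompress(gz_a))
hash_b = source_hash(gzip.decompress(gz_b))
```

This covers only the compression layer; git version drift and `.gitattributes` filters affect the tar bytes themselves and are not addressed by the sketch.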
|
|
||
| There is a relevant discussion here (TODO). | ||
|
|
||
| As an alternative proposal for this section: if a canonical, reproducible git archive tool is deemed too complex, the original builder can run the archive tool once, push the resulting file to IPFS, and later reference that file by its SHA-256 hash. This requires only an archiving tool that works reliably, not one that produces byte-identical output across runs, machines, and versions. |
| ### 1. The build claim | ||
|
|
||
| A build claim captures *what was built* (the claimed wasm) together with *how and from what* (the build-environment fields and the source hash), in a form whose serialization is canonical so that its hash is reproducible across implementations. | ||
|
|
| ## Preamble | ||
|
|
||
| ``` | ||
| SEP: 0000 |
| The signing payload is domain-separated to prevent a verification signature from being valid in any other context: | ||
|
|
||
| ``` | ||
| payload = SHA-256( "SEP-XXXX-verification:v1" || claimHash || result-as-u32-BE ) |
|
|
||
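The payload formula quoted above can be sketched in Python as follows. This is an illustration under stated assumptions, not the draft's reference implementation: the domain tag is taken to be the ASCII bytes of the string with no length prefix or terminator, `claimHash` is taken to be a raw 32-byte SHA-256 digest, and `signing_payload` is a hypothetical function name. The placeholder `SEP-XXXX` is kept exactly as quoted.

```python
import hashlib
import struct

# Placeholder tag exactly as quoted in the draft; assumed to be plain ASCII bytes.
DOMAIN_TAG = b"SEP-XXXX-verification:v1"


def signing_payload(claim_hash: bytes, result: int) -> bytes:
    """SHA-256(domain tag || claimHash || result as big-endian u32)."""
    if len(claim_hash) != 32:
        raise ValueError("claim_hash must be a 32-byte digest")
    # ">I" packs an unsigned 32-bit integer in big-endian byte order.
    result_be = struct.pack(">I", result)
    return hashlib.sha256(DOMAIN_TAG + claim_hash + result_be).digest()
```

Because the tag is hashed in front of the claim hash, a signature over this payload cannot be replayed as a signature over a bare claim hash or over a payload built with a different tag.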
|
|
||
| ### 4. Signed verification | ||
|
|
||
| A verification is the cheap, publishable result of an attempt to reproduce a claim. It is small by design: it carries the claim hash, not the claim, so it is cheap to publish to the blockchain, while the full claim data can be stored off-chain. Consumers can scan many verifications and resolve only the claims they care about, as the hash -> claim lookup is easily resolvable by any off-chain client. |
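The hash-to-claim lookup described above can be sketched as a small content-addressed store. The sketch assumes the claim hash is the SHA-256 of the claim's canonical serialized bytes (the excerpt says only that it is derived from XDR), and `ClaimStore` is a hypothetical in-memory stand-in for whatever off-chain storage is used.

```python
import hashlib


class ClaimStore:
    """In-memory content-addressed store: key = SHA-256 of the stored bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, claim_bytes: bytes) -> str:
        # The key is derived from the content, so the caller cannot choose it.
        claim_hash = hashlib.sha256(claim_bytes).hexdigest()
        self._blobs[claim_hash] = claim_bytes
        return claim_hash

    def resolve(self, claim_hash: str) -> bytes:
        # Raises KeyError if the claim is unknown to this store.
        claim_bytes = self._blobs[claim_hash]
        # Re-hash on read so an untrusted store cannot substitute other bytes.
        if hashlib.sha256(claim_bytes).hexdigest() != claim_hash:
            raise ValueError("stored bytes do not match the claim hash")
        return claim_bytes
```

A consumer that holds only the hash from a verification can fetch the claim from any store and check it locally, which is why the storage layer itself does not need to be trusted.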
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b70ff8e40d
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| The resolution is that vocabulary fields are stored as plain data and the wasm's embedded meta is *read*, never re-synthesized: | ||
|
|
||
| 1. Fetch the original wasm by `claimedWasmHash` (e.g. from RPC, section 2) and read its `contractmetav0` section (per SEP-46). | ||
| 2. From that meta, collect the build-environment fields (`bldimg`, `bldopt`) and the source-hash entry. Union the build-environment fields with any supplied externally (registry, verification service, or out-of-band), with external fields filling gaps but never overriding on-wasm values for the same `(key, val)`. |
Define the source-hash meta key before reading it
For wasms that carry the companion SEP-58 metadata, there is no source-hash entry to collect: SEP-58 defines source_repo, source_rev, tarball_url, and tarball_sha256, while this draft never defines a new on-wasm key for sourceHash. In this reconstruction path, verifiers either cannot derive the claim from existing on-wasm metadata or will invent incompatible mappings, so the claim hash will not be interoperable unless the exact meta key or tarball_sha256 mapping is specified here.
| The resolution is that vocabulary fields are stored as plain data and the wasm's embedded meta is *read*, never re-synthesized: | ||
|
|
||
| 1. Fetch the original wasm by `claimedWasmHash` (e.g. from RPC, section 2) and read its `contractmetav0` section (per SEP-46). | ||
| 2. From that meta, collect the build-environment fields (`bldimg`, `bldopt`) and the source-hash entry. Union the build-environment fields with any supplied externally (registry, verification service, or out-of-band), with external fields filling gaps but never overriding on-wasm values for the same `(key, val)`. |
Prevent external fields from duplicating bldimg
When a wasm already contains a bldimg and an external registry supplies a different bldimg, this same (key, val) rule does not treat it as an override, so both values get unioned into the claim. SEP-58 only makes bldopt repeatable, so a claim with two build images leaves the rebuild environment ambiguous and can make verifiers compute or execute different results; the merge rule should reject or ignore external values for any key that is already present on-wasm, except explicitly repeatable fields.
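The merge rule suggested in this comment could look roughly like the sketch below. It is one possible reading, not the draft's rule: `merge_fields` and `REPEATABLE_KEYS` are hypothetical names, only `bldopt` is treated as repeatable (per the comment's reading of SEP-58), and the choice to let the first external value win for a non-repeatable key that is absent on-wasm is an extra assumption. The field values in the usage note are made up.

```python
from typing import Iterable

# Assumption: only bldopt may appear more than once.
REPEATABLE_KEYS = frozenset({"bldopt"})

Field = tuple[str, str]


def merge_fields(on_wasm: Iterable[Field], external: Iterable[Field]) -> list[Field]:
    """Union external fields into on-wasm fields without overriding them."""
    merged = list(on_wasm)
    seen_pairs = set(merged)
    # Non-repeatable keys that already have a value and are therefore closed.
    taken_keys = {key for key, _ in merged if key not in REPEATABLE_KEYS}
    for key, val in external:
        if key in taken_keys:
            continue  # an earlier value (on-wasm first) wins; ignore this one
        if (key, val) in seen_pairs:
            continue  # exact duplicate of a field already present
        merged.append((key, val))
        seen_pairs.add((key, val))
        if key not in REPEATABLE_KEYS:
            taken_keys.add(key)
    return merged
```

For example, merging on-wasm fields `[("bldimg", "img-a"), ("bldopt", "--locked")]` with external fields `[("bldimg", "img-b"), ("bldopt", "--release")]` keeps `img-a`, drops `img-b`, and appends the extra `bldopt`. A stricter variant could raise an error on a conflicting `bldimg` instead of silently ignoring it.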
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7c4872d2a6
| ## Preamble | ||
|
|
||
| ``` | ||
| SEP: 0000 |
Leave the draft SEP number unassigned
For a newly submitted SEP draft, ecosystem/README.md lines 163-168 and sep-template.md require the preamble to use SEP: To Be Assigned rather than a self-assigned placeholder number. Keeping 0000 here makes the draft non-conformant with the repository's SEP intake process and risks the placeholder being treated as the proposal number if the draft is merged before maintainer assignment.
|
I've changed this PR to Draft and locked the conversation so that conversation is directed into the discussion at #1952. |
I propose a new SEP (Draft), defining a normalized build claim, a cheap signed verification result over that claim, and a content-addressing convention for the claims, source archives, and wasm bytes involved in verifying a Soroban contract.
The SEP adds three layers on top of the build-environment vocabulary in SEP-0058: the build claim, the signed verification, and the content-addressing convention.
It also specifies how a verifier reconstructs a claim from a wasm's embedded meta and re-embeds that meta byte-for-byte on rebuild.
Rendered markdown of this SEP
Why
SEP-0058 makes a build reproducible, but says nothing about what a verifier does with the result. Two honest verifiers have no shared identifier for "the same build", no cheap way to publish a result others can trust selectively, and no integrity guarantee on the source bytes a recorded URL points at.
This SEP closes those gaps and is complementary to SEP-55 (CI attestations) and SEP-0058 (rebuild vocabulary): a contract can carry meta supporting all three, and consumers pick whichever verification path fits their threat model.
Discussion: https://github.com/orgs/stellar/discussions/1952