Skip to content

spec: ECSM#655

Open
erik-3milabs wants to merge 43 commits into
spec/mainfrom
spec/ecsm
Open

spec: ECSM#655
erik-3milabs wants to merge 43 commits into
spec/mainfrom
spec/ecsm

Conversation

@erik-3milabs

Copy link
Copy Markdown
Collaborator

This PR introduces the ECSM (Elliptic Curve Scalar Multiplication) accelerator.

@erik-3milabs erik-3milabs requested a review from RobinJadoul June 9, 2026 09:43
@erik-3milabs erik-3milabs self-assigned this Jun 9, 2026
@erik-3milabs erik-3milabs added the spec Updates and improvements to the spec document label Jun 9, 2026
@github-actions

github-actions Bot commented Jun 9, 2026

Copy link
Copy Markdown

Codex Code Review

Findings

  • High: inactive EC_SCALAR rows emit SERVE_K tokens. In spec/src/ec_scalar.toml, the recursive SERVE_K interaction uses multiplicity = ["not", "last_limb"], independent of μ. Since line 99 forces last_limb = 0 when μ = 0, every inactive/padding row emits a free SERVE_K token. Those can be consumed by active scalar rows and produce BIT tokens not rooted in the ECSM request’s addr_k, breaking the binding between k and the double-and-add sequence. Gate this by activity, e.g. μ * (1 - last_limb) or μ - last_limb with the existing implication.

  • High: ECDAS lambda carry recurrence uses c0[0] instead of c0[i]. In spec/src/ecdas.toml, the i = 1..63 lambda constraint starts with 2^8 * c0[0]. The rendered constraint and carry pattern require 2^8 * c0[i]. As written, most lambda carry limbs are not tied into the recurrence correctly, so the modular relation for λ is underconstrained for higher limbs. This can let invalid EC add/double steps satisfy the spec.

No other concrete PR-diff issues found in this pass.

Comment thread spec/ecsm.typ Outdated
Comment thread spec/ecsm.typ
Comment thread spec/src/ecdas.toml Outdated
Comment thread spec/src/ecsm.toml
ref = "ec:a:addr_k_alignment"

[[assumptions]]
desc = "$(#`addr_xR` mod 2^32) + 24 < 2^32$"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low — Inconsistent assumption bound for addr_k

k is 32 bytes, read in four 8-byte chunks at offsets 0, 8, 16, 24 — the same access pattern as xG. The assumption for addr_xG correctly uses + 24, but this one says + 31. The bound should be + 24 for consistency and to match what is actually constrained.

Suggested change
desc = "$(#`addr_xR` mod 2^32) + 24 < 2^32$"
desc = "$(#`addr_k` mod 2^32) + 24 < 2^32$"

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This + 31 is on purpose, to make sure the EC_SCALAR chip doesn't trip up.
This comment can be disregarded.

@claude

claude Bot commented Jun 9, 2026

Copy link
Copy Markdown

Code Review — ECSM Accelerator Spec

Good overall structure. The three-chip decomposition (ECSM → ECDAS → EC_SCALAR) is clean, and the soundness argument for why the point-at-infinity can never appear during double-and-add is well-reasoned.

Issues found

Mathematical error (spec/ecsm.typ:23)
The non-singularity condition is written as $a^3+27b^2 \neq 0$, but the correct Weierstrass discriminant is $4a^3+27b^2 \neq 0$. Harmless for secp256k1 (a=0) but incorrect in general. See inline comment.

Typo (spec/ecsm.typ:158)
$q_0 \in [0, 2p)$ should be $q_1 \in [0, 2p)$ — copy-paste error in the quotient bound discussion.

Ref typo (spec/src/ecdas.toml:316)
"ecda:c:range_c0" is missing the trailing s — should be "ecdas:c:range_c0".

Assumption inconsistency (spec/src/ecsm.toml:199)
The alignment assumption for addr_k uses + 31 but the actual memory reads are in four 8-byte chunks at offsets 0, 8, 16, 24 — identical to addr_xG and addr_xR, which both use + 24. The bound should be + 24 for consistency.

Minor observations

  • The doubling formula is silently specialized to a=0; the general theory section (which still writes E(a,b,p)) could note this specialization explicitly.
  • Several new files are missing a trailing newline (about_ecalls.typ, ec_scalar.toml, ecdas.toml).
  • The aside explaining why y_A != 0 for secp256k1 says "as previously established" but the establishment is not in this document — citing the odd-order argument explicitly would make it self-contained.

Comment thread spec/ecsm.typ Outdated
Comment thread spec/ecsm.typ
@erik-3milabs

Copy link
Copy Markdown
Collaborator Author

Codex Code Review

Findings

High: inactive EC_SCALAR rows emit SERVE_K tokens. In spec/src/ec_scalar.toml, the recursive SERVE_K interaction uses multiplicity = ["not", "last_limb"], independent of μ. Since line 99 forces last_limb = 0 when μ = 0, every inactive/padding row emits a free SERVE_K token. Those can be consumed by active scalar rows and produce BIT tokens not rooted in the ECSM request’s addr_k, breaking the binding between k and the double-and-add sequence. Gate this by activity, e.g. μ * (1 - last_limb) or μ - last_limb with the existing implication.

Indeed. Fixed it.

High: ECDAS lambda carry recurrence uses c0[0] instead of c0[i]. In spec/src/ecdas.toml, the i = 1..63 lambda constraint starts with 2^8 * c0[0]. The rendered constraint and carry pattern require 2^8 * c0[i]. As written, most lambda carry limbs are not tied into the recurrence correctly, so the modular relation for λ is underconstrained for higher limbs. This can let invalid EC add/double steps satisfy the spec.

Fixed it

@erik-3milabs

Copy link
Copy Markdown
Collaborator Author

Code Review — ECSM Accelerator Spec

Minor observations

  • The doubling formula is silently specialized to a=0; the general theory section (which still writes E(a,b,p)) could note this specialization explicitly.

Technically, this is already addressed: the introduction mentions that it accelerates points in E(0, b, p)

  • Several new files are missing a trailing newline (about_ecalls.typ, ec_scalar.toml, ecdas.toml).

Fixed

  • The aside explaining why y_A != 0 for secp256k1 says "as previously established" but the establishment is not in this document — citing the odd-order argument explicitly would make it self-contained.

It is in the same document, but a section earlier. Disregarding the comment.

@RobinJadoul RobinJadoul left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I do wonder how much potential optimization we're leaving on the table by treating p as a generic prime, rather than using the structure it has
  • A more descriptive name for the BIT and SERVE_K interactions may be in order
  • The complex arith constraints for equalities mod p tend to have a different order between their "constraint" and "poly", which makes checking correspondence a bit tedious and error-prone
  • I have some thoughts about the presentation of the "theory"/background, but trying to focus on the more technical things first

Comment thread spec/about_ecalls.typ Outdated
Comment thread spec/src/ecsm.toml
Comment on lines +209 to +218
desc = "$(#`addr_xG` mod 2^32) + 24 < 2^32$"
ref = "ec:a:addr_xG_alignment"

[[assumptions]]
desc = "$(#`addr_k` mod 2^32) + 31 < 2^32$"
ref = "ec:a:addr_k_alignment"

[[assumptions]]
desc = "$(#`addr_xR` mod 2^32) + 24 < 2^32$"
ref = "ec:a:addr_xR_alignment"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How safe is this? The fact that this comes from an ECALL means that it is not an assumption we can ensure at the spec level, and rather something that is pushed down into the guest program.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How safe is this?

I seem to recall that one of our strategies is to range check (address, value, timestamp) for all writes. As a result, all reads can now be assumed safe.
When we adhere to this rule, I believe the tagged assumption to be safe.

Reasoning:
Suppose addr_xG (or addr_k or addr_R for that matter) violates the tagged assumption.
That would mean that at least one of the interactions in ec:c:read_xG would read a value from an invalid address.
To balance the LogUp, one would have had to (previously) perform a write-only interaction for this same, illegal address.
Under the assumption that all writes range check their address, this should be impossible.

Note: performing a read&write-interaction would not cut it, as this would just move the problem.

that is pushed down into the guest program

You're right that this is indeed pushed down to the guest program.

Note that this is not the first time this is done; the SHA256-C3 and SHA256-C7 essentially leverage the same trick: if the addresses for h_addr and m_addr don't align, C2 and C6 will fail.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SHA256 version actually works regardless of the alignment?
SHA256-C3 and SHA256-C7 both use a full ADD so there's no requirement on being small enough, since ADD can deal with overlow correctly. And if it's not dword-aligned, then you get the fallback to slow version of MEMW, but besides performance, there's no safety or correctness assumption on the guest that I see there.
Please do correct me if I'm reading that wrong.

I think it is just a completeness thing and indeed not soundness as you argue, as long as we keep the initialization of addresses correctly range-checked, which is already a hard requirement due to MEMW.
We can make this assumption, but then I would put it more visibly, as this becomes critical knowledge for consumers, and not just for implementing the spec.

Comment thread spec/ecsm.typ Outdated
Comment thread spec/ecsm.typ Outdated
Comment thread spec/src/ec_scalar.toml
Comment on lines +9 to +13
[[variables.input]]
name = "ptr"
type = "DWordWL"
desc = "pointer to the first byte of the scalar"
pad = 0

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing a bunch of MEMWs in this chip? The calling ECSM has already read all of K into byte limbs, so we can just take in the limb as input and decompose it, right?

I guess the reason is that EC_SCALAR recurses onto itself, reducing the interactions in the ECSM table, but I think for total area it'd be better feed in the bytes directly; at which point, there may be some tradeoff between how many bits EC_SCALAR does at once for width vs depth, but again probably total area tells us to just make it 256 wide?
In the case 256 columns wide wouldn't be the right call, just making multiple calls into EC_SCALAR from ECSM should nevertheless still be the same area cost as the recursion, but simpler because there's no recursion logic to be performed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may even be some approach where we avoid the recursion of ECDAS as well, and simply let the EC_SCALAR chip emit the sequence of double/add steps that need to be performed. Not entirely sure if that wouldn't introduce too many more interactions atm, but it's potentially cleaner than having to reason about the recursion stopping.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we doing a bunch of MEMWs in this chip? The calling ECSM has already read all of K into byte limbs, so we can just take in the limb as input and decompose it, right?
I guess the reason is that EC_SCALAR recurses onto itself, reducing the interactions in the ECSM table, but I think for total area it'd be better feed in the bytes directly;

I ran the numbers, and you're right: switching would save 32 interactions and 160 cells per ECSM-call.

at which point, there may be some tradeoff between how many bits EC_SCALAR does at once for width vs depth, but again probably total area tells us to just make it 256 wide?
In the case 256 columns wide wouldn't be the right call, just making multiple calls into EC_SCALAR from ECSM should nevertheless still be the same area cost as the recursion, but simpler because there's no recursion logic to be performed.

You're right: Making EC-SCALAR work on all 256 bits at once would reduce the cell-footprint by another ~15% (from 320 to 257).

In fact, we can go even one step further, and merge EC-SCALAR into ECSM. This would save 2 interactions (the transfer of k from ECSM to EC-SCALAR, on both chips) and 33 cells (= multiplicity on EC-SCALAR + the byte representation of k on ECSM can be virtualized from the individual bits).

The thing I keep returning to, is whether padding isn't screwing with minimizing total area as a heuristic. I'll spend some time this afternoon to investigate this for a bit.

Comment thread spec/ecsm.typ

We now explore this carry-technique and provide some proofs.

== Lemma 1

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels a bit weird to have manual lemma/corollary numbering in the a title. Probably best to switch to some typst package for those, if we're intending to formalize/prove more anyway.

Comment thread spec/src/ecdas.toml Outdated
[[constraints.yR]]
kind = "interaction"
tag = "IS_HALF"
input = [["+", ["idx", "c2", "i"], 16320]]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we maybe move these magic constant columns into a constant column of the chip? It's nicer to have them all in the same place to check correspondence with the analysis and make sure everything is up-to-date if we do change the analysis at some point (e.g. because we get Half limbs)

Comment thread spec/ecsm.typ Outdated
Introducing non-negative witnesses $q''_0$ and $q''_1$, we convert these into
$
2lambda y_A - 3x_A^2 + (#`r` - q''_0) p &= 0,\
lambda^2 - 2x_A - x_G - x_R + (#`r` - q''_1) p &= 0.\

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having x_G in there looks wrong and propagates a bit further

Comment thread spec/src/ecdas.toml Outdated
kind = "interaction"
tag = "BIT"
input = ["timestamp", "round"]
multiplicity = ["-", "next_op"]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep the positive here and the negative in EC_SCALAR, to keep with the usual interpretation of μ = -1 meaning that you've "shown" the relation to hold

Comment thread spec/src/ecdas.toml
Comment on lines +479 to +483
[[constraints.send]]
kind = "template"
tag = "IS_BIT"
input = ["next_op"]
ref = "ecdas:c:range_next_op"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LogUp should take care of it, but probably good to still bit-check op itself too

@erik-3milabs erik-3milabs requested a review from RobinJadoul June 16, 2026 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

spec Updates and improvements to the spec document

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants