spec: ECSM#655
Conversation
Codex Code ReviewFindings
No other concrete PR-diff issues found in this pass. |
| ref = "ec:a:addr_k_alignment" | ||
|
|
||
| [[assumptions]] | ||
| desc = "$(#`addr_xR` mod 2^32) + 24 < 2^32$" |
There was a problem hiding this comment.
Low — Inconsistent assumption bound for addr_k
k is 32 bytes, read in four 8-byte chunks at offsets 0, 8, 16, 24 — the same access pattern as xG. The assumption for addr_xG correctly uses + 24, but this one says + 31. The bound should be + 24 for consistency and to match what is actually constrained.
| desc = "$(#`addr_xR` mod 2^32) + 24 < 2^32$" | |
| desc = "$(#`addr_k` mod 2^32) + 24 < 2^32$" |
There was a problem hiding this comment.
This + 31 is on purpose, to make sure the EC_SCALAR chip doesn't trip up.
This comment can be disregarded.
Code Review — ECSM Accelerator SpecGood overall structure. The three-chip decomposition (ECSM → ECDAS → EC_SCALAR) is clean, and the soundness argument for why the point-at-infinity can never appear during double-and-add is well-reasoned. Issues foundMathematical error (spec/ecsm.typ:23) Typo (spec/ecsm.typ:158) Ref typo (spec/src/ecdas.toml:316) Assumption inconsistency (spec/src/ecsm.toml:199) Minor observations
|
Indeed. Fixed it.
Fixed it |
Technically, this is already addressed: the introduction mentions that it accelerates points in
Fixed
It is in the same document, but a section earlier. Disregarding the comment. |
RobinJadoul
left a comment
There was a problem hiding this comment.
- I do wonder how much potential optimization we're leaving on the table by treating
pas a generic prime, rather than using the structure it has - A more descriptive name for the
BITandSERVE_Kinteractions may be in order - The complex arith constraints for equalities mod p tend to have a different order between their "constraint" and "poly", which makes checking correspondence a bit tedious and error-prone
- I have some thoughts about the presentation of the "theory"/background, but trying to focus on the more technical things first
| desc = "$(#`addr_xG` mod 2^32) + 24 < 2^32$" | ||
| ref = "ec:a:addr_xG_alignment" | ||
|
|
||
| [[assumptions]] | ||
| desc = "$(#`addr_k` mod 2^32) + 31 < 2^32$" | ||
| ref = "ec:a:addr_k_alignment" | ||
|
|
||
| [[assumptions]] | ||
| desc = "$(#`addr_xR` mod 2^32) + 24 < 2^32$" | ||
| ref = "ec:a:addr_xR_alignment" |
There was a problem hiding this comment.
How safe is this? The fact that this comes from an ECALL means that it is not an assumption we can ensure at the spec level, and rather something that is pushed down into the guest program.
There was a problem hiding this comment.
How safe is this?
I seem to recall that one of our strategies is to range check (address, value, timestamp) for all writes. As a result, all reads can now be assumed safe.
When we adhere to this rule, I believe the tagged assumption to be safe.
Reasoning:
Suppose addr_xG (or addr_k or addr_R for that matter) violates the tagged assumption.
That would mean that at least one of the interactions in ec:c:read_xG would read a value from an invalid address.
To balance the LogUp, one would have had to (previously) perform a write-only interaction for this same, illegal address.
Under the assumption that all writes range check their address, this should be impossible.
Note: performing a read&write-interaction would not cut it, as this would just move the problem.
that is pushed down into the guest program
You're right that this is indeed pushed down to the guest program.
Note that this is not the first time this is done; the SHA256-C3 and SHA256-C7 essentially leverage the same trick: if the addresses for h_addr and m_addr don't align, C2 and C6 will fail.
There was a problem hiding this comment.
I think the SHA256 version actually works regardless of the alignment?
SHA256-C3 and SHA256-C7 both use a full ADD so there's no requirement on being small enough, since ADD can deal with overlow correctly. And if it's not dword-aligned, then you get the fallback to slow version of MEMW, but besides performance, there's no safety or correctness assumption on the guest that I see there.
Please do correct me if I'm reading that wrong.
I think it is just a completeness thing and indeed not soundness as you argue, as long as we keep the initialization of addresses correctly range-checked, which is already a hard requirement due to MEMW.
We can make this assumption, but then I would put it more visibly, as this becomes critical knowledge for consumers, and not just for implementing the spec.
| [[variables.input]] | ||
| name = "ptr" | ||
| type = "DWordWL" | ||
| desc = "pointer to the first byte of the scalar" | ||
| pad = 0 |
There was a problem hiding this comment.
Why are we doing a bunch of MEMWs in this chip? The calling ECSM has already read all of K into byte limbs, so we can just take in the limb as input and decompose it, right?
I guess the reason is that EC_SCALAR recurses onto itself, reducing the interactions in the ECSM table, but I think for total area it'd be better feed in the bytes directly; at which point, there may be some tradeoff between how many bits EC_SCALAR does at once for width vs depth, but again probably total area tells us to just make it 256 wide?
In the case 256 columns wide wouldn't be the right call, just making multiple calls into EC_SCALAR from ECSM should nevertheless still be the same area cost as the recursion, but simpler because there's no recursion logic to be performed.
There was a problem hiding this comment.
There may even be some approach where we avoid the recursion of ECDAS as well, and simply let the EC_SCALAR chip emit the sequence of double/add steps that need to be performed. Not entirely sure if that wouldn't introduce too many more interactions atm, but it's potentially cleaner than having to reason about the recursion stopping.
There was a problem hiding this comment.
Why are we doing a bunch of MEMWs in this chip? The calling ECSM has already read all of K into byte limbs, so we can just take in the limb as input and decompose it, right?
I guess the reason is that EC_SCALAR recurses onto itself, reducing the interactions in the ECSM table, but I think for total area it'd be better feed in the bytes directly;
I ran the numbers, and you're right: switching would save 32 interactions and 160 cells per ECSM-call.
at which point, there may be some tradeoff between how many bits EC_SCALAR does at once for width vs depth, but again probably total area tells us to just make it 256 wide?
In the case 256 columns wide wouldn't be the right call, just making multiple calls into EC_SCALAR from ECSM should nevertheless still be the same area cost as the recursion, but simpler because there's no recursion logic to be performed.
You're right: Making EC-SCALAR work on all 256 bits at once would reduce the cell-footprint by another ~15% (from 320 to 257).
In fact, we can go even one step further, and merge EC-SCALAR into ECSM. This would save 2 interactions (the transfer of k from ECSM to EC-SCALAR, on both chips) and 33 cells (= multiplicity on EC-SCALAR + the byte representation of k on ECSM can be virtualized from the individual bits).
The thing I keep returning to, is whether padding isn't screwing with minimizing total area as a heuristic. I'll spend some time this afternoon to investigate this for a bit.
|
|
||
| We now explore this carry-technique and provide some proofs. | ||
|
|
||
| == Lemma 1 |
There was a problem hiding this comment.
Feels a bit weird to have manual lemma/corollary numbering in the a title. Probably best to switch to some typst package for those, if we're intending to formalize/prove more anyway.
| [[constraints.yR]] | ||
| kind = "interaction" | ||
| tag = "IS_HALF" | ||
| input = [["+", ["idx", "c2", "i"], 16320]] |
There was a problem hiding this comment.
Can we maybe move these magic constant columns into a constant column of the chip? It's nicer to have them all in the same place to check correspondence with the analysis and make sure everything is up-to-date if we do change the analysis at some point (e.g. because we get Half limbs)
| Introducing non-negative witnesses $q''_0$ and $q''_1$, we convert these into | ||
| $ | ||
| 2lambda y_A - 3x_A^2 + (#`r` - q''_0) p &= 0,\ | ||
| lambda^2 - 2x_A - x_G - x_R + (#`r` - q''_1) p &= 0.\ |
There was a problem hiding this comment.
Having x_G in there looks wrong and propagates a bit further
| kind = "interaction" | ||
| tag = "BIT" | ||
| input = ["timestamp", "round"] | ||
| multiplicity = ["-", "next_op"] |
There was a problem hiding this comment.
I'd keep the positive here and the negative in EC_SCALAR, to keep with the usual interpretation of μ = -1 meaning that you've "shown" the relation to hold
| [[constraints.send]] | ||
| kind = "template" | ||
| tag = "IS_BIT" | ||
| input = ["next_op"] | ||
| ref = "ecdas:c:range_next_op" |
There was a problem hiding this comment.
The LogUp should take care of it, but probably good to still bit-check op itself too
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
TODO: explain the magic
This PR introduces the
ECSM(Elliptic Curve Scalar Multiplication) accelerator.