Skip to content

ANFEncoder: iterate to fixpoint to eliminate nested duplicates#1135

Merged
MikaelMayer merged 5 commits into
mainfrom
tautschnig/anf-fixedpoint
May 19, 2026
Merged

ANFEncoder: iterate to fixpoint to eliminate nested duplicates#1135
MikaelMayer merged 5 commits into
mainfrom
tautschnig/anf-fixedpoint

Conversation

@tautschnig
Copy link
Copy Markdown
Contributor

The ANF encoder extracts duplicated subexpressions into fresh variables, but the existing single-pass implementation can leave large duplicated sub-subexpressions behind.

Root cause: removeSubsumed drops candidate duplicates that are subexpressions of other (larger) candidate duplicates, to avoid creating redundant variable declarations. But this means that if only the outer expression appears at the top level, the inner dupes are hidden inside the lifted var declaration and never extracted.

Example (from PyAnalyzeLaurel benchmark check_storage_costs):

Original: assert Any_to_bool(Any_get(response, "Datapoints")) ...

After partial evaluation, Any_to_bool inlines its 7-branch body,
each branch referencing the argument. With Any_get also inlined
as an ite (is-DictStrAny response) (DictStrAny_get ...) (List_get ...), the Any_get expression ends up duplicated 62
times inside a single assert.

Old ANF output:
var $__anf.0 : bool := <9KB body with 62 duplicates of Any_get>;
assert $__anf.0

New ANF output (after iteration):
var $__anf.3 : Any := Any_get(response, "Datapoints");
var $__anf.0 : bool := <body using $__anf.3>;
assert $__anf.0

Effect: VC file size for VCs on one benchmark drops from 32KB to 11KB (~65% reduction), and another benchmark now completes verification at 36s where it previously hit a 60s timeout.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

The ANF encoder extracts duplicated subexpressions into fresh variables,
but the existing single-pass implementation can leave large duplicated
sub-subexpressions behind.

Root cause: `removeSubsumed` drops candidate duplicates that are
subexpressions of other (larger) candidate duplicates, to avoid
creating redundant variable declarations. But this means that if
only the outer expression appears at the top level, the inner dupes
are hidden inside the lifted var declaration and never extracted.

Example (from PyAnalyzeLaurel benchmark check_storage_costs):

  Original: assert Any_to_bool(Any_get(response, "Datapoints")) ...

  After partial evaluation, Any_to_bool inlines its 7-branch body,
  each branch referencing the argument. With Any_get also inlined
  as an `ite (is-DictStrAny response) (DictStrAny_get ...)
  (List_get ...)`, the Any_get expression ends up duplicated 62
  times inside a single assert.

  Old ANF output:
    var $__anf.0 : bool := <9KB body with 62 duplicates of Any_get>;
    assert $__anf.0

  New ANF output (after iteration):
    var $__anf.3 : Any := Any_get(response, "Datapoints");
    var $__anf.0 : bool := <body using $__anf.3>;
    assert $__anf.0

Effect: VC file size for VCs on one benchmark drops from 32KB to
11KB (~65% reduction), and another benchmark now completes
verification at 36s where it previously hit a 60s timeout.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the Core ANF encoder so it repeatedly extracts duplicated subexpressions until reaching a fixpoint, reducing duplication that can remain hidden inside newly introduced var declarations after a single extraction pass.

Changes:

  • Update anfEncodeBody to iterate until no further ANF encoder targets are found.
  • Adjust documentation to explain why fixpoint iteration is needed for nested-duplicate elimination.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread Strata/Transform/ANFEncoder.lean Outdated
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Clean, well-motivated change. The fixpoint iteration is a natural solution to the nested-duplicate problem, and the early-exit on targets.isEmpty guarantees termination in practice (each pass extracts at least one duplicate, strictly reducing the pool of non-leaf subexpressions).

One suggestion below about test coverage for the new iteration behavior.

Comment thread Strata/Transform/ANFEncoder.lean Outdated
Comment thread Strata/Transform/ANFEncoder.lean Outdated
tautschnig and others added 2 commits May 16, 2026 00:06
Three changes responding to the review threads:

1. Make the replacements map collision-safe (Copilot). The map's value type
   is now List (Expr × Expr) keyed by UInt64. Insertions append to the bucket
   list and lookups walk the list with structural `==`, so two distinct
   duplicates that share a hash do not displace each other — at most a few
   extra equality checks per lookup, and the bottom-up O(n) hash computation
   in replaceExprs is preserved.

2. Convert anfEncodeBody from `partial def` to a real `def` with a
   structurally decreasing fuel parameter (joscoh). The fuel value is the
   total expression size of the initial body, which is a sound upper bound
   on the iteration count. The docstring now states the termination
   argument explicitly: each pass replaces every duplicate occurrence by a
   fresh fvar (a leaf, filtered from future S(body)) and adds at most one
   var-decl init per duplicate (already in S(body)), so S strictly shrinks
   in non-trivial passes and is finite.

3. Add a multi-pass regression test (MikaelMayer). `nestedDupProg` has
   `(x + 1) * 2` (duplicate) and `x + 1` (subsumed by the larger duplicate
   in pass 1). Pass 1 lifts `(x + 1) * 2`; pass 2 then lifts `x + 1` after
   it becomes visible in the new var-decl init and the third assert. The
   single-pass version of the encoder would leave the inner duplicate in
   place.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖🔍 LGTM — all comments addressed.

joscoh
joscoh previously approved these changes May 16, 2026
Comment thread Strata/Transform/ANFEncoder.lean Outdated
tautschnig and others added 2 commits May 19, 2026 08:15
Drop the "(cf. PR #1135 review)" parenthetical so the docstring is
authoritative on its own (per MikaelMayer). Replace it with a sentence
that makes explicit what the parenthetical was hand-waving toward: the
collision-safe lookup is what lets anfEncodeBody's termination argument
claim that every duplicate found by findANFEncoderTargets is actually
rewritten on the same pass, so no unreplaced duplicate can survive into
the next iteration.

Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@MikaelMayer MikaelMayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖🔍 All previous comments appear addressed — reviewer sign-off still needed.

@MikaelMayer MikaelMayer added this pull request to the merge queue May 19, 2026
Merged via the queue into main with commit 18878bd May 19, 2026
23 of 24 checks passed
@MikaelMayer MikaelMayer deleted the tautschnig/anf-fixedpoint branch May 19, 2026 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants