
Java/Kotlin lambda → functional-interface (SAM) extraction + synthesis #487

@colbymchenry

Description

Context

Surfaced during the playbook validation for #412 (JVM FQN imports + anonymous-class extraction). The anon-class extraction closed a significant guava read gap (CacheBuilder: 1.5 → 0 Reads in the agent A/B), but Splitter.on(...).split()-style flows still drive 2–3 Reads per agent run because lambdas passed to constructors are not in the graph at all.

The hole

Guava (and any Java 8+ functional-interface usage) does:

return new Splitter(
    (splitter, toSplit) ->
        new SplittingIterator(splitter, toSplit) {
          @Override int separatorStart(int s) { ... }
        });

The lambda (splitter, toSplit) -> ... IS Strategy.iterator for this Splitter instance — it's the body the runtime jumps to when strategy.iterator(this, seq) runs in splittingIterator. Static extraction currently treats lambda_expression as a no-op pass-through, so:

  • Strategy.iterator (the SAM interface method) has zero implementations in the graph.
  • trace(splittingIterator, separatorStart) fails at the strategy.iterator(...) hop — no callees.
  • An agent investigating that flow Reads Splitter.java to manually link on() → lambda → anon SplittingIterator.
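To make the dispatch concrete, here is a minimal, runnable sketch modeled on the quoted Guava pattern. It is a simplified stand-in, not Guava's source: Strategy, Splitter, SplittingIterator and on(char) are cut down for illustration, and the iterator only handles single-character separators. The only link from splittingIterator to the code that actually runs is the runtime value of the strategy field, which is why a static graph with no lambda nodes sees no callee at that hop.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Simplified stand-ins modeled on the Guava pattern quoted above (not Guava's source).
public class SamDispatchDemo {
    /** Functional interface: iterator(...) is its single abstract method (the SAM). */
    @FunctionalInterface
    interface Strategy {
        Iterator<String> iterator(Splitter splitter, CharSequence toSplit);
    }

    /** Abstract base whose separatorStart hook is overridden per call site. */
    abstract static class SplittingIterator implements Iterator<String> {
        private final CharSequence toSplit;
        private int offset = 0;
        private boolean done = false;

        SplittingIterator(Splitter splitter, CharSequence toSplit) { this.toSplit = toSplit; }

        /** Index of the next separator at or after start, or -1 if there is none. */
        abstract int separatorStart(int start);

        @Override public boolean hasNext() { return !done; }

        @Override public String next() {
            if (done) throw new NoSuchElementException();
            int sep = separatorStart(offset); // dispatches to the anonymous override
            String piece;
            if (sep < 0) {
                piece = toSplit.subSequence(offset, toSplit.length()).toString();
                done = true;
            } else {
                piece = toSplit.subSequence(offset, sep).toString();
                offset = sep + 1; // single-character separators only
            }
            return piece;
        }
    }

    static final class Splitter {
        private final Strategy strategy;

        Splitter(Strategy strategy) { this.strategy = strategy; }

        static Splitter on(char separator) {
            // The lambda below IS Strategy.iterator for this Splitter instance.
            return new Splitter(
                (splitter, toSplit) ->
                    new SplittingIterator(splitter, toSplit) {
                        @Override int separatorStart(int start) {
                            for (int i = start; i < toSplit.length(); i++) {
                                if (toSplit.charAt(i) == separator) return i;
                            }
                            return -1;
                        }
                    });
        }

        // Statically this call site names only Strategy.iterator; the code that
        // runs is whichever lambda was passed to the constructor in on(...).
        Iterator<String> splittingIterator(CharSequence sequence) {
            return strategy.iterator(this, sequence);
        }
    }

    public static void main(String[] args) {
        Iterator<String> it = Splitter.on(',').splittingIterator("a,b,,c");
        while (it.hasNext()) System.out.println("[" + it.next() + "]");
    }
}
```

Running main prints the four pieces a, b, an empty string, and c, each in brackets; every piece is found by the anonymous separatorStart reached through the lambda.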

#412's anon-class extraction picks up the new SplittingIterator() { ... } inside the lambda (so the override is in the graph), but the lambda itself is not — so the chain still breaks at Strategy.iterator.

Proposed mechanism

  1. Extract lambda_expression (Java) and arrow_function/function_expression argument lambdas as method nodes named <lambda@line>, scoped under the enclosing method via the existing nodeStack.
  2. At synthesis time, bind each lambda to a SAM type: walk up to the enclosing call/object_creation_expression, look up the called constructor's or method's signature, take the parameter type at the lambda's argument position, and check whether that type is a single-abstract-method interface (exactly one abstract method in its body). If it is, synthesize a calls edge SAM method → lambda body, tagged provenance:'heuristic', synthesizedBy:'lambda-sam'.
  3. Method references (Class::method, obj::method) extend the same mechanism — bind the referenced method to the SAM target.
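Step 2 can be sketched over a toy model of the graph. Everything here is hypothetical: the records stand in for whatever the extractor actually stores, parameter types are plain strings (generics, varargs and SAMs inherited from superinterfaces are not handled), and the node id format only follows the proposed <lambda@line> naming. One detail worth carrying into the real check: abstract redeclarations of public java.lang.Object methods, such as Comparator.equals, do not count toward the single abstract method (JLS §9.8), so a literal "sole abstract method in its body" rule would reject Comparator.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model of the lambda → SAM binding step. All names are hypothetical.
public class LambdaSamBinder {
    record MethodDecl(String name, boolean isAbstract, List<String> paramTypes) {}
    record InterfaceDecl(String name, List<MethodDecl> methods) {}
    record Edge(String from, String to, String kind, String provenance, String synthesizedBy) {}

    // Abstract redeclarations of public Object methods don't count (JLS 9.8).
    // Matched by name and arity only, which is an approximation.
    private static boolean isObjectMethod(MethodDecl m) {
        int n = m.paramTypes().size();
        return (m.name().equals("equals") && n == 1)
            || (m.name().equals("hashCode") && n == 0)
            || (m.name().equals("toString") && n == 0);
    }

    /** Returns the single abstract method, or empty if the interface is not a SAM type. */
    static Optional<MethodDecl> soleAbstractMethod(InterfaceDecl iface) {
        List<MethodDecl> abstracts = iface.methods().stream()
            .filter(MethodDecl::isAbstract)          // default/static methods are skipped
            .filter(m -> !isObjectMethod(m))
            .toList();
        return abstracts.size() == 1 ? Optional.of(abstracts.get(0)) : Optional.empty();
    }

    /**
     * Binds a lambda passed as argument argIndex of a call whose callee declares
     * calleeParams, emitting SAM method → lambda if that parameter is a SAM interface.
     */
    static Optional<Edge> bind(String lambdaId, List<String> calleeParams, int argIndex,
                               Map<String, InterfaceDecl> interfaces) {
        if (argIndex < 0 || argIndex >= calleeParams.size()) return Optional.empty();
        InterfaceDecl iface = interfaces.get(calleeParams.get(argIndex));
        if (iface == null) return Optional.empty(); // not an indexed interface
        return soleAbstractMethod(iface).map(sam -> new Edge(
            iface.name() + "." + sam.name(), lambdaId, "calls", "heuristic", "lambda-sam"));
    }

    public static void main(String[] args) {
        InterfaceDecl strategy = new InterfaceDecl("Strategy", List.of(
            new MethodDecl("iterator", true, List.of("Splitter", "CharSequence"))));
        Map<String, InterfaceDecl> ifaces = Map.of("Strategy", strategy);
        // new Splitter((splitter, toSplit) -> ...): ctor Splitter(Strategy), lambda at arg 0.
        System.out.println(bind("Splitter.on/<lambda@42>", List.of("Strategy"), 0, ifaces));
    }
}
```

Returning Optional keeps the "no binding" outcome explicit, so a lambda whose target type cannot be resolved produces no edge rather than a guessed one.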

The existing IFACE_OVERRIDE_LANGS interface-impl synthesizer can be reused for the linking step once lambdas have an implements edge to the SAM interface.
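Modeling a lambda as an implementer of its SAM interface matches Java semantics: per JLS §15.27.4 (and §15.13.3 for method references), evaluating a lambda or method reference yields an instance of a class that implements the target functional interface, much like an anonymous class. The small demo below (the Transform interface and shout method are illustrative only) shows a lambda, a Class::method reference and an obj::method reference all producing such objects. The generated class names are JVM implementation details, so it prints only the implemented interfaces and the results.

```java
import java.util.Arrays;

public class LambdaImplementsDemo {
    /** An illustrative functional interface (single abstract method). */
    @FunctionalInterface
    interface Transform {
        String apply(String input);
    }

    static String shout(String s) { return s.toUpperCase(); }

    public static void main(String[] args) {
        Transform lambda = s -> s + "!";                   // lambda expression
        Transform staticRef = LambdaImplementsDemo::shout; // Class::method
        String prefix = ">> ";
        Transform boundRef = prefix::concat;               // obj::method (bound receiver)

        for (Transform t : new Transform[] {lambda, staticRef, boundRef}) {
            // Each value's runtime class implements Transform, like an anonymous class would.
            System.out.println(Arrays.toString(t.getClass().getInterfaces()) + " -> " + t.apply("hi"));
        }
    }
}
```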

Validation target (playbook)

A re-run of the guava agent A/B should drop Splitter q3 with-arm Reads from 2.5 to ~0, and CacheBuilder q2 should stay at 0. Spring repos (petclinic-kt, mall) must remain at 0/0 (no regression). No node explosion: sanity-check mall's node count against its current 22,861.

Risks / scope notes

Per docs/design/dynamic-dispatch-coverage-playbook.md: "partial coverage is WORSE than none." If the SAM binding misses the case where the lambda body itself constructs an anon class (the guava (s, t) -> new SplittingIterator() { ... } pattern), we'd surface a dead-end Strategy.iterator → <lambda> edge with no further bridge to separatorStart. The end-to-end test must verify a 4+ hop trace lands on the override body, not just the lambda.
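The end-to-end check can be sketched as a breadth-first trace over a toy adjacency map. The node ids and edge set are hypothetical (they follow the <lambda@line> naming and the Splitter example, not the real graph schema), and the lambda → anon-class edge stands in for whatever bridge the extractor actually produces. With that bridge the trace from splittingIterator reaches the separatorStart override in 4 hops; without it the trace dead-ends at the lambda, which is exactly the partial-coverage failure described above.

```java
import java.util.*;

public class LambdaTraceCheck {
    /** Breadth-first search; returns the node path from src to dst, or an empty list. */
    static List<String> trace(Map<String, List<String>> edges, String src, String dst) {
        Map<String, String> parent = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>(List.of(src));
        parent.put(src, src);
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            if (cur.equals(dst)) {
                LinkedList<String> path = new LinkedList<>();
                for (String n = dst; ; n = parent.get(n)) {
                    path.addFirst(n);
                    if (n.equals(src)) break;
                }
                return path;
            }
            for (String next : edges.getOrDefault(cur, List.of())) {
                if (parent.putIfAbsent(next, cur) == null) queue.add(next); // unvisited
            }
        }
        return List.of();
    }

    /** Toy graph for the Splitter flow; withBridge toggles the lambda → anon-class edge. */
    static Map<String, List<String>> graph(boolean withBridge) {
        String lambda = "Splitter.on/<lambda@L>";          // hypothetical lambda node id
        String anon = lambda + "/<anon SplittingIterator>"; // hypothetical #412 anon-class id
        Map<String, List<String>> g = new HashMap<>();
        g.put("Splitter.splittingIterator", List.of("Strategy.iterator"));
        g.put("Strategy.iterator", List.of(lambda));        // lambda-sam synthesized edge
        if (withBridge) g.put(lambda, List.of(anon));        // lambda body → anon class it builds
        g.put(anon, List.of(anon + ".separatorStart"));     // anon class → its override
        return g;
    }

    public static void main(String[] args) {
        String src = "Splitter.splittingIterator";
        String dst = "Splitter.on/<lambda@L>/<anon SplittingIterator>.separatorStart";
        for (boolean bridge : new boolean[] {true, false}) {
            List<String> path = trace(graph(bridge), src, dst);
            int hops = Math.max(0, path.size() - 1);
            System.out.println("bridge=" + bridge + " hops=" + hops + " " + path);
        }
    }
}
```

A real test would assert both halves: at least 4 hops with the bridge, and that the final node is the override body rather than the lambda itself.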

Cost is a real extractor change (lambdas become nodes, signature lookup is new at synthesis time) — probably a 3–6 hour PR plus playbook validation.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests