Skip to content

Extract indicies should be bits instead of bytes #520

@filipeom

Description

@filipeom

Problem

The current implementation of extract for bitvectors in Smtml uses byte offsets instead of bit offsets. This is non-standard and contradicts SMT-LIB's (_ extract high low) operator, where high and low are inclusive bit indices.

Currently:

  • Expr.Extract(e, high, low) stores byte offsets.
  • Expr.ty calculates bitwidth as (high - low) * 8.
  • Expr.extract converts bytes to bits internally: high_bit = (high * 8) - 1, low_bit = low * 8.
  • Solver mappings in mappings.ml perform the same conversion.
  • Smtml.Typed.Bitv modules documentation and implementation use byte-based indices.

This leads to confusion when parsing standard SMT2 files and makes the API inconsistent with other SMT solvers.

Proposed Changes

  1. Core AST (src/smtml/expr.ml)

    • Redefine Extract node to store bit indices directly.
    • Update Expr.ty to return Ty_bitv (high - low + 1).
    • Update Expr.extract and Expr.raw_extract to accept bit indices. Remove internal byte-to-bit conversion.
  2. Typed API (src/smtml/typed.ml, src/smtml/typed.mli)

    • Update Bitv.extract signature and documentation to use bit indices.
    • Update all internal usages of extract:
      • Bitv32.to_bytes, Bitv64.to_bytes
      • Bitv128.to_i32x4, Bitv128.to_i64x2
      • Bitv*.extend_s
    • Update example in typed.mli documentation.
  3. Parser (src/smtml/smtlib.ml)

    • Simplify the extract case in the SMT-LIB parser to pass the indices directly from the SMT2 file without dividing by 8.
  4. Mappings (src/smtml/mappings.ml)

    • Remove the (h * 8) - 1 and l * 8 conversions in mappings.ml. Pass the indices directly to the solver mapping functions.

Implementation Plan

  • Step 1: AST and Type System
    • Modify src/smtml/expr.ml and src/smtml/expr.mli.
  • Step 2: Typed API Refactoring
    • Modify src/smtml/typed.ml and src/smtml/typed.mli.
  • Step 3: Parser and Mappings
    • Modify src/smtml/smtlib.ml.
    • Modify src/smtml/mappings.ml.
  • Step 4: Cleanup and Verification
    • Audit all extract calls in the codebase.
    • Update any existing tests that rely on byte-based extraction.

Test Plan for Bit-level Extraction

1. Unit Tests (test/unit/test_extract.ml)

Create a new unit test suite to verify the correctness of the new bit-level extraction.

Test Cases:

  • Basic bit extraction:
    • extract (i32 0xF) ~high:0 ~low:0 should result in 1.
    • extract (i32 0xF) ~high:3 ~low:0 should result in 0xF.
  • Non-byte-aligned extraction:
    • extract (i32 0xAA) ~high:5 ~low:2 (binary 10101010, bits 5-2 are 1010) -> 0xA.
  • Width verification:
    • Expr.ty (extract e ~high:7 ~low:0) must be Ty_bitv 8.
    • Expr.ty (extract e ~high:15 ~low:0) must be Ty_bitv 16.
  • Boundary conditions:
    • extract e ~high:31 ~low:0 for 32-bit value.
    • extract e ~high:0 ~low:0.

2. Integration Tests

  • SMT-LIB Compatibility:
    • Add a new test case in test/integration/test_solver.ml that uses a hand-written SMT-LIB string with bit-level extraction: (assert (= ((_ extract 3 0) #xAF) #xF)).
    • Verify it passes with all enabled solvers (Z3, Bitwuzla, etc.).
  • Typed API consistency:
    • Verify Bitv32.to_bytes still produces the correct 4 bytes after being updated to use bit offsets.

3. Regression Tests

  • Run the full suite: dune runtest.
  • Pay special attention to test/cli/smt2/ regression tests. If any SMT2 files fail, they might have been relying on the previous (incorrect) behavior or the parser change might have fixed them.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions