Optimize VBA compression architecture and preserve benchmark gains

## Context

The compression library currently has two useful reference points:

- Old algorithm: `benchmark` branch
- New algorithm: `feature/architecture_change` branch

The benchmark project now exercises a corpus of real VBA source files from the ExcelVBA sample repository, plus repeated and low-compressibility variants. This gives us a more realistic signal than only synthetic byte arrays.

## Current architecture analysis

### Old algorithm

The old compression path spends most of its time in copy-token discovery. For each position in the decompressed chunk, it searches backwards through previously seen bytes to find the best matching sequence. This makes compression cost grow quickly as the chunk gets larger or as the input contains many partial matches.

The old approach also creates more short-lived objects and temporary buffers during tokenization and decompression. This increases managed allocation pressure and causes more GC work during larger inputs.

### New algorithm

The new branch changes the hot path in two important ways:

1. It introduces indexed match lookup through a `MatchFinder` that keys candidate positions by a 3-byte prefix.
2. It adds a fast path for repeated-byte runs, where the best copy token can be found without walking the general candidate chain.

For decompression, the new branch writes output directly into a byte list. Copy tokens append from already decompressed output instead of seeking through a stream and allocating temporary copied sequences.

The result is a much better balance: compression is substantially faster, and managed allocations are lower across every benchmarked input.

## Benchmark comparison

Positive deltas mean the new algorithm is better.

### Compression

| Input | Old algorithm mean | New algorithm mean | Speed delta | Allocation delta |
|---|---:|---:|---:|---:|
| `ExcelVbaCorpus` | 25.970 ms | 2.009 ms | +92.3% | +63.4% less |
| `ExcelVbaCorpusRepeated` | 163.750 ms | 15.978 ms | +90.2% | +59.1% less |
| `ExcelVbaCorpusWithNoise` | 448.400 ms | 26.664 ms | +94.1% | +66.2% less |
| `ExcelVbaLargestModuleRepeated` | 151.270 ms | 12.019 ms | +92.1% | +61.8% less |
| `LowCompressibility` | 534.190 ms | 16.495 ms | +96.9% | +19.1% less |

### Decompression

| Input | Old algorithm mean | New algorithm mean | Speed delta | Allocation delta |
|---|---:|---:|---:|---:|
| `ExcelVbaCorpus` | 894.200 us | 749.380 us | +16.2% | +15.9% less |
| `ExcelVbaCorpusRepeated` | 7.737 ms | 6.891 ms | +10.9% | +13.7% less |
| `ExcelVbaCorpusWithNoise` | 25.995 ms | 20.099 ms | +22.7% | +6.7% less |
| `ExcelVbaLargestModuleRepeated` | 7.853 ms | 6.264 ms | +20.2% | +13.2% less |
| `LowCompressibility` | 113.700 us | 57.320 us | +49.6% | +24.7% less |

## Proposed changes

- Keep the indexed match-finder architecture for compression.
- Keep the repeated-byte run fast path because it gives a large win for repetitive data and low-compressibility test cases.
- Keep decompression writing into a contiguous byte buffer rather than using stream seeking and temporary copy arrays.
- Add or preserve regression tests for malformed compressed data, copy-token bounds, and decompression equivalence.
- Keep the benchmark project as part of the repository so future compression changes can be compared against the real VBA corpus.
- Consider adding a longer benchmark job for release/performance validation, while keeping `ShortRun` useful for quick local checks.

## Notes

The benchmark values above came from BenchmarkDotNet `ShortRun`, so exact percentages may vary between machines or runs. The compression improvement is large enough that the direction is clear: the new architecture should be preferred, provided compatibility tests continue to pass.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize VBA compression architecture and preserve benchmark gains #20

Context

Current architecture analysis

Old algorithm

New algorithm

Benchmark comparison

Compression

Decompression

Proposed changes

Notes

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Input	Old algorithm mean	New algorithm mean	Speed delta	Allocation delta
`ExcelVbaCorpus`	25.970 ms	2.009 ms	+92.3%	+63.4% less
`ExcelVbaCorpusRepeated`	163.750 ms	15.978 ms	+90.2%	+59.1% less
`ExcelVbaCorpusWithNoise`	448.400 ms	26.664 ms	+94.1%	+66.2% less
`ExcelVbaLargestModuleRepeated`	151.270 ms	12.019 ms	+92.1%	+61.8% less
`LowCompressibility`	534.190 ms	16.495 ms	+96.9%	+19.1% less

Input	Old algorithm mean	New algorithm mean	Speed delta	Allocation delta
`ExcelVbaCorpus`	894.200 us	749.380 us	+16.2%	+15.9% less
`ExcelVbaCorpusRepeated`	7.737 ms	6.891 ms	+10.9%	+13.7% less
`ExcelVbaCorpusWithNoise`	25.995 ms	20.099 ms	+22.7%	+6.7% less
`ExcelVbaLargestModuleRepeated`	7.853 ms	6.264 ms	+20.2%	+13.2% less
`LowCompressibility`	113.700 us	57.320 us	+49.6%	+24.7% less

Uh oh!

Optimize VBA compression architecture and preserve benchmark gains #20

Description

Context

Current architecture analysis

Old algorithm

New algorithm

Benchmark comparison

Compression

Decompression

Proposed changes

Notes

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions