Context
The compression library currently has two useful reference points:
- Old algorithm:
benchmark branch
- New algorithm:
feature/architecture_change branch
The benchmark project now exercises a corpus of real VBA source files from the ExcelVBA sample repository, plus repeated and low-compressibility variants. This gives us a more realistic signal than only synthetic byte arrays.
Current architecture analysis
Old algorithm
The old compression path spends most of its time in copy-token discovery. For each position in the decompressed chunk, it searches backwards through previously seen bytes to find the best matching sequence. This makes compression cost grow quickly as the chunk gets larger or as the input contains many partial matches.
The old approach also creates more short-lived objects and temporary buffers during tokenization and decompression. This increases managed allocation pressure and causes more GC work during larger inputs.
New algorithm
The new branch changes the hot path in two important ways:
- It introduces indexed match lookup through a
MatchFinder that keys candidate positions by a 3-byte prefix.
- It adds a fast path for repeated-byte runs, where the best copy token can be found without walking the general candidate chain.
For decompression, the new branch writes output directly into a byte list. Copy tokens append from already decompressed output instead of seeking through a stream and allocating temporary copied sequences.
The result is a much better balance: compression is substantially faster, and managed allocations are lower across every benchmarked input.
Benchmark comparison
Positive deltas mean the new algorithm is better.
Compression
| Input |
Old algorithm mean |
New algorithm mean |
Speed delta |
Allocation delta |
ExcelVbaCorpus |
25.970 ms |
2.009 ms |
+92.3% |
+63.4% less |
ExcelVbaCorpusRepeated |
163.750 ms |
15.978 ms |
+90.2% |
+59.1% less |
ExcelVbaCorpusWithNoise |
448.400 ms |
26.664 ms |
+94.1% |
+66.2% less |
ExcelVbaLargestModuleRepeated |
151.270 ms |
12.019 ms |
+92.1% |
+61.8% less |
LowCompressibility |
534.190 ms |
16.495 ms |
+96.9% |
+19.1% less |
Decompression
| Input |
Old algorithm mean |
New algorithm mean |
Speed delta |
Allocation delta |
ExcelVbaCorpus |
894.200 us |
749.380 us |
+16.2% |
+15.9% less |
ExcelVbaCorpusRepeated |
7.737 ms |
6.891 ms |
+10.9% |
+13.7% less |
ExcelVbaCorpusWithNoise |
25.995 ms |
20.099 ms |
+22.7% |
+6.7% less |
ExcelVbaLargestModuleRepeated |
7.853 ms |
6.264 ms |
+20.2% |
+13.2% less |
LowCompressibility |
113.700 us |
57.320 us |
+49.6% |
+24.7% less |
Proposed changes
- Keep the indexed match-finder architecture for compression.
- Keep the repeated-byte run fast path because it gives a large win for repetitive data and low-compressibility test cases.
- Keep decompression writing into a contiguous byte buffer rather than using stream seeking and temporary copy arrays.
- Add or preserve regression tests for malformed compressed data, copy-token bounds, and decompression equivalence.
- Keep the benchmark project as part of the repository so future compression changes can be compared against the real VBA corpus.
- Consider adding a longer benchmark job for release/performance validation, while keeping
ShortRun useful for quick local checks.
Notes
The benchmark values above came from BenchmarkDotNet ShortRun, so exact percentages may vary between machines or runs. The compression improvement is large enough that the direction is clear: the new architecture should be preferred, provided compatibility tests continue to pass.
Context
The compression library currently has two useful reference points:
benchmarkbranchfeature/architecture_changebranchThe benchmark project now exercises a corpus of real VBA source files from the ExcelVBA sample repository, plus repeated and low-compressibility variants. This gives us a more realistic signal than only synthetic byte arrays.
Current architecture analysis
Old algorithm
The old compression path spends most of its time in copy-token discovery. For each position in the decompressed chunk, it searches backwards through previously seen bytes to find the best matching sequence. This makes compression cost grow quickly as the chunk gets larger or as the input contains many partial matches.
The old approach also creates more short-lived objects and temporary buffers during tokenization and decompression. This increases managed allocation pressure and causes more GC work during larger inputs.
New algorithm
The new branch changes the hot path in two important ways:
MatchFinderthat keys candidate positions by a 3-byte prefix.For decompression, the new branch writes output directly into a byte list. Copy tokens append from already decompressed output instead of seeking through a stream and allocating temporary copied sequences.
The result is a much better balance: compression is substantially faster, and managed allocations are lower across every benchmarked input.
Benchmark comparison
Positive deltas mean the new algorithm is better.
Compression
ExcelVbaCorpusExcelVbaCorpusRepeatedExcelVbaCorpusWithNoiseExcelVbaLargestModuleRepeatedLowCompressibilityDecompression
ExcelVbaCorpusExcelVbaCorpusRepeatedExcelVbaCorpusWithNoiseExcelVbaLargestModuleRepeatedLowCompressibilityProposed changes
ShortRunuseful for quick local checks.Notes
The benchmark values above came from BenchmarkDotNet
ShortRun, so exact percentages may vary between machines or runs. The compression improvement is large enough that the direction is clear: the new architecture should be preferred, provided compatibility tests continue to pass.