Benchmark comparison with Rust regex crate #10

@i11010520

Here is a benchmark comparison between zig-regex and the Rust regex crate (generated via OpenCode / DeepSeek V4 Flash Free).
While Zig wins on I/O and process startup time, zig-regex lags significantly behind in search throughput—especially for findAll.

What architectural changes or optimizations could we introduce to improve zig-regex performance for both single-match and multi-match operations, particularly for findAll?


zig-regex vs Rust regex crate — Performance Comparison

Date: 2026-06-02
Platform: Apple M4, macOS
Zig: 0.16.0
Rust: 1.96.0
Rust regex crate: 1.12.3
zig-regex: zig-utils/zig-regex @ latest (2026-06-02)

Methodology

We compare the search throughput of both engines — how fast they can count all
matches of a pattern in a haystack — excluding process startup time.

Each program:

  1. Reads the full haystack from stdin into memory
  2. Compiles the regex pattern
  3. Runs 1 warmup iteration (excluded from timing)
  4. Records a timestamp
  5. Runs N iterations (scanning the full haystack each time)
  6. Records another timestamp
  7. Prints total_matches iterations elapsed_ns to stdout

The elapsed time divided by iterations gives the time per full-scan.
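The per-scan arithmetic can be sketched as a small post-processing helper. This is only an illustration: `BenchLine`, `parse_bench_line`, and the method names are hypothetical and not part of either benchmark program. It assumes the `total_matches iterations elapsed_ns` output line described in step 7.

```rust
/// Parsed form of one benchmark output line: `total_matches iterations elapsed_ns`.
/// (Hypothetical helper type, not part of either benchmark program.)
#[derive(Debug, PartialEq)]
struct BenchLine {
    total_matches: u64,
    iterations: u64,
    elapsed_ns: u64,
}

/// Parses one output line. Returns None on malformed input or zero iterations.
fn parse_bench_line(line: &str) -> Option<BenchLine> {
    let mut fields = line.split_whitespace();
    let total_matches: u64 = fields.next()?.parse().ok()?;
    let iterations: u64 = fields.next()?.parse().ok()?;
    let elapsed_ns: u64 = fields.next()?.parse().ok()?;
    if fields.next().is_some() || iterations == 0 {
        return None;
    }
    Some(BenchLine { total_matches, iterations, elapsed_ns })
}

impl BenchLine {
    /// Average time for one full scan of the haystack, in nanoseconds.
    fn ns_per_scan(&self) -> f64 {
        self.elapsed_ns as f64 / self.iterations as f64
    }

    /// Matches found in a single scan (the printed total is summed over all iterations).
    fn matches_per_scan(&self) -> u64 {
        self.total_matches / self.iterations
    }

    /// Scan throughput in MB/s (1 MB = 10^6 bytes) for a haystack of `haystack_bytes`.
    fn throughput_mb_per_s(&self, haystack_bytes: u64) -> f64 {
        // bytes per ns, times 1e9 ns/s, divided by 1e6 bytes/MB
        haystack_bytes as f64 / self.ns_per_scan() * 1e3
    }
}
```

Expressed this way, a 60 ms scan of a 1.17 MB haystack corresponds to roughly 19.5 MB/s, and a 0.51 ms scan to roughly 2.3 GB/s.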

Haystack

A 1.17 MB synthetic text file generated from a pool of words:
hello, world, foo123, bar456, baz789, test, regex, zig, rust

Generated with:

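import random

words = ['hello', 'world', 'foo123', 'bar456', 'baz789', 'test', 'regex', 'zig', 'rust']
text = ' '.join(random.choice(words) for _ in range(200000))
with open('haystack.txt', 'w') as f:  # file name taken from the hyperfine commands below
    f.write(text)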

Programs

Rust (/tmp/rust-bench)

use std::io::Read;
use std::time::Instant;
use regex::Regex;

fn main() {
    let args: Vec<String> = std::env::args().collect();
    let pattern = &args[1];
    let iterations: usize = args.get(2).and_then(|s| s.parse().ok()).unwrap_or(1);

    let mut haystack = String::new();
    std::io::stdin().read_to_string(&mut haystack).unwrap();

    let re = Regex::new(pattern).unwrap();
    let _ = re.find_iter(&haystack).count(); // warmup

    let start = Instant::now();
    let mut total = 0usize;
    for _ in 0..iterations {
        total += re.find_iter(&haystack).count();
    }
    let elapsed = start.elapsed();

    // Same output format as the Zig program: total_matches iterations elapsed_ns on stdout.
    println!("{} {} {}", total, iterations, elapsed.as_nanos());
}

Uses regex::Regex::find_iter — a lazy, zero-allocation iterator over match
positions. No per-match heap allocations.

Zig (/tmp/zig-bench)

Uses the zig-utils/zig-regex library, built with -Doptimize=ReleaseFast.

const std = @import("std");
const Regex = @import("regex").Regex;

pub fn main(init: std.process.Init) !void {
    const allocator = init.gpa;
    const arena = init.arena.allocator();
    const io = init.io;

    const args = try init.minimal.args.toSlice(arena);
    const pattern = args[1];
    const iterations: usize = if (args.len > 2) try std.fmt.parseInt(usize, args[2], 10) else 1;

    var stdin_buf: [4096]u8 = undefined;
    var reader = std.Io.File.stdin().reader(io, &stdin_buf);
    var result: std.ArrayList(u8) = .empty;
    defer result.deinit(allocator);
    var chunk: [4096]u8 = undefined;
    while (true) {
        const n = try reader.interface.readSliceShort(&chunk);
        if (n == 0) break;
        try result.appendSlice(allocator, chunk[0..n]);
    }
    const haystack = try result.toOwnedSlice(allocator);
    defer allocator.free(haystack);

    var regex = try Regex.compile(allocator, pattern);
    defer regex.deinit();

    const warm_matches = try regex.findAll(allocator, haystack);
    defer {
        for (warm_matches) |*m| m.deinit(allocator);
        allocator.free(warm_matches);
    }

    const start = std.Io.Clock.awake.now(io);
    var total: usize = 0;
    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        const matches = try regex.findAll(allocator, haystack);
        defer {
            for (matches) |*m| m.deinit(allocator);
            allocator.free(matches);
        }
        total += matches.len;
    }
    const end = std.Io.Clock.awake.now(io);
    const elapsed = start.durationTo(end);

    var buf: [64]u8 = undefined;
    var stdout_w = std.Io.File.stdout().writer(io, &buf);
    const stdout = &stdout_w.interface;
    try stdout.print("{} {} {}\n", .{ total, iterations, elapsed.nanoseconds });
    try stdout.flush();
}

Uses Regex.findAll, which allocates a Match struct (with per-group capture slices)
for every match on every call. As discussed under "Why the gap?", it also creates a
new VM instance for every match.

Engine-only results (1.17 MB haystack)

Measured with the timed programs above, which exclude stdin reading and process startup.

Operation     Pattern             Matches    Zig       Rust       Ratio (Zig / Rust)
Compile       hello               -          6.8 µs    2.8 µs     2.4x
Single find   hello               -          2.0 µs    ~0.5 µs    ~4x
findAll       hello               21,899     60 ms     0.51 ms    118x
findAll       hello|world|test    66,576     173 ms    2.87 ms    60x
findAll       \d+                 66,408     83 ms     3.78 ms    22x
findAll       \w+                 200,000    83 ms     6.35 ms    13x
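The ratio column and the implied cost per match can be checked with a few lines of arithmetic. This is a minimal sketch; `slowdown` and `ns_per_match` are hypothetical helper names, and the inputs are the rounded figures from the table.

```rust
/// Ratio of the Zig time to the Rust time (both given in the same unit).
fn slowdown(zig: f64, rust: f64) -> f64 {
    zig / rust
}

/// Average cost per match in nanoseconds, given a per-scan time in milliseconds.
fn ns_per_match(scan_ms: f64, matches: u64) -> f64 {
    scan_ms * 1e6 / matches as f64
}
```

For findAll hello, `slowdown(60.0, 0.51)` rounds to 118, and `ns_per_match` gives roughly 2.7 µs per match for Zig against roughly 23 ns for Rust. The Zig figure is of the same order as the 2.0 µs single find, which is consistent with a fixed setup cost being paid on every match.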

CLI comparison (hyperfine, full process)

Comparing the zig-regex CLI vs our Rust bench binary on a 1.17 MB file,
both built with optimizations enabled:

Single match (find / find_iter with .count())

$ hyperfine -w 3 -r 30 \
    -n "Rust regex (ReleaseFast)" 'cat haystack.txt | ./rust-bench hello' \
    -n "Zig regex (ReleaseFast CLI)" 'cat haystack.txt | ./regex hello'

Rust (find + count):      5.3 ms
Zig  (find, first match): 4.2 ms    (1.25x faster)

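For a single find (first match), zig-regex is slightly faster than Rust end to end
in this test. The first match is near the beginning of the haystack, and zig-regex
finds it in ~2 µs after compiling. Note that the Rust binary still counts every match
(one warmup scan plus one timed scan), so the two commands do not do identical work.
Either way, the bulk of both runtimes is stdin I/O and process startup.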

All matches (findAll / find_iter with .count())

$ hyperfine -w 3 -r 20 \
    -n "Rust find_iter" 'cat haystack.txt | ./rust-bench hello' \
    -n "Zig findAll"    'cat haystack.txt | ./regex -g hello | wc -l'

Rust (find_iter):         5.3 ms
Zig  (findAll):          68 ms    (13x slower)

So zig-regex is fast for single matches (microseconds) but slow for findAll
(milliseconds) due to per-match VM re-init and allocation overhead (see below).

Compile times, for reference: Rust takes 2.82 ms for 1000 compiles, i.e. 2.8 µs per
compile. Zig takes 6.8 µs per compile, 2.4x slower, because VM initialization
allocates internal state as part of the compilation path.

Why the gap?

The findAll gap (13-118x) has the same architectural causes for every pattern tested.
Crucially, for a single find/isMatch, zig-regex is competitive; the gap only appears
when iterating over many matches, because of:

1. VM re-initialization per match

zig-regex creates a new VM instance for every match in findAll (line 312 of
src/regex.zig):

var virtual_machine = vm.VM.init(self.allocator, nfa_mut, ...);

This allocates and sets up the VM's internal state. With 22k matches, that's 22k
VM init/destroy cycles.

Rust's find_iter uses a lazy DFA (deterministic finite automaton) that
runs continuously from one match to the next with no per-match teardown.
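The idea of keeping matcher state alive across matches can be sketched with a toy thread-list simulation for a literal pattern. Everything here is hypothetical (`Scratch`, `find_from`, `count_all`); it is neither zig-regex's VM nor Rust's lazy DFA, only an illustration of allocating working memory once and clearing it between matches.

```rust
/// Reusable working memory for the toy matcher below (hypothetical type).
struct Scratch {
    current: Vec<usize>, // thread states active for the current input byte
    next: Vec<usize>,    // thread states for the following input byte
}

impl Scratch {
    fn new(pattern_len: usize) -> Self {
        Scratch {
            current: Vec::with_capacity(pattern_len),
            next: Vec::with_capacity(pattern_len),
        }
    }

    /// Empties both lists but keeps their capacity, so nothing is reallocated.
    fn reset(&mut self) {
        self.current.clear();
        self.next.clear();
    }
}

/// Finds the leftmost occurrence of literal `pattern` at or after `from`.
/// Thread state `s` means "the first `s` pattern bytes have matched so far".
fn find_from(
    pattern: &[u8],
    haystack: &[u8],
    from: usize,
    scratch: &mut Scratch,
) -> Option<(usize, usize)> {
    if pattern.is_empty() {
        return None;
    }
    scratch.reset();
    for (i, &byte) in haystack.iter().enumerate().skip(from) {
        scratch.current.push(0); // a new thread may start at every position
        for &state in &scratch.current {
            if pattern[state] == byte {
                if state + 1 == pattern.len() {
                    return Some((i + 1 - pattern.len(), i + 1));
                }
                scratch.next.push(state + 1);
            }
        }
        std::mem::swap(&mut scratch.current, &mut scratch.next);
        scratch.next.clear();
    }
    None
}

/// Counts non-overlapping matches with a single Scratch shared by all searches.
fn count_all(pattern: &[u8], haystack: &[u8]) -> usize {
    let mut scratch = Scratch::new(pattern.len()); // allocated once
    let (mut pos, mut count) = (0, 0);
    while let Some((_, end)) = find_from(pattern, haystack, pos, &mut scratch) {
        count += 1;
        pos = end; // resume right after the previous match
    }
    count
}
```

The two buffers are sized once from the pattern length, so finding 22k matches costs 22k calls to `reset` instead of 22k init/destroy cycles.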

2. Per-match heap allocations

zig-regex's findAll allocates a Match struct (with capture group slices) on
the heap for every match:

try matches.append(allocator, Match{ .slice = ..., .captures = ... });

This means 22k heap allocations for 22k matches.

Rust's find_iter returns match offsets through a Match value type on the
stack — no heap allocations during iteration.
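A lazy iterator that yields plain offset pairs might look like the sketch below. It searches for a literal needle with the standard library's `str::find` instead of running a regex engine, and `Span`, `LiteralMatches`, and this `find_iter` function are hypothetical names, so it only illustrates the shape of the API, not the regex crate's implementation.

```rust
/// A match expressed only as byte offsets into the haystack (a plain Copy value).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Lazy iterator over non-overlapping occurrences of a literal needle.
/// It holds two borrowed strings and a cursor, and owns no heap memory.
struct LiteralMatches<'h, 'n> {
    haystack: &'h str,
    needle: &'n str,
    pos: usize,
}

impl<'h, 'n> Iterator for LiteralMatches<'h, 'n> {
    type Item = Span;

    fn next(&mut self) -> Option<Span> {
        if self.needle.is_empty() {
            return None; // empty needles are not handled in this sketch
        }
        // Search only the unscanned tail; `pos` always sits at the end of a match.
        let offset = self.haystack[self.pos..].find(self.needle)?;
        let start = self.pos + offset;
        let end = start + self.needle.len();
        self.pos = end;
        Some(Span { start, end })
    }
}

/// Entry point mirroring the shape of a `find_iter` style API.
fn find_iter<'h, 'n>(haystack: &'h str, needle: &'n str) -> LiteralMatches<'h, 'n> {
    LiteralMatches { haystack, needle, pos: 0 }
}
```

A caller that only needs the count can write `find_iter(text, "hello").count()` and no list of matches is ever built. A caller that needs the text of a match can slice the haystack with the returned offsets.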

3. SIMD literal prefilters

Rust's regex crate uses the Teddy algorithm (SIMD-accelerated substring
search) to skip large non-matching regions. This is especially effective for
literal patterns like hello.

zig-regex implements basic prefix search (std.mem.indexOf) which is scalar
only.
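The prefilter idea, scanning quickly for a candidate position and only then verifying the full literal, can be illustrated without real SIMD. The sketch below is not Teddy: it uses a much simpler SWAR trick (testing 8 bytes at a time in a `u64`) to look for the first byte of the needle. The names `find_byte_swar` and `find_literal` are hypothetical.

```rust
const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte lane
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte lane

/// Returns the index of the first `needle` byte, testing 8 bytes per step.
fn find_byte_swar(haystack: &[u8], needle: u8) -> Option<usize> {
    let broadcast = LO * needle as u64; // needle copied into all 8 lanes
    let mut chunks = haystack.chunks_exact(8);
    let mut base = 0;
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        let x = u64::from_le_bytes(buf) ^ broadcast; // matching lanes become 0x00
        // Classic "has a zero byte" test: nonzero iff some lane of x is zero.
        if (x.wrapping_sub(LO) & !x & HI) != 0 {
            // A candidate is somewhere in these 8 bytes; pin it down with a scalar scan.
            return chunk.iter().position(|&b| b == needle).map(|i| base + i);
        }
        base += 8;
    }
    // Fewer than 8 bytes left: fall back to a scalar scan.
    chunks.remainder().iter().position(|&b| b == needle).map(|i| base + i)
}

/// Literal search: prefilter on the first byte, then verify the whole needle.
fn find_literal(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let (&first, _) = needle.split_first()?;
    let mut pos = 0;
    while pos + needle.len() <= haystack.len() {
        let candidate = pos + find_byte_swar(&haystack[pos..], first)?;
        if haystack[candidate..].starts_with(needle) {
            return Some(candidate);
        }
        pos = candidate + 1; // false candidate, keep scanning
    }
    None
}
```

The structure is the point here: most of the haystack is rejected by the cheap wide test, and the full comparison runs only at candidate positions. A real SIMD prefilter widens the lanes further and can track several literals at once.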

Key takeaways

  • Full CLI, single match (find): zig-regex (ReleaseFast CLI) was
    4.2 ms vs Rust 5.3 ms on a 1.17 MB file — both dominated by I/O,
    not regex. Zig was 1.25x faster in this end-to-end test.

  • Engine only, compile + single find: zig-regex is 2-4x slower in raw
    engine time (6.8 µs vs 2.8 µs compile, 2.0 µs vs ~0.5 µs find), but these
    are single-digit microseconds — negligible. The CLI results invert because
    Zig's I/O and startup happen to be faster in this workload.

  • All matches (findAll): Rust is 13-118x faster. The bottleneck is
    zig-regex's design: it re-initializes the VM and heap-allocates a Match
    (with captures) for every single result, while Rust's find_iter is a
    zero-allocation lazy iterator. A lightweight findIter that keeps the VM
    alive between iterations would shrink this gap significantly.

  • The Rust regex crate is one of the most heavily optimized regex engines
    in existence (DFA compilation, lazy DFAs, SIMD/Teddy prefilters). zig-regex
    is a younger project using Thompson NFA + backtracking without an equivalent
    optimization pipeline.
