wc: increase buffer size for Unicode counting paths by mattsu2020 · Pull Request #11276 · uutils/coreutils

mattsu2020 · 2026-03-10T09:48:29Z

Summary

The Unicode path in wc was still using the default BufReader capacity of 8 KiB. That made -w, -L, and the default -lwc path pay more fill_buf and UTF-8 decoding overhead than necessary.

This PR switches the Unicode counting path to use a 256 KiB buffer, matching the fast path buffer size and reducing per-chunk overhead.

Changes

- add `WORD_COUNT_BUF_SIZE = 256 * 1024` in `src/uu/wc/src/countable.rs`
- change `File`'s `WordCountable::buffered()` to use `BufReader::with_capacity(...)`
- change `StdinLock`'s `WordCountable::buffered()` to use `BufReader::with_capacity(...)`
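
The three changes listed above could look roughly like the following. This is a minimal sketch, not the PR's actual diff: the trait shown is a simplified stand-in for `WordCountable` (the real trait in `countable.rs` is assumed to have more members), and only the constant name and the use of `BufReader::with_capacity` are taken from the description.

```rust
use std::fs::File;
use std::io::{BufRead, BufReader, StdinLock};

/// Buffer size named in the PR description (256 KiB).
const WORD_COUNT_BUF_SIZE: usize = 256 * 1024;

/// Simplified stand-in for the real `WordCountable` trait.
/// Assumption: the actual trait has additional items that are omitted here.
trait WordCountable: Sized {
    type Buffered: BufRead;
    fn buffered(self) -> Self::Buffered;
}

impl WordCountable for File {
    type Buffered = BufReader<Self>;
    fn buffered(self) -> Self::Buffered {
        // `BufReader::new` would use the default capacity instead.
        BufReader::with_capacity(WORD_COUNT_BUF_SIZE, self)
    }
}

impl WordCountable for StdinLock<'_> {
    type Buffered = BufReader<Self>;
    fn buffered(self) -> Self::Buffered {
        BufReader::with_capacity(WORD_COUNT_BUF_SIZE, self)
    }
}
```

With this shape, `std::io::stdin().lock().buffered().capacity()` would report 262144 bytes rather than the default.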

Background

The -c/-l/-m fast path already uses a 256 KiB buffer, but the Unicode path used by -w, -L, and related combinations goes through BufReadDecoder and was still backed by the default 8 KiB BufReader.

As a result, large inputs on the Unicode path were processed in much smaller chunks, adding avoidable fixed overhead.
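
To make the chunking argument concrete, the sketch below counts how many times `fill_buf` returns data while draining an in-memory input. The helper `count_chunks` is hypothetical and not part of `wc`; it uses a `Cursor` so every refill is full-sized, whereas real file or pipe reads may return shorter chunks.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Count how many non-empty chunks `fill_buf` hands back while draining
/// `len` bytes of in-memory input through a `BufReader` of `capacity` bytes.
/// Hypothetical helper for illustration only.
fn count_chunks(len: usize, capacity: usize) -> usize {
    let data = vec![b'a'; len];
    let mut reader = BufReader::with_capacity(capacity, Cursor::new(data));
    let mut chunks = 0;
    loop {
        let n = reader
            .fill_buf()
            .expect("in-memory reads cannot fail")
            .len();
        if n == 0 {
            break; // end of input
        }
        chunks += 1;
        reader.consume(n);
    }
    chunks
}
```

For a 1 MiB input, `count_chunks(1 << 20, 8 * 1024)` yields 128 chunks, while `count_chunks(1 << 20, 256 * 1024)` yields 4, so any fixed per-chunk cost is paid 32 times less often with the larger buffer.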

Testing

cargo build -p uu_wc --release
cargo test -p uu_wc
cargo test --features wc test_wc

The wc integration tests passed (57 passed).

github-actions · 2026-03-10T10:01:00Z

GNU testsuite comparison:

GNU test failed: tests/misc/io-errors. tests/misc/io-errors is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/date/date-locale-hour (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tty/tty-eof (fails in this run but passes in the 'main' branch)
Congrats! The gnu test tests/unexpand/bounded-memory is now passing!

codspeed-hq · 2026-03-10T10:23:58Z

Merging this PR will degrade performance by 93.59%

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments, which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

❌ 3 regressed benchmarks
✅ 295 untouched benchmarks
⏩ 48 skipped benchmarks¹

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| | Mode | Benchmark | `BASE` | `HEAD` | Efficiency |
|---|---|---|---|---|---|
| ❌ | Memory | `wc_words_large_line_count[100000]` | 16.7 KB | 260.4 KB | -93.59% |
| ❌ | Memory | `wc_words_synthetic[2000]` | 16.7 KB | 260.4 KB | -93.59% |
| ❌ | Memory | `wc_default_large_line_count[100000]` | 16.7 KB | 260.4 KB | -93.59% |

Comparing mattsu2020:performance_analystic (d8a78c1) with main (36b5e59)²

¹ 48 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived in CodSpeed to remove them from the performance reports.
² No successful run was found on main (88219aa) during the generation of this report, so 36b5e59 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.
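
The -93.59% figure refers to memory, not runtime. It is consistent with computing `BASE / HEAD - 1` on the reported sizes. The formula below is inferred from the numbers in the table, not taken from CodSpeed's documentation, and `efficiency_percent` is a hypothetical helper name.

```rust
/// Relative change as `base / head - 1`, expressed in percent.
/// Assumption: inferred from the reported numbers, not from CodSpeed docs.
fn efficiency_percent(base_kb: f64, head_kb: f64) -> f64 {
    (base_kb / head_kb - 1.0) * 100.0
}
```

Calling `efficiency_percent(16.7, 260.4)` gives approximately -93.59, matching the three rows in the table.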


github-actions · 2026-03-10T13:15:13Z

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout-group. tests/timeout/timeout-group is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/symlink (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/tty/tty-eof (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/cut/cut-huge-range is now being skipped but was previously passing.

xtqqczze · 2026-03-10T16:08:46Z

> The -c/-l/-m fast path already uses a 256 KiB buffer

It uses a 256 KiB stack array; this PR uses BufReader, which allocates.
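
The distinction can be illustrated with two byte-counting loops. This is a sketch under stated assumptions, not the actual uutils fast-path code: both function names and `BUF_SIZE` are hypothetical, and error handling such as retrying on `ErrorKind::Interrupted` is omitted.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical constant for illustration (256 KiB).
const BUF_SIZE: usize = 256 * 1024;

/// Count bytes using a fixed-size array that lives on the stack.
fn count_bytes_stack<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = [0u8; BUF_SIZE]; // no heap allocation
    let mut total = 0;
    loop {
        match reader.read(&mut buf)? {
            0 => return Ok(total), // end of input
            n => total += n,
        }
    }
}

/// Count bytes through a `BufReader`, whose internal buffer is allocated
/// on the heap when the reader is constructed.
fn count_bytes_bufreader<R: Read>(reader: R) -> io::Result<usize> {
    let mut reader = BufReader::with_capacity(BUF_SIZE, reader);
    let mut total = 0;
    loop {
        let n = reader.fill_buf()?.len();
        if n == 0 {
            return Ok(total); // end of input
        }
        total += n;
        reader.consume(n);
    }
}
```

Both loops process the input in chunks of up to 256 KiB, but only the second shows up as heap memory, which may explain why a memory benchmark flags the `BufReader` variant and not the existing fast path.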

refactor(wc): optimize buffer size for word counting operations

d8a78c1

Increased buffer size to 256KB for improved performance in word counting operations across all input types (stdin and files). This optimization reduces I/O overhead by processing larger chunks of data at once.

sylvestre force-pushed the performance_analystic branch from 11c950b to d8a78c1 on March 10, 2026 12:30

mattsu2020 wants to merge 1 commit into uutils:main from mattsu2020:performance_analystic
