Skip to content

join: optimize locale collation performance with hybrid comparison#10393

Open
sylvestre wants to merge 2 commits intouutils:mainfrom
sylvestre:join-perf
Open

join: optimize locale collation performance with hybrid comparison#10393
sylvestre wants to merge 2 commits intouutils:mainfrom
sylvestre:join-perf

Conversation

@sylvestre
Copy link
Copy Markdown
Contributor

Performance improvements:

  • 5.18x faster than upstream locale collation
  • 35% faster than GNU join for Unicode data
  • Maintains full locale collation correctness

@github-actions
Copy link
Copy Markdown

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/follow-name (passes in this run but fails in the 'main' branch)

@ChrisDryden
Copy link
Copy Markdown
Collaborator

Were you expecting the results of this change to show up here? #10391

}

// If both are pure ASCII, byte comparison is sufficient for most locales
if left.is_ascii() && right.is_ascii() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not a valid assumption:

Heres a one liner that shows that for punctuation byte order does not always equal collation order. Its part of the reason the test was changed in this pr I think

echo -e "ab:d 1\nabc:d 2" | LC_ALL=en_US.UTF-8 join --check-order - <(echo -e "ab:d x\nabc:d y")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, well spotted

@sylvestre
Copy link
Copy Markdown
Contributor Author

Were you expecting the results of this change to show up here? #10391

yeah, i am confused why it isn't detected

Performance improvements:
- 5.18x faster than upstream locale collation
- 35% faster than GNU join for Unicode data
- Maintains full locale collation correctness
@github-actions
Copy link
Copy Markdown

GNU testsuite comparison:

Congrats! The gnu test tests/pr/bounded-memory is no longer failing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants