ext/standard: speed up php_url_parse_ex2 by ~12% by iliaal · Pull Request #31 · iliaal/php-src

iliaal · 2026-04-11T18:12:37Z

Summary

Three related ctype-macro replacements in ext/standard/url.c that
speed up php_url_parse_ex2 and therefore parse_url() by ~12% on a
realistic URL mix. Per-change breakdown is in the commit message.

This also feeds the php_parse_url backend in ext/uri, which wraps
php_url_parse_ex2 and is the default parser returned by
php_uri_get_parser(NULL) for streams, filter_var(FILTER_VALIDATE_URL),
soap, http/ftp wrappers, and the internal php_uri_parse() C API.

Benchmark

17 URL shapes (plain http/https, deep paths, with query/fragment, with
userinfo, IPv4, IPv6, %-encoded path, ftp, mailto, data, file, relative).
1M iterations per run, 17M total parse_url() calls per benchmark, CPU
pinned via taskset -c 0, same-session A/B (stash + rebuild + rerun
each direction).

	baseline	optimized	delta
`parse_url()` full	1.90s (8.94M/s)	1.68s (10.10M/s)	−12% / +13% throughput
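
As a quick check on the delta column, the percentages can be recomputed from the table's own figures. This is only an arithmetic sketch; `percent_change` and `print_benchmark_deltas` are hypothetical helper names, not part of the patch or of the benchmark script.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
    assert(before != 0.0); /* a zero baseline has no relative change */
    return (after - before) / before * 100.0;
}

/* Recomputes both deltas from the rounded values reported in the table. */
void print_benchmark_deltas(void)
{
    printf("wall time:  %+.1f%%\n", percent_change(1.90, 1.68));
    printf("throughput: %+.1f%%\n", percent_change(8.94, 10.10));
}
```

With the rounded table values this gives about -11.6% wall time and +13.0% throughput, consistent with the reported −12% / +13%.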

What's in the patch

1. php_replace_controlchars replaces iscntrl() with inline
   c < 0x20 || c == 0x7f. glibc's iscntrl hits __ctype_b_loc()
   per byte; callgrind showed it at ~14% of total instructions on a
   realistic URL workload. URL components are bytes, not
   locale-dependent text, and the Zend scanner uses the same inline
   pattern (yych <= 0x1F).
2. The scheme-validation walk swaps its isalpha(*p) / isdigit(*p) checks
   for php_url_is_scheme_char((unsigned char) *p), which does
   ((c | 0x20) - 'a' < 26u) || (c - '0' < 10u) plus the three
   literal character checks (+, - and .). Same change for the two
   isdigit sites in the port-scan loops via php_url_is_ascii_digit.
3. The php_replace_controlchars call on ret->scheme is skipped in all
   three allocation branches. The scheme walk above has already rejected
   any byte outside [a-zA-Z0-9+.-], so the control-char scan can't find
   anything to replace.
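
A minimal sketch of the two helpers as the description words them. The names php_url_is_scheme_char and php_url_is_ascii_digit come from the PR text; the signatures, the bodies, and the `scheme_is_valid` and `example_scheme_checks` functions are assumptions made for illustration and may differ from the actual diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* ASCII-only digit test. The unsigned cast makes bytes below '0' wrap to a
 * large value, so one comparison covers both ends of the range. */
static inline bool php_url_is_ascii_digit(unsigned char c)
{
    return (unsigned int)(c - '0') < 10u;
}

/* ASCII letter, digit, '+', '-' or '.'. OR-ing in 0x20 folds 'A'..'Z'
 * onto 'a'..'z' before the range check. */
static inline bool php_url_is_scheme_char(unsigned char c)
{
    return ((unsigned int)((c | 0x20) - 'a') < 26u)
        || php_url_is_ascii_digit(c)
        || c == '+' || c == '-' || c == '.';
}

/* Hypothetical helper (not in the patch): accepts a non-empty candidate
 * scheme only if every byte passes the scheme-character test. */
static bool scheme_is_valid(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (!php_url_is_scheme_char((unsigned char) s[i])) {
            return false;
        }
    }
    return len > 0;
}

/* No byte accepted by the scheme test falls in the control range, which is
 * why a later control-char scan over the scheme has nothing to replace. */
void example_scheme_checks(void)
{
    assert(scheme_is_valid("https", 5));
    assert(scheme_is_valid("svn+ssh", 7));
    assert(!scheme_is_valid("ht\x1f" "tp", 5));
    for (unsigned int c = 0; c < 256; c++) {
        if (php_url_is_scheme_char((unsigned char) c)) {
            assert(!(c < 0x20 || c == 0x7f));
        }
    }
}
```

The explicit `(unsigned int)` casts only spell out the conversion that comparing against `26u` and `10u` would already perform.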

Three related changes to ext/standard/url.c targeting the ctype macros on the parse_url hot path. On a 17-URL mix (17M parses per run, CPU pinned, same-session A/B), median wall time drops from 1.90s to 1.68s, a ~12% reduction and ~13% throughput increase (8.94M/s to 10.10M/s).

1. php_replace_controlchars replaces its iscntrl() call with an inline `c < 0x20 || c == 0x7f` comparison. Callgrind showed iscntrl at ~14% of total instructions on a realistic URL workload; glibc's iscntrl goes through __ctype_b_loc() per byte for a TLS lookup and table deref, which defeats auto-vectorization. URL components are bytes, not locale-dependent text, so C/POSIX semantics are what we want regardless of the process locale. The Zend language scanner uses the same pattern (yych <= 0x1F). This runs once per component per parse, up to 7 times.

2. The scheme-validation walk uses isalpha/isdigit which have the same __ctype_b_loc tax. I extracted the check into php_url_is_scheme_char with an inline ASCII test: ((c | 0x20) - 'a' < 26u) || (c - '0' < 10u) for the letter/digit half, plus the three literal comparisons for + - and . The scheme loop runs once per byte of the scheme on every parse. A helper php_url_is_ascii_digit covers the two isdigit call sites in the port-scan loops (one in the mailto-branch port probe, one in the parse_port fallback).

3. The three branches that allocate ret->scheme all followed zend_string_init with a php_replace_controlchars call. The scheme loop above has already rejected any byte that isn't in [a-zA-Z0-9+.-], so the control-char scan on scheme is dead work. Removed from all three sites.

No behavior change: the inline comparisons are identical in behavior to the ctype macros in C/POSIX, and URL bytes are never locale-dependent. I checked that contaminated inputs like http://ex\x7fample.com/p\x1fath still get their control bytes replaced with underscores.
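
The "no behavior change" claim can be spot-checked outside php-src with a small standalone sketch. Everything here is illustrative: `is_ascii_cntrl`, `replace_controlchars`, `count_iscntrl_mismatches` and `example_controlchar_checks` are hypothetical names, and `replace_controlchars` is a simplified stand-in for php_replace_controlchars, not its actual implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Inline control-byte test from the description: C0 controls plus DEL. */
static inline int is_ascii_cntrl(unsigned char c)
{
    return c < 0x20 || c == 0x7f;
}

/* Simplified stand-in: rewrites every control byte to '_' in place. */
static void replace_controlchars(char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (is_ascii_cntrl((unsigned char) s[i])) {
            s[i] = '_';
        }
    }
}

/* Counts byte values where the inline test disagrees with iscntrl()
 * once the process is switched to the "C" locale. */
static int count_iscntrl_mismatches(void)
{
    int mismatches = 0;
    setlocale(LC_ALL, "C");
    for (int c = 0; c < 256; c++) {
        if (!!iscntrl(c) != is_ascii_cntrl((unsigned char) c)) {
            mismatches++;
        }
    }
    return mismatches;
}

void example_controlchar_checks(void)
{
    /* The literal is split so "\x7f" and "\x1f" are not merged with the
     * hex-looking letters that follow them. */
    char url[] = "http://ex\x7f" "ample.com/p\x1f" "ath";
    replace_controlchars(url, strlen(url));
    assert(strcmp(url, "http://ex_ample.com/p_ath") == 0);
    assert(count_iscntrl_mismatches() == 0);
}
```

The mismatch count is expected to be zero on glibc; other C libraries may classify bytes at or above 0x80 differently, so treat the exhaustive loop as a check of one platform rather than a proof.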

iliaal force-pushed the perf/standard-parse-url-ctype branch from 24463fe to e670fd7 on April 11, 2026 20:02

iliaal mentioned this pull request Apr 11, 2026

Fix GH-12703: parse_url colon-in-path + optimize control-char replacement php/php-src#21718

Closed

iliaal wants to merge 1 commit into master from perf/standard-parse-url-ctype
