Commit 6c8cc4c

ext/uri: fast-path canonical URIs in get_normalized_uri
When Uri\Rfc3986\Uri::parse() produces a URI that is already in canonical form (the common case: http/https URLs with no uppercase host, no percent-encoded unreserved characters, and no ".." path segments), get_normalized_uri() no longer deep-copies the parsed struct and runs a full normalization pass. Instead, it calls uriNormalizeSyntaxMaskRequiredExA() once to compute the dirty mask; a zero mask means get_normalized_uri() returns the raw uri as an alias. The dirty mask is cached in the struct, so multiple non-raw reads on the same instance run the scan only once.

Fallback: when the mask is nonzero, we copy and normalize as before, but only the flagged components are normalized (uriNormalizeSyntaxExMmA(..., dirty_mask, ...) instead of (..., -1, ...)). If the mask computation itself fails, the mask is set to (unsigned int)-1, so every component is normalized.

Measurements on a 17-URL mix with a realistic parse-and-read workload (10 runs of 1.7M parses each, CPU pinned via taskset, same-session stash-pop A/B so both builds share machine state):

                  baseline mean       optimized mean      delta
parse only        0.3992s (4.26M/s)   0.4083s (4.16M/s)   noise
parse + 1 read    0.6687s (2.54M/s)   0.5464s (3.11M/s)   -18.3%
parse + 7 reads   0.8510s (2.00M/s)   0.7305s (2.33M/s)   -14.2%

The "parse + 1 read" row isolates the first-read cost, which is where this change lands. The "parse + 7 reads" row shows the amortized effect under a realistic usage pattern: the first getter pays the reduced normalization cost, and the remaining six getters hit the cached normalized uri and cost the same as before.

hyperfine cross-check on the whole benchmark script, 15 runs each:

baseline    20.471 s +/- 1.052 s   [19.535 .. 22.985]
optimized   17.240 s +/- 0.540 s   [16.556 .. 18.190]
optimized runs 1.19 +/- 0.07 times faster than baseline

All 309 tests in ext/uri/tests pass. I also checked that URIs that need normalization (e.g. http://EXAMPLE.com/A/%2e%2e/c, whose path normalizes to /c) still take the copy-and-normalize path via the nonzero dirty mask.
1 parent 8ad79e1 commit 6c8cc4c

1 file changed (+22 / -1 lines changed)

ext/uri/uri_parser_rfc3986.c

Lines changed: 22 additions & 1 deletion
@@ -25,7 +25,10 @@
 struct php_uri_parser_rfc3986_uris {
 	UriUriA uri;
 	UriUriA normalized_uri;
+	unsigned int dirty_mask;
 	bool normalized_uri_initialized;
+	bool normalized_uri_is_alias;
+	bool dirty_mask_valid;
 };
 
 static void *php_uri_parser_rfc3986_memory_manager_malloc(UriMemoryManager *memory_manager, size_t size)
@@ -85,12 +88,30 @@ ZEND_ATTRIBUTE_NONNULL static void copy_uri(UriUriA *new_uriparser_uri, const Ur
 
 ZEND_ATTRIBUTE_NONNULL static UriUriA *get_normalized_uri(php_uri_parser_rfc3986_uris *uriparser_uris) {
 	if (!uriparser_uris->normalized_uri_initialized) {
+		if (!uriparser_uris->dirty_mask_valid) {
+			int mask_result = uriNormalizeSyntaxMaskRequiredExA(&uriparser_uris->uri, &uriparser_uris->dirty_mask);
+			if (mask_result != URI_SUCCESS) {
+				uriparser_uris->dirty_mask = (unsigned int)-1;
+			}
+			uriparser_uris->dirty_mask_valid = true;
+		}
+
+		if (uriparser_uris->dirty_mask == 0) {
+			uriparser_uris->normalized_uri_is_alias = true;
+			uriparser_uris->normalized_uri_initialized = true;
+			return &uriparser_uris->uri;
+		}
+
 		copy_uri(&uriparser_uris->normalized_uri, &uriparser_uris->uri);
-		int result = uriNormalizeSyntaxExMmA(&uriparser_uris->normalized_uri, (unsigned int)-1, mm);
+		int result = uriNormalizeSyntaxExMmA(&uriparser_uris->normalized_uri, uriparser_uris->dirty_mask, mm);
 		ZEND_ASSERT(result == URI_SUCCESS);
 		uriparser_uris->normalized_uri_initialized = true;
 	}
 
+	if (uriparser_uris->normalized_uri_is_alias) {
+		return &uriparser_uris->uri;
+	}
+
 	return &uriparser_uris->normalized_uri;
 }
 
