Skip to content

Commit e2f730d

Browse files
etrclaude
andcommitted
TASK-058 step 1: canonicalize_lookup_path returns string_view
Refactor `canonicalize_lookup_path` (anon-ns helper in webserver_dispatch.cpp) to return a std::string_view into either: - the immutable "/" literal (empty-input case), - the caller's @p path argument (already-canonical happy path -- no allocation), or - a caller-owned scratch std::string (rewrite case -- one allocation per lookup_v2 call, bounded by the input size). Most warm-path inputs arrive canonical from MHD via modded_request::standardized_url, so the warm GET path now allocates zero heap memory for canonicalisation. The miss path still pays one std::string construction when assembling the cache_key (line 161 of webserver_dispatch.cpp); the warm-cache path never reaches that construction. LIFETIME contract documented above the helper: the returned view is valid for the duration of the call only. cache_key construction at the miss path naturally copies via std::string, so callers cannot inadvertently store a dangling view. Pinned by new test/unit/route_lookup_canonicalize_test.cpp covering: empty input -> "/", missing leading slash, trailing slash stripping, "/" identity, idempotency. All five tests pass before and after the refactor (regression net pinning behaviour). The step 4 heap profile closes the allocation-elimination side. Bench (debug build, before vs after; numbers dominated by unordered_map probe + mutex lock so the saving is mostly visible in release builds and via heap profiler): canonicalize: 620 ns -> 619 ns (within noise; the allocation removal is observable on the heap profiler, not the wall-clock median in a debug build) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 18693a5 commit e2f730d

3 files changed

Lines changed: 220 additions & 14 deletions

File tree

src/detail/webserver_dispatch.cpp

Lines changed: 46 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -123,25 +123,57 @@ void webserver_impl::invalidate_route_cache() {
123123
// would never share an entry. Matches the v1 dispatch path, which
124124
// constructs a non-registration http_endpoint at lookup time and so
125125
// gets the same normalization for free.
126-
static std::string canonicalize_lookup_path(const std::string& path) {
126+
//
127+
// TASK-058 step 1: return a string_view so the happy path (input
128+
// already canonical) allocates zero heap memory. On the rewrite
129+
// path the canonicalised form is written into the caller-owned
130+
// @p scratch buffer and a view into that buffer is returned.
131+
//
132+
// LIFETIME: the returned view is valid for the duration of the
133+
// call chain only; it points at either the immutable "/" literal,
134+
// the caller's @p path argument, or the caller's @p scratch buffer.
135+
// Any caller storing the view must copy it into an owning string
136+
// first. The cache layer (cache_key constructed below) already
137+
// copies via std::string, so the contract is naturally respected.
138+
static std::string_view canonicalize_lookup_path(
139+
std::string_view path, std::string& scratch) {
127140
if (path.empty()) {
128-
return "/";
141+
// Immutable canonical root -- no scratch usage.
142+
return std::string_view{"/", 1};
143+
}
144+
const bool has_leading_slash = (path.front() == '/');
145+
const bool has_trailing_slash = (path.size() > 1 && path.back() == '/');
146+
if (has_leading_slash && !has_trailing_slash) {
147+
// Already canonical: return the caller's view directly.
148+
return path;
129149
}
130-
std::string out = path;
131-
if (out.front() != '/') {
132-
out.insert(out.begin(), '/');
150+
// Rewrite path: write the canonicalised form into the caller's
151+
// scratch buffer and return a view into it.
152+
scratch.clear();
153+
scratch.reserve(path.size() + (has_leading_slash ? 0 : 1));
154+
if (!has_leading_slash) {
155+
scratch.push_back('/');
133156
}
134-
if (out.size() > 1 && out.back() == '/') {
135-
out.pop_back();
157+
if (has_trailing_slash) {
158+
scratch.append(path.data(), path.size() - 1);
159+
} else {
160+
scratch.append(path.data(), path.size());
136161
}
137-
return out;
162+
return std::string_view{scratch};
138163
}
139164

140165
webserver_impl::lookup_result
141166
webserver_impl::lookup_v2(http_method method, const std::string& path) {
142167
lookup_result result;
143168

144-
std::string lookup_path = canonicalize_lookup_path(path);
169+
// TASK-058 step 1: canonicalize_lookup_path returns a string_view
170+
// into either @p path (already-canonical happy path -- no
171+
// allocation), the immutable "/" literal (empty-input case), or
172+
// @p canonicalize_scratch (rewrite case -- single allocation,
173+
// bounded by the input size).
174+
std::string canonicalize_scratch;
175+
std::string_view lookup_path =
176+
canonicalize_lookup_path(path, canonicalize_scratch);
145177

146178
// Step 1: cache. Cache under the canonical key so /foo and /foo/
147179
// share an entry. Use find_by_view to avoid copying lookup_path
@@ -155,10 +187,11 @@ webserver_impl::lookup_v2(http_method method, const std::string& path) {
155187
result.captured_params = std::move(cached.captured_params);
156188
return result;
157189
}
158-
// Construct key by moving lookup_path into the cache_key, avoiding a
159-
// second heap allocation. All subsequent tier lookups use key.path,
160-
// which is the same std::string now owned by the key.
161-
cache_key key{method, std::move(lookup_path)};
190+
// Construct key by copying lookup_path into the cache_key. On the
191+
// miss path we pay one std::string construction; the warm-cache
192+
// path never reaches this line. All subsequent tier lookups use
193+
// key.path, which is the std::string now owned by the key.
194+
cache_key key{method, std::string(lookup_path)};
162195

163196
// Step 2: walk the tiers under a shared lock.
164197
{

test/Makefile.am

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ LDADD += -lcurl
2626

2727
AM_CPPFLAGS = -I$(top_srcdir)/src -I$(top_srcdir)/src/httpserver/ -DHTTPSERVER_COMPILATION
2828
METASOURCES = AUTO
29-
check_PROGRAMS = basic file_upload http_utils threaded nodelay string_utilities http_endpoint ban_system ws_start_stop authentication deferred http_resource http_response create_webserver create_webserver_explicit new_response_types daemon_info uri_log feature_unavailable header_hygiene_iovec header_hygiene iovec_entry http_method constants body http_response_sbo http_response_factories http_response_move_sanitizer webserver_pimpl http_request_pimpl create_test_request http_request_arena http_request_const_getters http_request_tls_accessors http_request_operator_stream webserver_register_smartptr webserver_register_path_prefix webserver_on_methods webserver_route route_table lookup_pipeline route_table_concurrency routing_regression v2_dispatch_contract threadsafety_stress webserver_features webserver_ws_unavailable webserver_register_ws_smartptr webserver_dauth_unavailable consumer_fixture header_hygiene_hooks hook_api_shape hooks_no_firing hooks_accept_ctx_shape hooks_connection_lifecycle hooks_accept_decision_banned hooks_accept_decision_throwing hooks_body_chunk_ctx_shape hooks_request_received_short_circuit hooks_body_chunk_observes_progress hooks_body_chunk_short_circuit_no_leak hooks_before_handler_ctx_shape hooks_route_resolved_miss_and_hit hooks_before_handler_short_circuit hooks_alias_count hooks_alias_functional hooks_handler_exception_chain hooks_handler_exception_user_handler_throws_continues_chain hooks_handler_exception_fallback_to_hardcoded_500 hooks_handler_exception_slot hooks_response_sent_ctx_shape hooks_request_completed_ctx_shape hooks_after_handler_replaces_response hooks_after_handler_mutates_response_in_place hooks_response_sent_carries_status_bytes_timing hooks_request_completed_fires_on_early_failure hooks_log_access_alias_slot hooks_per_route_invalid_phase_throws hooks_per_route_order hooks_per_route_early_413_per_endpoint hooks_per_route_resource_destroyed_first hooks_per_route_concurrent_registration auth_handler_optional_signature auth_handler_legacy_shim
29+
check_PROGRAMS = basic file_upload http_utils threaded nodelay string_utilities http_endpoint ban_system ws_start_stop authentication deferred http_resource http_response create_webserver create_webserver_explicit new_response_types daemon_info uri_log feature_unavailable header_hygiene_iovec header_hygiene iovec_entry http_method constants body http_response_sbo http_response_factories http_response_move_sanitizer webserver_pimpl http_request_pimpl create_test_request http_request_arena http_request_const_getters http_request_tls_accessors http_request_operator_stream webserver_register_smartptr webserver_register_path_prefix webserver_on_methods webserver_route route_table lookup_pipeline route_table_concurrency routing_regression route_lookup_canonicalize v2_dispatch_contract threadsafety_stress webserver_features webserver_ws_unavailable webserver_register_ws_smartptr webserver_dauth_unavailable consumer_fixture header_hygiene_hooks hook_api_shape hooks_no_firing hooks_accept_ctx_shape hooks_connection_lifecycle hooks_accept_decision_banned hooks_accept_decision_throwing hooks_body_chunk_ctx_shape hooks_request_received_short_circuit hooks_body_chunk_observes_progress hooks_body_chunk_short_circuit_no_leak hooks_before_handler_ctx_shape hooks_route_resolved_miss_and_hit hooks_before_handler_short_circuit hooks_alias_count hooks_alias_functional hooks_handler_exception_chain hooks_handler_exception_user_handler_throws_continues_chain hooks_handler_exception_fallback_to_hardcoded_500 hooks_handler_exception_slot hooks_response_sent_ctx_shape hooks_request_completed_ctx_shape hooks_after_handler_replaces_response hooks_after_handler_mutates_response_in_place hooks_response_sent_carries_status_bytes_timing hooks_request_completed_fires_on_early_failure hooks_log_access_alias_slot hooks_per_route_invalid_phase_throws hooks_per_route_order hooks_per_route_early_413_per_endpoint hooks_per_route_resource_destroyed_first hooks_per_route_concurrent_registration auth_handler_optional_signature auth_handler_legacy_shim
3030

3131
MOSTLYCLEANFILES = *.gcda *.gcno *.gcov
3232

@@ -282,6 +282,15 @@ threadsafety_stress_SOURCES = integ/threadsafety_stress.cpp
282282
routing_regression_SOURCES = unit/routing_regression_test.cpp
283283
routing_regression_LDADD = $(LDADD) -lmicrohttpd
284284

285+
# route_lookup_canonicalize (TASK-058 step 1): pins lookup_v2's
286+
# canonicalisation contract behind a behaviour-only test (empty, leading-
287+
# slash insertion, trailing-slash stripping, "/" identity, idempotency).
288+
# Same -lmicrohttpd dependency as routing_regression: the webserver
289+
# constructor pulls libhttpserver in, which transitively touches
290+
# libmicrohttpd headers.
291+
route_lookup_canonicalize_SOURCES = unit/route_lookup_canonicalize_test.cpp
292+
route_lookup_canonicalize_LDADD = $(LDADD) -lmicrohttpd
293+
285294
# v2_dispatch_contract: TASK-053. End-to-end safety net pinning the
286295
# observable invariants the dispatch path must satisfy AFTER finalize_answer
287296
# is cut over from resolve_resource_for_request (v1 maps) to
Lines changed: 164 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,164 @@
1+
/*
2+
This file is part of libhttpserver
3+
Copyright (C) 2011-2026 Sebastiano Merlino
4+
5+
This library is free software; you can redistribute it and/or
6+
modify it under the terms of the GNU Lesser General Public
7+
License as published by the Free Software Foundation; either
8+
version 2.1 of the License, or (at your option) any later version.
9+
10+
This library is distributed in the hope that it will be useful,
11+
but WITHOUT ANY WARRANTY; without even the implied warranty of
12+
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13+
Lesser General Public License for more details.
14+
15+
You should have received a copy of the GNU Lesser General Public
16+
License along with this library; if not, write to the Free Software
17+
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
18+
USA
19+
*/
20+
21+
// TASK-058 step 1: pin canonicalize_lookup_path's observable contract
22+
// behind lookup_v2. The static helper itself is anonymous-namespaced
23+
// inside webserver_dispatch.cpp and not linkable from tests, so we pin
24+
// its observable effect: two lookups under different spellings of the
25+
// same canonical path must share a cache entry (the second call hits
26+
// tier_hit::cache). Also covers the boundary inputs:
27+
// - empty string -> "/"
28+
// - "foo" without leading slash -> "/foo"
29+
// - "/foo/" with trailing slash -> "/foo"
30+
// - "/" identity
31+
//
32+
// These tests must pass both BEFORE and AFTER the string_view refactor:
33+
// they pin behavior, not internal allocation. The heap-profile gate in
34+
// step 4 closes the loop on the allocation-elimination side.
35+
36+
#include <memory>
37+
#include <string>
38+
39+
#include "./httpserver.hpp"
40+
#include "./httpserver/detail/webserver_impl.hpp"
41+
#include "./littletest.hpp"
42+
43+
namespace ht = httpserver;
44+
45+
namespace {
46+
47+
class noop_resource : public ht::http_resource {
48+
public:
49+
ht::http_response render_get(const ht::http_request&) override {
50+
return ht::http_response::string("ok");
51+
}
52+
};
53+
54+
ht::detail::webserver_impl& impl_of(ht::webserver& ws) {
55+
return *ht::webserver_test_access::impl(ws);
56+
}
57+
58+
} // namespace
59+
60+
LT_BEGIN_SUITE(route_lookup_canonicalize_suite)
61+
void set_up() {}
62+
void tear_down() {}
63+
LT_END_SUITE(route_lookup_canonicalize_suite)
64+
65+
// ---------------------------------------------------------------------
66+
// Empty input canonicalises to "/" -> the same entry as "/" itself.
67+
// ---------------------------------------------------------------------
68+
LT_BEGIN_AUTO_TEST(route_lookup_canonicalize_suite, empty_path_maps_to_root)
69+
ht::webserver ws{ht::create_webserver(8080)
70+
.start_method(ht::http::http_utils::INTERNAL_SELECT)};
71+
ws.register_path("/", std::make_shared<noop_resource>());
72+
73+
auto a = impl_of(ws).lookup_v2(ht::http_method::get, std::string(""));
74+
LT_CHECK(a.found);
75+
76+
// Second call under the canonical spelling must hit the cache the
77+
// empty-string canonicalisation populated.
78+
auto b = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/"));
79+
LT_CHECK(b.found);
80+
LT_CHECK(b.tier == ht::detail::webserver_impl::tier_hit::cache);
81+
LT_END_AUTO_TEST(empty_path_maps_to_root)
82+
83+
// ---------------------------------------------------------------------
84+
// Leading-slash insertion: "foo" canonicalises to "/foo".
85+
// ---------------------------------------------------------------------
86+
LT_BEGIN_AUTO_TEST(route_lookup_canonicalize_suite,
87+
missing_leading_slash_is_prepended)
88+
ht::webserver ws{ht::create_webserver(8080)
89+
.start_method(ht::http::http_utils::INTERNAL_SELECT)};
90+
ws.register_path("/foo", std::make_shared<noop_resource>());
91+
92+
auto a = impl_of(ws).lookup_v2(ht::http_method::get, std::string("foo"));
93+
LT_CHECK(a.found);
94+
LT_CHECK(a.entry.methods == ht::method_set{}.set_all());
95+
96+
// The second lookup under the canonical spelling shares the cache
97+
// entry installed by the first call -- proves both spellings hash
98+
// to the same canonical key.
99+
auto b = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/foo"));
100+
LT_CHECK(b.found);
101+
LT_CHECK(b.tier == ht::detail::webserver_impl::tier_hit::cache);
102+
LT_END_AUTO_TEST(missing_leading_slash_is_prepended)
103+
104+
// ---------------------------------------------------------------------
105+
// Trailing-slash stripping: "/foo/" canonicalises to "/foo". This is
106+
// the case routing_regression_test::exact_path_normalization_aliases
107+
// already covers end-to-end; we pin the cache-sharing property too.
108+
// ---------------------------------------------------------------------
109+
LT_BEGIN_AUTO_TEST(route_lookup_canonicalize_suite,
110+
trailing_slash_is_stripped)
111+
ht::webserver ws{ht::create_webserver(8080)
112+
.start_method(ht::http::http_utils::INTERNAL_SELECT)};
113+
ws.register_path("/foo", std::make_shared<noop_resource>());
114+
115+
auto a = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/foo/"));
116+
LT_CHECK(a.found);
117+
118+
auto b = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/foo"));
119+
LT_CHECK(b.found);
120+
LT_CHECK(b.tier == ht::detail::webserver_impl::tier_hit::cache);
121+
LT_END_AUTO_TEST(trailing_slash_is_stripped)
122+
123+
// ---------------------------------------------------------------------
124+
// "/" identity: lookup_v2("/") on a registered "/" route hits exact
125+
// the first time, cache the second. Pins that the canonicalisation
126+
// of "/" does NOT pop its single character.
127+
// ---------------------------------------------------------------------
128+
LT_BEGIN_AUTO_TEST(route_lookup_canonicalize_suite, root_identity)
129+
ht::webserver ws{ht::create_webserver(8080)
130+
.start_method(ht::http::http_utils::INTERNAL_SELECT)};
131+
ws.register_path("/", std::make_shared<noop_resource>());
132+
133+
auto a = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/"));
134+
LT_CHECK(a.found);
135+
LT_CHECK(a.tier == ht::detail::webserver_impl::tier_hit::exact);
136+
137+
auto b = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/"));
138+
LT_CHECK(b.found);
139+
LT_CHECK(b.tier == ht::detail::webserver_impl::tier_hit::cache);
140+
LT_END_AUTO_TEST(root_identity)
141+
142+
// ---------------------------------------------------------------------
143+
// Already-canonical input takes the no-allocation happy path after
144+
// step 1. Behaviorally observable here: same-shape inputs share the
145+
// cache entry the prior call installed. (The no-allocation property
146+
// is closed by the step 4 heap-profile gate.)
147+
// ---------------------------------------------------------------------
148+
LT_BEGIN_AUTO_TEST(route_lookup_canonicalize_suite,
149+
already_canonical_is_idempotent)
150+
ht::webserver ws{ht::create_webserver(8080)
151+
.start_method(ht::http::http_utils::INTERNAL_SELECT)};
152+
ws.register_path("/foo", std::make_shared<noop_resource>());
153+
154+
auto a = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/foo"));
155+
LT_CHECK(a.found);
156+
157+
auto b = impl_of(ws).lookup_v2(ht::http_method::get, std::string("/foo"));
158+
LT_CHECK(b.found);
159+
LT_CHECK(b.tier == ht::detail::webserver_impl::tier_hit::cache);
160+
LT_END_AUTO_TEST(already_canonical_is_idempotent)
161+
162+
LT_BEGIN_AUTO_TEST_ENV()
163+
AUTORUN_TESTS()
164+
LT_END_AUTO_TEST_ENV()

0 commit comments

Comments
 (0)