Skip to content

Commit 9cb548e

Browse files
etrclaude
andcommitted
TASK-072: extract hex_digit_value+unescape loop into shared header
Both src/http_utils.cpp and src/detail/http_request_impl.cpp kept private copies of hex_digit_value and the core %HH/'+' decode loop, each with an independent comment acknowledging the duplication. Promote both helpers to src/httpserver/detail/unescape_helpers.hpp: - hex_digit_value: constexpr int, inlined - unescape_buf_raw: inline raw-buffer unescape (char*, size_t) -> size_t http_unescape in http_utils.cpp is now a thin wrapper that calls unescape_buf_raw and adds the trailing '\0' expected by callers of the std::string* variant. http_request_impl.cpp's anonymous http_unescape_raw is removed; its sole caller (unescape_in_arena) now calls unescape_buf_raw directly. Also addresses: - Rename unescape_in_arena parameter 'sink' -> 'value' and thread_local 'user_scratch' -> 'thread_value' for naming consistency. (code-simplifier-iter1-5) - Document the per-thread amortisation contract and arena-waste behaviour of the user-callback path. (performance-reviewer-iter1-2, performance-reviewer-iter1-3) - Collapse the long call-site comment in build_request_args to a single cross-reference line. (code-simplifier-iter1-7) Addresses: code-simplifier-iter1-1 (critical), code-simplifier-iter1-2 (critical), code-simplifier-iter1-5 (major), performance-reviewer-iter1-2 (major), performance-reviewer-iter1-3 (major) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent a653667 commit 9cb548e

3 files changed

Lines changed: 147 additions & 131 deletions

File tree

src/detail/http_request_impl.cpp

Lines changed: 42 additions & 90 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,7 @@
3939

4040
#include "httpserver/detail/connection_state.hpp"
4141
#include "httpserver/detail/http_request_impl.hpp"
42+
#include "httpserver/detail/unescape_helpers.hpp"
4243
#include "httpserver/http_utils.hpp"
4344

4445
namespace httpserver {
@@ -157,84 +158,53 @@ const http::header_view_map& http_request_impl::ensure_headerlike_cache(MHD_Valu
157158

158159
namespace {
159160

160-
// TASK-072: hex_digit_value matches src/http_utils.cpp's static helper.
161-
// Duplicated here (rather than promoted to a header) because both
162-
// copies are tiny and trivially constexpr-foldable; centralising it
163-
// would require either a header-only inline helper or an exposed
164-
// non-public symbol, neither of which justifies the surface area.
165-
constexpr int hex_digit_value(char c) noexcept {
166-
if (c >= '0' && c <= '9') return c - '0';
167-
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
168-
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
169-
return -1;
170-
}
171-
172-
// TASK-072: raw-buffer unescape, mirrors http_unescape(std::string*)
173-
// byte-identically. Operates in-place on [data, data+size); returns
174-
// the new size after the transformation. The output is guaranteed to
175-
// be <= input length (the standard %HH / '+'->' ' replacement only
176-
// ever shrinks).
177-
inline std::size_t http_unescape_raw(char* data, std::size_t size) noexcept {
178-
if (size == 0 || data[0] == '\0') return 0;
179-
std::size_t rpos = 0;
180-
std::size_t wpos = 0;
181-
while (rpos < size && data[rpos] != '\0') {
182-
switch (data[rpos]) {
183-
case '+':
184-
data[wpos++] = ' ';
185-
++rpos;
186-
break;
187-
case '%':
188-
if (size > rpos + 2) {
189-
int hi = hex_digit_value(data[rpos + 1]);
190-
int lo = hex_digit_value(data[rpos + 2]);
191-
if (hi >= 0 && lo >= 0) {
192-
data[wpos++] =
193-
static_cast<char>((hi << 4) | lo);
194-
rpos += 3;
195-
break;
196-
}
197-
}
198-
// intentional fall through!
199-
default:
200-
data[wpos++] = data[rpos++];
201-
}
202-
}
203-
return wpos;
204-
}
205-
206161
// TASK-072: arena-routed unescape. The caller passes an arena-backed
207162
// pmr::string already holding the raw bytes; we run the unescape
208163
// transformation directly on the pmr::string's storage so no
209164
// global-heap allocation occurs on the warm path.
210165
//
211-
// When a user unescaper is registered, the public callback signature
212-
// is `void(std::string&)` (ABI-locked), so we route through a
213-
// thread_local std::string whose capacity amortises across all
214-
// requests on the same worker thread. The first request on a thread
215-
// grows it once; subsequent requests reuse the capacity (zero
216-
// per-request global-heap allocations on the warm path). The result
217-
// bytes are then assigned back into the arena-backed sink.
218-
inline void unescape_in_arena(std::pmr::string& sink,
166+
// Default path: calls httpserver::detail::unescape_buf_raw (shared
167+
// helper from unescape_helpers.hpp -- same algorithm as the
168+
// http_unescape wrapper in http_utils.cpp, one truth-source).
169+
//
170+
// User-callback path: the public callback signature is
171+
// `void(std::string&)` (ABI-locked), so we route through a
172+
// thread_local std::string (thread_value) whose capacity amortises
173+
// across all requests on the same worker thread.
174+
//
175+
// Amortisation contract: zero global-heap allocations on the warm
176+
// path per thread, *after* thread_value has grown to the peak value
177+
// length seen on that thread. The first call on a new thread, or the
178+
// first call that sees a value longer than any previous one, triggers
179+
// exactly one global-heap allocation to grow thread_value's buffer;
180+
// all subsequent calls at or below that length are allocation-free.
181+
// (performance-reviewer-iter1-2)
182+
//
183+
// Arena-waste note: when the user callback grows the value
184+
// (thread_value.size() > original value.size()), value's original
185+
// arena allocation is abandoned in the monotonic arena until
186+
// reset_arena(). This is an accepted trade-off of the ABI-locked
187+
// void(std::string&) signature; the arena is reset per request so
188+
// the waste is bounded by the request's total value bytes.
189+
// (performance-reviewer-iter1-3)
190+
inline void unescape_in_arena(std::pmr::string& value,
219191
unescaper_ptr user_fn) {
220-
if (sink.empty()) return;
192+
if (value.empty()) return;
221193
if (user_fn == nullptr) {
222194
// Default %HH / '+' decode: run in-place on the arena
223195
// buffer, then truncate to the new size.
224-
const std::size_t new_size = http_unescape_raw(
225-
sink.data(), sink.size());
226-
sink.resize(new_size);
196+
const std::size_t new_size = httpserver::detail::unescape_buf_raw(
197+
value.data(), value.size());
198+
value.resize(new_size);
227199
return;
228200
}
229201
// User-callback path: route through a per-thread reusable
230202
// std::string scratch buffer so the warm-path cost is zero
231-
// global-heap allocations. The thread_local lives for the
232-
// worker thread's lifetime; capacity grows once per peak input
233-
// size on the thread.
234-
thread_local std::string user_scratch;
235-
user_scratch.assign(sink.data(), sink.size());
236-
user_fn(user_scratch);
237-
sink.assign(user_scratch.data(), user_scratch.size());
203+
// global-heap allocations after the first call on this thread.
204+
thread_local std::string thread_value;
205+
thread_value.assign(value.data(), value.size());
206+
user_fn(thread_value);
207+
value.assign(thread_value.data(), thread_value.size());
238208
}
239209

240210
} // namespace
@@ -274,26 +244,10 @@ MHD_Result http_request_impl::build_request_args(void* cls, MHD_ValueKind kind,
274244
}
275245
aa->accumulated_bytes += this_pair_bytes;
276246

277-
// TASK-072: route the unescape output into the per-connection
278-
// arena. The destination pmr::string lives in the arena domain, so
279-
// its bytes consume only arena memory (no global-heap allocation).
280-
//
281-
// Two paths:
282-
// - null user-unescaper: do the standard %HH / '+'->' '
283-
// transformation in-place on the arena-backed pmr::string via
284-
// http_unescape_raw (raw-buffer overload of the standard
285-
// unescape, defined in the anonymous namespace below).
286-
// - user-registered unescaper: the public callback signature
287-
// `void(std::string&)` is ABI-locked, so we pass it a
288-
// thread_local std::string scratch buffer; its capacity
289-
// amortises across requests on the same worker thread, leaving
290-
// the warm-path per-request cost at zero global-heap
291-
// allocations. The result is then copied into the arena-backed
292-
// pmr::string sink.
293-
//
294-
// The returned pmr::string's data is owned by the arena and lives
295-
// until connection_state::reset_arena() runs (request completion),
296-
// matching the TASK-018 string_view lifetime contract.
247+
// Arena-routed unescape: see unescape_in_arena() above for path details.
248+
// The returned pmr::string's data is owned by the arena and lives until
249+
// connection_state::reset_arena() runs (request completion), matching
250+
// the TASK-018 string_view lifetime contract.
297251
auto pmr_alloc = args.get_allocator();
298252
auto it = existing_it;
299253
if (it == args.end()) {
@@ -305,12 +259,10 @@ MHD_Result http_request_impl::build_request_args(void* cls, MHD_ValueKind kind,
305259
}
306260

307261
// Allocate the destination pmr::string in the arena domain and
308-
// copy the raw input bytes into it. The outer vector's allocator-
309-
// propagating construct wires the inner pmr::string's allocator,
310-
// so we pass (ptr, size) here. The unescape transformation then
262+
// copy the raw input bytes into it. The unescape transformation
311263
// runs in-place on the arena-backed buffer.
312-
auto& sink = it->second.emplace_back(val_sv.data(), val_sv.size());
313-
unescape_in_arena(sink, aa->unescaper);
264+
auto& arg_value_ref = it->second.emplace_back(val_sv.data(), val_sv.size());
265+
unescape_in_arena(arg_value_ref, aa->unescaper);
314266
return MHD_YES;
315267
}
316268

src/http_utils.cpp

Lines changed: 11 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -57,6 +57,7 @@
5757
#include <vector>
5858

5959
#include "httpserver/constants.hpp"
60+
#include "httpserver/detail/unescape_helpers.hpp"
6061
#include "httpserver/string_utilities.hpp"
6162

6263
#if defined (__CYGWIN__)
@@ -302,49 +303,18 @@ std::string http_utils::sanitize_upload_filename(const std::string& filename) {
302303
// src/detail/ip_representation.cpp to keep this TU under the project
303304
// per-file LOC ceiling. See FILE_LOC_MAX in scripts/check-file-size.sh.
304305

305-
static inline int hex_digit_value(char c) {
306-
if (c >= '0' && c <= '9') return c - '0';
307-
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
308-
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
309-
return -1;
310-
}
311-
306+
// TASK-072 (code-simplifier-iter1-1/2): hex_digit_value and the core
307+
// unescape loop have been promoted to the shared internal header
308+
// src/httpserver/detail/unescape_helpers.hpp so that this TU and
309+
// src/detail/http_request_impl.cpp share one truth-source.
310+
// http_unescape is now a thin wrapper around unescape_buf_raw.
312311
size_t http_unescape(std::string* val) {
313312
if (val->empty()) return 0;
314-
315-
unsigned int rpos = 0;
316-
unsigned int wpos = 0;
317-
318-
unsigned int size = val->size();
319-
320-
while (rpos < size && (*val)[rpos] != '\0') {
321-
switch ((*val)[rpos]) {
322-
case '+':
323-
(*val)[wpos] = ' ';
324-
wpos++;
325-
rpos++;
326-
break;
327-
case '%':
328-
if (size > rpos + 2) {
329-
int hi = hex_digit_value((*val)[rpos + 1]);
330-
int lo = hex_digit_value((*val)[rpos + 2]);
331-
if (hi >= 0 && lo >= 0) {
332-
(*val)[wpos] = static_cast<unsigned char>((hi << 4) | lo);
333-
wpos++;
334-
rpos += 3;
335-
break;
336-
}
337-
}
338-
// intentional fall through!
339-
default:
340-
(*val)[wpos] = (*val)[rpos];
341-
wpos++;
342-
rpos++;
343-
}
344-
}
345-
(*val)[wpos] = '\0'; // add 0-terminator
346-
val->resize(wpos);
347-
return wpos; // = strlen(val)
313+
const std::size_t new_size =
314+
httpserver::detail::unescape_buf_raw(val->data(), val->size());
315+
(*val)[new_size] = '\0'; // add 0-terminator (std::string owns the byte)
316+
val->resize(new_size);
317+
return new_size; // = strlen(val)
348318
}
349319

350320
const std::string load_file(const std::string& filename) {
Lines changed: 94 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,94 @@
1+
/*
2+
This file is part of libhttpserver
3+
Copyright (C) 2011-2026 Sebastiano Merlino
4+
5+
This library is free software; you can redistribute it and/or
6+
modify it under the terms of the GNU Lesser General Public
7+
License as published by the Free Software Foundation; either
8+
version 2.1 of the License, or (at your option) any later version.
9+
10+
This library is distributed in the hope that it will be useful,
11+
but WITHOUT ANY WARRANTY; without even the implied warranty of
12+
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
13+
Lesser General Public License for more details.
14+
15+
You should have received a copy of the GNU Lesser General Public
16+
License along with this library; if not, write to the Free Software
17+
Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301
18+
USA
19+
*/
20+
21+
// Internal-only helpers for URL percent-decoding.
22+
//
23+
// Shared between src/http_utils.cpp (http_unescape) and
24+
// src/detail/http_request_impl.cpp (arena-routed unescape).
25+
// Not part of the public API; not installed.
26+
//
27+
// TASK-072: extracted from the two translation units that previously
28+
// each maintained a private copy. Having one source of truth ensures
29+
// that bug-fixes and RFC-3986 edge-case corrections propagate uniformly.
30+
// (code-simplifier-iter1-1, code-simplifier-iter1-2)
31+
32+
#pragma once
33+
34+
#include <cstddef>
35+
36+
namespace httpserver {
37+
namespace detail {
38+
39+
// Map a single hex character to its integer value [0, 15].
40+
// Returns -1 for any character that is not a valid hex digit.
41+
//
42+
// constexpr so callers in constant-expression contexts can fold the
43+
// table at compile time; trivially inlined at -O1 and above.
44+
[[nodiscard]] constexpr int hex_digit_value(char c) noexcept {
45+
if (c >= '0' && c <= '9') return c - '0';
46+
if (c >= 'a' && c <= 'f') return c - 'a' + 10;
47+
if (c >= 'A' && c <= 'F') return c - 'A' + 10;
48+
return -1;
49+
}
50+
51+
// Percent-decode and '+'-to-space unescape a raw byte buffer in-place.
52+
//
53+
// Reads from [data, data+size), writes the decoded bytes back into the
54+
// same buffer starting at data[0], and returns the new (decoded) length.
55+
// The output length is always <= the input length because %HH sequences
56+
// (3 bytes) decode to 1 byte and '+' stays 1 byte.
57+
//
58+
// The loop stops at a '\0' byte or when rpos reaches size, whichever
59+
// comes first, so embedded null bytes in the input are treated as
60+
// terminators (matching the behaviour of the original http_unescape).
61+
//
62+
// The caller is responsible for adding any terminating '\0' after the
63+
// returned length if the buffer must be C-string-compatible.
64+
inline std::size_t unescape_buf_raw(char* data, std::size_t size) noexcept {
65+
if (size == 0) return 0;
66+
std::size_t rpos = 0;
67+
std::size_t wpos = 0;
68+
while (rpos < size && data[rpos] != '\0') {
69+
switch (data[rpos]) {
70+
case '+':
71+
data[wpos++] = ' ';
72+
++rpos;
73+
break;
74+
case '%':
75+
if (size > rpos + 2) {
76+
const int hi = hex_digit_value(data[rpos + 1]);
77+
const int lo = hex_digit_value(data[rpos + 2]);
78+
if (hi >= 0 && lo >= 0) {
79+
data[wpos++] =
80+
static_cast<char>((hi << 4) | lo);
81+
rpos += 3;
82+
break;
83+
}
84+
}
85+
// intentional fall through!
86+
default:
87+
data[wpos++] = data[rpos++];
88+
}
89+
}
90+
return wpos;
91+
}
92+
93+
} // namespace detail
94+
} // namespace httpserver

0 commit comments

Comments
 (0)