Add: Server-side TLS handshake milestones and timing fixes#12860
Add: Server-side TLS handshake milestones and timing fixes#12860bryancall wants to merge 7 commits intoapache:masterfrom
Conversation
Add TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/END to capture origin-side TLS handshake timing, complementing the existing client-side TS_MILESTONE_TLS_HANDSHAKE_START/END milestones. Changes: - Add new milestone enum values in apidefs.h.in (ABI compatible - appended before TS_MILESTONE_LAST_ENTRY) - Capture server TLS timing in HttpSM::state_http_server_open() for both the direct (VC_EVENT_WRITE_COMPLETE) and multiplexed (CONNECT_EVENT_TXN) connection paths - Fix missing _record_tls_handshake_end_time() call in SSLNetVConnection::sslClientHandShakeEvent() - outbound TLS handshakes were recording start time but never end time - Rename slow log field "tls_handshake" to "ua_tls_handshake" for clarity and add "server_tls_handshake" field - Update LogField.cc milestone mappings for custom log fields - Add Lua plugin milestone constants - Update API documentation and slow_log_report.pl
Previously, difference_msec() only checked if ms_end was 0 (unset) before returning the "missing" sentinel value. If ms_start was unset (0) but ms_end was set, the subtraction would produce a garbage value equal to the raw nanosecond timestamp of ms_end. This affects any milestone difference calculation where the start milestone may not be set, such as server-side milestones on cache hits where no origin connection was made. The fix adds a check for ms_start == 0 alongside the existing ms_end check. This is safe because ink_hrtime values are nanoseconds from a reference point and are never legitimately 0. The difference_sec() function inherits this fix since it delegates to difference_msec().
Change server_tls_handshake from a duration (START to END) to an offset from SM_START (SM_START to END), consistent with all other server-side fields in the slow log (server_connect, server_connect_end, server_first_read, etc.). The TLS duration can be inferred from server_tls_handshake - server_connect_end.
Clarify that CONNECT_EVENT_TXN is the multiplexed path (e.g. HTTP/2) where ConnectingEntry creates the session, and VC_EVENT_WRITE_COMPLETE is the direct path (e.g. HTTP/1) where HttpSM owns the netvc. Both paths need to capture server TLS handshake timing.
There was a problem hiding this comment.
Pull request overview
This pull request adds server-side TLS handshake milestones to Apache Traffic Server, enabling measurement of origin TLS handshake overhead. The PR addresses two main issues: (1) missing visibility into server-side TLS handshake timing, and (2) bugs in milestone timing capture and calculation.
Changes:
- Adds
TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/ENDenum values and captures server TLS timing in both direct and multiplexed connection paths - Fixes missing TLS handshake end time recording for outbound connections in
SSLNetVConnection::sslClientHandShakeEvent() - Fixes
difference_msec()to check both start and end milestones for 0 before computing difference - Renames slow log field
tls_handshaketoua_tls_handshakefor clarity and addsserver_tls_handshakefield
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| include/ts/apidefs.h.in | Adds TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/END enum values before TS_MILESTONE_LAST_ENTRY (ABI-compatible) |
| include/proxy/Milestones.h | Fixes difference_msec() to check both start and end for zero before subtraction |
| src/iocore/net/SSLNetVConnection.cc | Adds missing _record_tls_handshake_end_time() call for outbound TLS connections |
| src/proxy/http/HttpSM.cc | Captures server TLS milestones in both connection paths (direct and multiplexed); renames slow log field to ua_tls_handshake and adds server_tls_handshake |
| src/proxy/logging/LogField.cc | Adds milestone string mappings for new server TLS milestones with TODO comments for ATS 11 rename |
| plugins/lua/ts_lua_http_milestone.cc | Adds Lua plugin constants for server TLS handshake milestones |
| doc/developer-guide/api/functions/TSHttpTxnMilestoneGet.en.rst | Updates API documentation to clarify client vs. server TLS handshake milestones |
| doc/admin-guide/plugins/lua.en.rst | Documents new Lua milestone constants |
| tools/slow_log_report.pl | Updates regex to parse renamed ua_tls_handshake field and new server_tls_handshake field |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
The tls_handshake field was renamed to ua_tls_handshake but the report script needs to handle both formats so it can parse logs from before and after the rename.
The slow log was printing the post-remap (origin-facing) URL which is less useful for debugging since it does not show what the client actually requested. Use t_state.unmapped_url (the pre-remap URL) instead, with a fallback to client_request URL if unmapped_url is not available.
b79b159 to
2f0b819
Compare
cmcfarlen
left a comment
There was a problem hiding this comment.
I see the milestones are set in two places, but it's not clear why or which is correct. Can you clarify this for me?
| // Record TLS handshake end time for outbound connections | ||
| if (this->get_tls_handshake_begin_time()) { | ||
| this->_record_tls_handshake_end_time(); | ||
| } |
There was a problem hiding this comment.
Why was this added? This function seems to deal with the client side, but the comment mentions outbound connections. And, if this were the outbound side, are we expecting the metrics this function updates to include both sides TLS?
There was a problem hiding this comment.
The naming in this file is from the TLS role perspective, not the ATS role. sslClientHandShakeEvent() handles the outbound side where ATS acts as the TLS client connecting to the origin. sslServerHandShakeEvent() handles the inbound side where ATS acts as the TLS server accepting client connections.
The inbound side already records the end time (line 1383), but the outbound side was missing it — this is a bug fix. Without this, get_tls_handshake_end_time() returns zero when HttpSM tries to copy the timestamps into milestones.
I've updated the comment to clarify the naming convention and the two-stage pattern.
| if (auto *netvc = session->get_netvc()) { | ||
| if (auto tbs = netvc->get_service<TLSBasicSupport>()) { | ||
| milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_START] = tbs->get_tls_handshake_begin_time(); | ||
| milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_END] = tbs->get_tls_handshake_end_time(); |
There was a problem hiding this comment.
This sets both milestones after the end time is known. Should we set the start milestone when it starts vs here? As it is, if this code isn't reached, the start milestone won't be set.
There was a problem hiding this comment.
The HttpSM doesn't have access to the netvc during the handshake in the multiplexed path — ConnectingEntry owns the connection and dispatches CONNECT_EVENT_TXN after the handshake completes. So this is the earliest point the SM can grab the timestamps.
The actual start/end times are captured in real-time at the netvc layer (sslClientHandShakeEvent() records the end time, and the begin time is set when the handshake starts). We're just copying them into milestones here. If this code isn't reached (e.g. connection failure), the milestones stay at zero, which is correct — difference_msec() checks for zero before computing the difference.
Updated the comments to make this clearer.
| if (auto tbs = _netvc->get_service<TLSBasicSupport>()) { | ||
| milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_START] = tbs->get_tls_handshake_begin_time(); | ||
| milestones[TS_MILESTONE_SERVER_TLS_HANDSHAKE_END] = tbs->get_tls_handshake_end_time(); | ||
| } |
There was a problem hiding this comment.
These aren't duplicates — they're two mutually exclusive connection paths. CONNECT_EVENT_TXN fires for multiplexed connections (H2/H3) where ConnectingEntry manages the connection. VC_EVENT_WRITE_COMPLETE fires for direct connections (H1) where the SM owns the netvc. A given connection goes through one path or the other, never both.
Updated both comments to cross-reference each other and explicitly state they're mutually exclusive.
Clarify the confusing sslClientHandShakeEvent/sslServerHandShakeEvent naming convention (TLS role vs ATS role), explain why milestones are set in two mutually exclusive connection paths (H2 multiplexed vs H1 direct), and document the two-stage timestamp pattern where the netvc records times in real-time and HttpSM copies them into milestones.
|
[approve ci autest 1] |
Description
Add
TS_MILESTONE_SERVER_TLS_HANDSHAKE_STARTandTS_MILESTONE_SERVER_TLS_HANDSHAKE_ENDmilestones to capture origin-side TLS handshake timing. Previously, only client-side TLS timing was available viaTS_MILESTONE_TLS_HANDSHAKE_START/END, making it impossible to measure origin TLS handshake overhead in the slow log or via the milestone API.Changes
New server-side TLS milestones (commit 1)
TS_MILESTONE_SERVER_TLS_HANDSHAKE_START/ENDenum values inapidefs.h.in(ABI compatible — appended beforeTS_MILESTONE_LAST_ENTRY)HttpSM::state_http_server_open()for both the direct (VC_EVENT_WRITE_COMPLETE) and multiplexed (CONNECT_EVENT_TXN) connection pathsSSLNetVConnection::sslClientHandShakeEvent()was recording TLS handshake start time for outbound connections but never calling_record_tls_handshake_end_time()— outbound TLS end time was always 0tls_handshake→ua_tls_handshakefor clarity, add newserver_tls_handshakefield in logical position afterserver_connect_endLogField.ccmilestone-to-string mappings for custom log fieldsTS_LUA_MILESTONE_SERVER_TLS_HANDSHAKE_START/END)TSHttpTxnMilestoneGetAPI documentationslow_log_report.plto parse new field namesFix
difference_msec()milestone check (commit 2)difference_msec()only checked ifms_end == 0(unset) before returning the missing sentinel. Ifms_startwas unset butms_endwas set, the subtraction would produce a garbage value. Now checks bothms_startandms_endfor 0.Use SM_START as base for server_tls_handshake (commit 3)
server_tls_handshakefrom a duration to an offset fromSM_START, consistent with all other server-side fields in the slow log timeline.Add comments explaining connection paths (commit 4)
CONNECT_EVENT_TXNis the multiplexed path (HTTP/2) andVC_EVENT_WRITE_COMPLETEis the direct path (HTTP/1).Make slow_log_report.pl backward compatible (commit 5)
tls_handshake→ua_tls_handshakerename broke parsing of old-format logs. The report script now accepts both field names.Use pre-remap URL in slow log (commit 6)
t_state.unmapped_url(the pre-remap URL) so the log shows the URL the client actually requested.Behavior Notes
-0.001(the "missing" sentinel), consistent with other unset milestones.tls_handshake→ua_tls_handshakerename is a slow log text change only — it does not affect ABI.TS_MILESTONE_TLS_HANDSHAKE_START/ENDenum values and behavior are unchanged.Testing
slow_log_report.plcorrectly parses and reports bothua_tls_handshakeandserver_tls_handshake