<fix>[core]: cap client keep-alive below agent socket timeout #4393
MatheMatrix wants to merge 1 commit into
f03d31a to 8768bd8
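Comment from moyu: Thanks for the review. Addressed:
🟡 Warning 1 (lifecycle leak): fixed
🟢 Suggestion (drop the ticket prefix from comments): fixed
🟡 Warning 2 (regression-guard test): strengthened (partially)
Verified locally: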
8768bd8 to 633cd3f
MN's async RESTFacade HTTP client (Apache HttpAsyncClient) never set a keep-alive strategy, so idle connections in the pool live forever: the default strategy returns -1 when the agent sends no Keep-Alive header.

The agent cherrypy HTTP server closes idle keep-alive connections at its socket_timeout (10s default, 15s after the agent-side fix). When the pool reuses a connection the agent has already closed, the reused request hits a RST and surfaces as "Connection reset by peer" wrapped in error 1015 "Cannot make a HTTP request", causing intermittent VM/host operation failures.

Cap the client keep-alive duration via a new global property RESTFacade.keepAliveTimeMillis (default 5000ms, below the agent socket_timeout) and proactively evict expired/idle connections, so the client always recycles an idle connection before the agent closes it.

GlobalPropertyImpact: adds RESTFacade.keepAliveTimeMillis (default 5000)
Resolves: TIC-5970
Change-Id: I7a656a6b776876726667696c657a7669657a6664
633cd3f to 22540b8
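Problem (TIC-5970)
When MN sends commands to the agent through the async HTTP client in RESTFacadeImpl (Apache HttpAsyncClient), it never sets a keep-alive strategy, so idle connections in the pool live indefinitely (DefaultConnectionKeepAliveStrategy returns -1 when the server sends no Keep-Alive header). The agent's cherrypy HTTP server, however, closes idle keep-alive connections at its socket_timeout (10s by default, 15s after the agent-side fix). When the pool reuses a connection that the agent has already closed, the request hits a RST and surfaces as Connection reset by peer, wrapped in error code 1015 Cannot make a HTTP request, causing intermittent failures of VM and host operations.
Root cause: the client's keep-alive idle time exceeds the server's keep-alive idle time, so the client becomes the "passive closer" and ends up reusing connections the server has already closed.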
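The root cause can be illustrated with a small, self-contained model. This is only a sketch: the names StaleReuseModel, clientWouldReuse, serverHasClosed, and reusesClosedConnection are hypothetical and do not appear in RESTFacadeImpl. The -1 convention for "keep alive indefinitely" follows the DefaultConnectionKeepAliveStrategy behavior described above, and the timeouts are the values quoted in this PR. The model ignores network latency and clock skew.

```java
/** Hypothetical model of the stale-connection window; not code from this PR. */
final class StaleReuseModel {

    /** Client side: a negative keep-alive means "reuse forever". */
    static boolean clientWouldReuse(long idleMillis, long clientKeepAliveMillis) {
        return clientKeepAliveMillis < 0 || idleMillis < clientKeepAliveMillis;
    }

    /** Server side: the agent closes a connection once it has been idle for socket_timeout. */
    static boolean serverHasClosed(long idleMillis, long serverIdleTimeoutMillis) {
        return idleMillis >= serverIdleTimeoutMillis;
    }

    /** The failure case: the pool hands out a connection the agent has already closed. */
    static boolean reusesClosedConnection(long idleMillis,
                                          long clientKeepAliveMillis,
                                          long serverIdleTimeoutMillis) {
        return clientWouldReuse(idleMillis, clientKeepAliveMillis)
                && serverHasClosed(idleMillis, serverIdleTimeoutMillis);
    }

    public static void main(String[] args) {
        long serverTimeout = 10_000; // agent socket_timeout default quoted in this PR

        // Without a keep-alive strategy the client value is -1 (reuse forever).
        System.out.println(reusesClosedConnection(12_000, -1, serverTimeout));    // true

        // With a 5000 ms cap the client gives up the connection before the agent does.
        System.out.println(reusesClosedConnection(12_000, 5_000, serverTimeout)); // false
    }
}
```

In this simplified model, any non-negative client keep-alive that is strictly below the server timeout makes reusesClosedConnection return false for every idle time, which is the invariant the 5000ms default is meant to establish.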
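Fix
Add a keep-alive strategy to the HttpAsyncClients in createAsyncRestTemplate so that the reusable lifetime of a client connection is capped below the agent's socket_timeout (new global property RESTFacade.keepAliveTimeMillis, default 5000ms), and add a background evictor that proactively cleans up expired and idle connections. The client therefore always reclaims an idle connection before the agent does and only ever uses fresh connections, which eliminates at the mechanism level the RST caused by reusing dead connections.
core/.../rest/RESTFacadeImpl.java: setKeepAliveStrategy + periodic closeExpiredConnections/closeIdleConnections
core/.../CoreGlobalProperty.java: adds RESTFacade.keepAliveTimeMillis (default 5000)
Verification
clean install (BUILD SUCCESS)
RestFacadeCase passes (Tests run: 1, Failures: 0, Errors: 0), covering the asyncJsonPost failure-logging path
GlobalPropertyImpact: adds RESTFacade.keepAliveTimeMillis (default 5000)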
Resolves: TIC-5970
sync from gitlab !10353
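The two parts of the fix described in this PR (cap the keep-alive, then evict idle connections) might look roughly like the sketch below. It uses only the JDK so that it runs on its own: KeepAliveCapSketch, effectiveKeepAliveMillis, and IdlePool are hypothetical names, and the actual change wires the equivalent behavior into HttpAsyncClient through setKeepAliveStrategy and closeExpiredConnections/closeIdleConnections. Whether the real strategy honors a shorter server-provided Keep-Alive hint is an assumption of this sketch; the PR text does not say.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, JDK-only sketch of the capping and eviction logic; not the code from this PR. */
final class KeepAliveCapSketch {

    /**
     * Caps the keep-alive duration. A non-positive server hint (no usable Keep-Alive
     * header, reported as -1 by the default strategy) falls back to the cap. A positive
     * hint is honored only when it is shorter than the cap (an assumption of this sketch).
     */
    static long effectiveKeepAliveMillis(long serverHintMillis, long capMillis) {
        if (serverHintMillis <= 0) {
            return capMillis;
        }
        return Math.min(serverHintMillis, capMillis);
    }

    /** Minimal stand-in for a connection pool that records when each connection went idle. */
    static final class IdlePool {
        private final List<Long> idleSinceMillis = new ArrayList<>();

        void release(long nowMillis) {
            idleSinceMillis.add(nowMillis);
        }

        /** Drops connections idle for at least keepAliveMillis and returns how many were dropped. */
        int evictIdle(long nowMillis, long keepAliveMillis) {
            int before = idleSinceMillis.size();
            idleSinceMillis.removeIf(since -> nowMillis - since >= keepAliveMillis);
            return before - idleSinceMillis.size();
        }

        int size() {
            return idleSinceMillis.size();
        }
    }

    public static void main(String[] args) {
        // 5000 is the default of RESTFacade.keepAliveTimeMillis stated in this PR.
        long keepAlive = effectiveKeepAliveMillis(-1, 5_000);

        IdlePool pool = new IdlePool();
        pool.release(0);      // idle since t = 0 ms
        pool.release(4_000);  // idle since t = 4000 ms

        // At t = 6000 ms only the first connection has been idle past the cap.
        int evicted = pool.evictIdle(6_000, keepAlive);
        System.out.println(evicted + " evicted, " + pool.size() + " left"); // 1 evicted, 1 left
    }
}
```

The cap alone only stops stale connections from being handed out; the periodic eviction additionally closes them on the client side, so the client is the one that closes first.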