
feat: retry on 429#2244

Open
jjhwan-h wants to merge 9 commits into projectdiscovery:dev from jjhwan-h:feat/retry-on-429

Conversation

@jjhwan-h
Contributor

@jjhwan-h jjhwan-h commented Aug 21, 2025

Summary

This PR introduces a round-robin retry mechanism for HTTP 429 (Too Many Requests) responses.
Users can control the number of retries and the delay between them using two new flags:

--retry-rounds

--retry-delay

Changes

Options

Added RetryRounds (int) → number of retries for 429 responses

Added RetryDelay (int) → delay between retries (in ms)

Added validation to ensure retries and delay values are valid

Runner

Integrated a new retryLoop with drainedCh synchronization

Introduced retryJob struct for retry queue management

On 429 responses, jobs are pushed into the retryCh for delayed execution

Used atomic.Int64 to track remaining retries and close drainedCh when done
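The loop described in the bullets above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: the `retryJob` fields, the `handle` callback, and the hard-coded requeue delay are assumptions standing in for the real runner types in `runner/types.go` and `runner/runner.go`.

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
	"time"
)

// retryJob is an illustrative reconstruction of the struct described above;
// the real field set lives in runner/types.go.
type retryJob struct {
	url     string
	attempt int
	runAt   time.Time
}

// retryLoop sketches the mechanism the bullets describe: jobs arrive on
// retryCh, an atomic.Int64 tracks outstanding original jobs, and drainedCh
// is closed once every one of them has been resolved.
func retryLoop(ctx context.Context, retryCh chan retryJob, handle func(retryJob) bool) <-chan struct{} {
	var remaining atomic.Int64
	drainedCh := make(chan struct{})
	go func() {
		for {
			select {
			case <-ctx.Done():
				return
			case job := <-retryCh:
				if job.attempt == 1 {
					remaining.Add(1) // count each original job exactly once
				}
				time.Sleep(time.Until(job.runAt))
				if handle(job) { // handle reports whether another retry is needed
					job.attempt++
					job.runAt = time.Now().Add(5 * time.Millisecond)
					retryCh <- job // still 429: requeue after the delay
					continue
				}
				if remaining.Add(-1) == 0 {
					close(drainedCh) // all original jobs resolved: signal drain
					return
				}
			}
		}
	}()
	return drainedCh
}

func runExample() int {
	retryCh := make(chan retryJob, 8) // buffered so the loop can requeue to itself
	attempts := 0
	drained := retryLoop(context.Background(), retryCh, func(j retryJob) bool {
		attempts++
		return j.attempt < 3 // pretend the server returns 429 twice, then 200
	})
	retryCh <- retryJob{url: "http://target.example", attempt: 1, runAt: time.Now()}
	<-drained
	return attempts
}

func main() {
	fmt.Println(runExample()) // 3: one initial attempt plus two retries
}
```

Counting only `attempt == 1` jobs means requeued retries never inflate the counter, so `drainedCh` closes exactly when the last original job is resolved.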

Tests

Added TestRunner_Process_And_RetryLoop

srv1: Always returns 429 → retries exhausted, expected 3×429, 0×200

srv2: Returns 2×429 then 200 → expected 2×429, 1×200

Verified drain and retry logic
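A fixture like srv2 above can be built with `httptest` and an atomic hit counter. This is a sketch of the described test setup, not the PR's exact test code; the helper names are illustrative.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// newFlakyServer mimics the srv2 fixture described above: the first two
// requests receive 429 and the third receives 200.
func newFlakyServer() *httptest.Server {
	var hits int32
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&hits, 1) < 3 {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		w.WriteHeader(http.StatusOK)
	}))
}

// fetchCodes issues n sequential GETs and records the status codes,
// standing in for the runner's request/retry path.
func fetchCodes(url string, n int) []int {
	codes := make([]int, 0, n)
	for i := 0; i < n; i++ {
		resp, err := http.Get(url)
		if err != nil {
			panic(err)
		}
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		codes = append(codes, resp.StatusCode)
	}
	return codes
}

func main() {
	srv := newFlakyServer()
	defer srv.Close()
	fmt.Println(fetchCodes(srv.URL, 3)) // [429 429 200]
}
```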

Example

httpx -l urls.txt --retry-rounds 2 --retry-delay 500

A request returning 429 will be retried up to 2 times

Each retry happens after 500ms delay

If retries are exhausted, the request is considered failed

Related Issue

#1078

Summary by CodeRabbit

  • New Features

    • Added HTTP 429 retry handling with configurable CLI flags: --retry-rounds, --retry-delay, and --retry-timeout.
  • Documentation

    • Updated README with guidance on retry flags and Retry-After header behavior.
  • Tests

    • Added comprehensive test coverage for retry scenarios, including Retry-After header handling and timeout validation.

@coderabbitai

coderabbitai bot commented Aug 21, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Introduces HTTP 429 retry infrastructure via three new CLI options (retry-rounds, retry-delay, retry-timeout) and implements in-process retry queue mechanics with Retry-After header support, request timeout boundaries, and integration into the enumeration processing pipeline.

Changes

Cohort / File(s) Summary
Configuration & Options
runner/options.go
Adds RetryRounds, RetryDelay, and RetryTimeout fields to the Options struct; wires CLI flags and validation logic to enforce that RetryRounds > 0 requires RetryDelay > 0 and RetryTimeout > 0.
Core Retry Implementation
runner/runner.go
Introduces retryQueue and retryItem types for queueing failed requests; updates Process/process signatures to accept and propagate retryQueue; implements processRetries method with delay calculation from Retry-After headers; integrates retry logic into RunEnumeration flow.
Testing
runner/runner_test.go, cmd/integration-test/http.go
Adds comprehensive tests covering retry mechanics: per-server 429 handling, Retry-After header parsing, retry timeout enforcement, and fallback delay behavior; uses atomic counters and test servers to validate retry counts and output.
Documentation
README.md
Documents three new CLI flags under OPTIMIZATIONS section; explains Retry-After interaction, fallback delay semantics, and timeout constraints.

Sequence Diagram

sequenceDiagram
    participant Client as Client Request
    participant Runner as Runner.Process
    participant RQ as retryQueue
    participant Server as HTTP Server
    participant ResultChan as output chan

    Client->>Runner: Process(target, rq)
    Runner->>Server: HTTP Request
    alt Success (200)
        Server-->>Runner: 200 Response
        Runner->>ResultChan: Emit Result
    else Rate Limited (429)
        Server-->>Runner: 429 Response
        Runner->>RQ: push(retryItem)
        RQ->>RQ: queue item
    end
    
    Runner->>RQ: drain() after initial work
    loop For each queued retry
        RQ-->>Runner: retryItem
        Runner->>Runner: retryDelay(calcWait)
        Runner->>Runner: sleep(delay)
        Runner->>Server: Retry HTTP Request
        alt Success (200)
            Server-->>Runner: 200 Response
            Runner->>ResultChan: Emit Result
        else Still 429 & retries remain
            Server-->>Runner: 429 Response
            Runner->>RQ: push(retryItem)
        else Timeout exceeded
            Runner->>ResultChan: Emit final Result
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 A retry queue hops through the 429s,
With Retry-After headers as gentle pleas,
Through timeout bounds and delay-based throes,
Each failed request gets another go,
Until success blooms or patience grows thin,
And the rabbit's retry dance finds where 200s begin! 🌟

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: retry on 429' directly and clearly describes the main change: implementing retry functionality for HTTP 429 (Too Many Requests) responses, which is the core purpose of this PR.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
runner/runner.go (1)

1354-1355: Consider documenting the retry attempt tracking logic.

The condition if job.attempt == 1 adds to the remaining counter only for first attempts. While this is correct for tracking initial jobs vs retries, it might benefit from a clarifying comment.

Add a comment to clarify the tracking logic:

+				// Track only initial attempts (attempt=1) to know when all original jobs are done
 				if job.attempt == 1 {
 					remaining.Add(1)
 				}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 5f55bfa and 9936878.

📒 Files selected for processing (4)
  • runner/options.go (3 hunks)
  • runner/runner.go (9 hunks)
  • runner/runner_test.go (3 hunks)
  • runner/types.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
runner/options.go (2)
runner/runner.go (1)
  • New (114-389)
common/httpx/httpx.go (1)
  • New (46-213)
runner/types.go (3)
common/httpx/httpx.go (1)
  • HTTPX (33-43)
common/httpx/types.go (1)
  • Target (4-8)
runner/options.go (1)
  • ScanOptions (53-110)
runner/runner_test.go (4)
runner/runner.go (1)
  • New (114-389)
runner/options.go (1)
  • Options (173-353)
runner/types.go (1)
  • Result (35-103)
common/httpx/http2.go (1)
  • HTTP (15-15)
runner/runner.go (4)
common/httpx/httpx.go (1)
  • HTTPX (33-43)
common/httpx/types.go (1)
  • Target (4-8)
runner/options.go (1)
  • ScanOptions (53-110)
runner/types.go (1)
  • Result (35-103)
🔇 Additional comments (11)
runner/types.go (1)

123-132: LGTM! Well-structured retry state management type.

The retryJob struct appropriately captures all necessary state for implementing HTTP 429 retry logic. The fields are properly organized and the use of time.Time for scheduling is appropriate.

runner/options.go (2)

277-278: Good addition of retry configuration fields.

The new RetryRounds and RetryDelay fields are appropriately typed and well-positioned within the Options struct alongside other rate-limiting related configurations.


535-536: Clear and well-documented CLI flags for retry configuration.

The flags are properly grouped under "Optimizations" with clear descriptions. The default delay of 500ms is reasonable for HTTP 429 retry scenarios.

runner/runner_test.go (1)

322-409: Well-designed test for concurrent retry behavior.

The test thoroughly validates the retry mechanism with proper concurrent execution handling and accurate result counting. The test setup with two servers having different 429 response patterns effectively covers the retry logic edge cases.

runner/runner.go (7)

1261-1263: Good channel and goroutine management for retry mechanism.

The retry channel creation and retry loop initialization are properly placed before the main processing begins.


1306-1308: Proper synchronization with drain channel.

The conditional wait on the drain channel when RetryRounds > 0 ensures all retry attempts complete before closing the output channel. This prevents premature termination.


1333-1393: Well-implemented retry loop with proper cancellation and draining.

The retry loop implementation is robust with:

  • Proper context cancellation support
  • Atomic counter for tracking in-flight jobs
  • Timer-based scheduling with cleanup
  • Correct drain signaling when all work completes

1481-1493: Correct retry job scheduling for 429 responses.

The retry job creation properly captures all necessary state and schedules with the configured delay. The check for RetryRounds > 0 ensures retries only happen when configured.


1499-1514: Good propagation of retry channel through TLS/CSP probe paths.

The retry channel is correctly passed through all recursive r.process calls in both TLS and CSP probing paths, ensuring consistent retry behavior across all request types.


1548-1560: Consistent retry handling for custom port requests.

The retry logic for custom port requests mirrors the default port handling, maintaining consistency across different request paths.


1566-1570: Proper retry channel propagation in custom port TLS probing.

The retry channel is correctly passed through the TLS probe paths for custom ports.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (4)
runner/runner_test.go (4)

325-344: Make the server conditions more explicit (< vs !=) for readability

Using “not equal to N” works, but “less than N” communicates intent (first N-1 calls) more clearly and avoids edge-case surprises if the counter increments beyond the sentinel.

Apply this diff to clarify:

-    if atomic.AddInt32(&hits1, 1) != 4 {
+    if atomic.AddInt32(&hits1, 1) < 4 {
         w.WriteHeader(http.StatusTooManyRequests)
         return
     }
@@
-    if atomic.AddInt32(&hits2, 1) != 3 {
+    if atomic.AddInt32(&hits2, 1) < 3 {
         w.WriteHeader(http.StatusTooManyRequests)
         return
     }

345-351: Tiny nit: document RetryDelay units to avoid future confusion

Options.RetryDelay is treated as milliseconds by the new retry loop. Consider a short comment to prevent accidental second-based assumptions during future edits.

Example:

 r, err := New(&Options{
     Threads:     1,
     RetryRounds: 2,
-    RetryDelay:  5,
+    RetryDelay:  5, // milliseconds
     Timeout:     3,
 })

373-389: Compare by prefix rather than strict equality to make URL matching robust

Depending on how Result.URL is populated, it may include a trailing slash or path. Using strict equality with srvX.URL risks brittle matches. HasPrefix is sufficient here and resilient to minor formatting differences.

Apply this diff:

-            switch res.StatusCode {
+            switch res.StatusCode {
             case http.StatusTooManyRequests:
-                if res.URL == srv1.URL {
+                if strings.HasPrefix(res.URL, srv1.URL) {
                     s1n429++
                 } else {
                     s2n429++
                 }
             case http.StatusOK:
-                if res.URL == srv1.URL {
+                if strings.HasPrefix(res.URL, srv1.URL) {
                     s1n200++
                 } else {
                     s2n200++
                 }
             }

401-409: Strengthen assertions with server hit counts

In addition to output-based assertions, verify the exact number of HTTP calls made to each server. This catches cases where the output channel behavior changes but retries still occur (or vice versa).

Apply this diff:

     // srv2: should have 2x 429 responses and 1x 200 (succeeds on 3rd attempt)
     require.Equal(t, 2, s2n429)
     require.Equal(t, 1, s2n200)
+
+    // Also assert exact server invocations:
+    // srv1: 1 initial + 2 retries = 3 total
+    require.EqualValues(t, 3, atomic.LoadInt32(&hits1))
+    // srv2: 1 initial + 2 retries = 3 total, ends with 200 on 3rd
+    require.EqualValues(t, 3, atomic.LoadInt32(&hits2))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 9936878 and 7fe6a48.

📒 Files selected for processing (2)
  • runner/options.go (3 hunks)
  • runner/runner_test.go (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • runner/options.go
🧰 Additional context used
🧬 Code graph analysis (1)
runner/runner_test.go (3)
runner/runner.go (1)
  • New (114-389)
runner/options.go (1)
  • Options (173-353)
runner/types.go (1)
  • Result (35-103)
🔇 Additional comments (2)
runner/runner_test.go (2)

4-11: Imports look correct and minimal

The newly added imports are necessary for the test logic (servers, status codes, atomics, sync utils). No concerns.

Also applies to: 19-19


236-239: Formatting-only struct field alignment

This change is cosmetic and does not affect behavior. Safe to keep.

Comment on lines +356 to +357
_, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze)


💡 Verification agent

🧩 Analysis chain

Don’t ignore the retry loop’s stop handle; avoid a goroutine leak

retryLoop appears to return a stop/cleanup handle (you currently discard it with _). If the loop isn’t explicitly stopped, the goroutine can remain blocked on retryCh after the test finishes, causing leaks and potential flakiness in suite runs.

Apply this diff to wire cancellation correctly and ensure cleanup even on early test failures:

-    _, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze)
+    ctx, cancel := context.WithCancel(context.Background())
+    stopRetry, drainedCh := r.retryLoop(ctx, retryCh, output, r.analyze)
+    t.Cleanup(func() {
+        // Ensure the loop exits and resources are released
+        stopRetry()
+        cancel()
+    })

If you want to double-check the signature and the semantics of the first return value:


🏁 Script executed:

#!/bin/bash
# Inspect retryLoop signature/returns to confirm cleanup expectations
rg -nP -C2 'func\s+\(r\s+\*Runner\)\s+retryLoop\(' runner

Length of output: 342


🏁 Script executed:

#!/bin/bash
# Retrieve the full signature (including return values) of retryLoop to confirm the first return value’s type and semantics
rg -nP -C10 'func\s+\(r\s+\*Runner\)\s+retryLoop\(' runner/runner.go

Length of output: 691


🏁 Script executed:

#!/bin/bash
# Extract the full body of retryLoop to confirm how stop() and drainedCh operate
sed -n '1333,1400p' runner/runner.go

Length of output: 1518


🏁 Script executed:

#!/bin/bash
# Locate usages of retryLoop in the test file
rg -n -C5 'retryLoop' runner/runner_test.go

# Show full context around lines 350–380 to inspect how stop and drainedCh are handled
sed -n '340,380p' runner/runner_test.go

Length of output: 1276


Wire up the retryLoop stop handle to prevent goroutine leaks

The test currently invokes retryLoop with a background context and discards its first return value (stop func()), so the internal goroutine never sees a cancellation signal. This can leave the loop blocked on retryCh after the test ends, leading to leaked goroutines and flakiness.

• File: runner/runner_test.go
• Lines: 356–357

Suggested change:

-    _, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze)
+    // Create a cancellable context and capture the stop function
+    ctx, cancel := context.WithCancel(context.Background())
+    stopRetry, drainedCh := r.retryLoop(ctx, retryCh, output, r.analyze)
+    // Ensure the loop exits and the context is cleaned up after the test
+    t.Cleanup(func() {
+        stopRetry()
+        cancel()
+    })
🤖 Prompt for AI Agents
In runner/runner_test.go around lines 356–357, the test calls
r.retryLoop(context.Background(), ...) but ignores the returned stop func,
leaving the internal goroutine unable to observe cancellation; capture the first
return value (stop func) when calling retryLoop and ensure you call it
(preferably via defer stop() immediately after the call) so the retryLoop is
signaled to exit and no goroutine is leaked.

Comment on lines +396 to +399
wg.Wait()
<-drainedCh
close(output)
drainWG.Wait()

🛠️ Refactor suggestion

Guard against hangs: time-bound the wait for drainedCh

If the retry loop logic regresses, <-drainedCh can block indefinitely and hang the test suite. Use a bounded wait to fail fast and surface the defect.

Apply this diff:

-    wg.Wait()
-    <-drainedCh
-    close(output)
+    wg.Wait()
+    select {
+    case <-drainedCh:
+        // drained successfully
+    case <-time.After(5 * time.Second):
+        t.Fatalf("retry loop did not drain within timeout")
+    }
+    close(output)

@Mzack9999 Mzack9999 self-requested a review August 27, 2025 18:02
@Mzack9999
Member

Nice one! I was thinking about something similar with automatic detection of rate limit from RL headers combined with https://github.com/projectdiscovery/ratelimit/tree/feat-per-key-limit to make the tool compliant with them without slowness.
Just curious, were you testing many different paths against the same target in order to hit 429? Generally httpx's use case is sending a few requests to a very large number of targets.

@jjhwan-h
Contributor Author

jjhwan-h commented Aug 28, 2025

Thanks for the feedback!

In my testing I was indeed sending many different paths to the same target, which resulted in 429 responses. However, in real-world environments with CDNs and WAFs, 429s can also occur sporadically across multiple targets, even with only a few requests per host.

The --retry-rounds flag is therefore intended as an optional safeguard to preserve result accuracy in such cases. It does not alter the default behavior and can be enabled only when needed.

If you feel additional tests or adjustments to the implementation would make this more useful or easier to maintain, I’d be happy to revise the PR accordingly!

@dogancanbakir dogancanbakir linked an issue Aug 28, 2025 that may be closed by this pull request
@jjhwan-h jjhwan-h changed the title Feat/retry on 429 feat: retry on 429 Oct 9, 2025
@neo-by-projectdiscovery-dev
Copy link

neo-by-projectdiscovery-dev bot commented Mar 7, 2026

Neo - PR Security Review

Medium: 1

Highlights

  • Refactored retry mechanism from channel-based to simpler queue-based implementation
  • Introduced retryDelay() function that parses server-controlled Retry-After headers
  • Added processRetries() function with context timeout protection for retry loops
Medium (1)
  • Unbounded Retry-After header enables sleep amplification DoS (runner/runner.go:1605)
    The retryDelay function parses the Retry-After header from server responses without validating upper bounds. An attacker-controlled server can send extremely large values (e.g., 'Retry-After: 999999999' or a date far in the future) causing httpx to create timers with multi-year durations.
Security Impact

Unbounded Retry-After header enables sleep amplification DoS (runner/runner.go:1605):
A malicious server can force httpx to sleep for extended periods (up to RetryTimeout) on each retry attempt, wasting the user's time and resources. With multiple retry rounds and URLs, this amplifies into significant scanning delays (e.g., 100 URLs × 3 retries × 30s = 2.5 hours of forced waiting).

Attack Examples

Unbounded Retry-After header enables sleep amplification DoS (runner/runner.go:1605):

Attacker's server returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 999999999

The retryDelay function parses this as 999999999 seconds without bounds checking, creating a timer that would fire in ~31 years. While RetryTimeout caps actual sleep duration, this still enables maximum-duration sleeps on every retry.
Suggested Fixes

Unbounded Retry-After header enables sleep amplification DoS (runner/runner.go:1605):

Add maximum bounds check in retryDelay function: cap parsed Retry-After values to a reasonable maximum (e.g., 300 seconds). Change lines 1608-1609 to: 'if seconds, err := strconv.Atoi(ra[0]); err == nil && seconds > 0 && seconds <= 300 { return time.Duration(seconds) * time.Second }' and similarly cap time.Until(t) results at line 1612-1613.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1605-1619, the retryDelay function parses
the Retry-After header without upper bounds validation; add a constant 'const
maxRetryAfterSeconds = 300' at the top of the file, then at line 1608 change the
condition to 'if seconds, err := strconv.Atoi(ra[0]); err == nil && seconds > 0
&& seconds <= maxRetryAfterSeconds' and at line 1612 add a bounds check: 'if d
:= time.Until(t); d > 0 && d <= maxRetryAfterSeconds*time.Second { return d }'
to prevent sleep amplification attacks from malicious servers.
Hardening Notes
  • Consider adding metrics/logging when Retry-After values exceed expected thresholds (e.g., > 60s) to help users detect hostile servers
  • Add a --max-retry-delay flag to let users configure their own upper bound for Retry-After values rather than hardcoding it



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
runner/runner_test.go (1)

423-510: Consider adding a test case for the no-429 scenario.

The current test validates retry behavior when 429 responses occur. However, it doesn't cover the scenario where RetryRounds > 0 but no 429 responses are received. This scenario would expose a potential hang (as noted in the retryLoop review).

Adding such a test would help catch the bug where drainedCh is never closed:

func TestRunner_RetryLoop_NoRetries(t *testing.T) {
    // Server that always returns 200
    srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        w.WriteHeader(http.StatusOK)
    }))
    defer srv.Close()

    r, err := New(&Options{
        Threads:     1,
        RetryRounds: 2,
        RetryDelay:  5,
        Timeout:     3,
    })
    require.NoError(t, err)

    // This test should complete quickly, not hang
    // Add timeout to catch the hang bug
    done := make(chan struct{})
    go func() {
        // Run enumeration or process logic here
        close(done)
    }()

    select {
    case <-done:
        // Success
    case <-time.After(5 * time.Second):
        t.Fatal("test hung - drainedCh was never closed")
    }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner_test.go` around lines 423 - 510, Add a new test (e.g.,
TestRunner_RetryLoop_NoRetries) that sets up an httptest server which always
returns 200, creates a Runner via New with RetryRounds > 0, wires retryCh and
output into r.retryLoop to obtain drainedCh, calls r.process for the server URL,
and then waits for drainedCh to be closed using a short timeout (select on
drainedCh vs time.After) to fail the test if it hangs; also assert that output
contains only 200 responses and no 429s. This will exercise retryLoop and
process (symbols: New, retryLoop, process, drainedCh, retryCh, output) and catch
the case where drainedCh is never closed when no 429s occur.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@runner/runner.go`:
- Around line 1602-1662: The retryLoop can never close drainedCh if no 429
retries are ever enqueued; add a one-shot signal (e.g. return an extra
signalDone func() from retryLoop) that RunEnumeration will call after wg.Wait(),
and inside retryLoop listen for that signal in addition to ctx.Done()/retryCh;
when the signal is received check remaining.Load() and if it is 0 close
drainedCh immediately (otherwise keep existing logic to close drainedCh when
remaining.Add(-1) == 0); ensure the signal only causes a single close and that
retryLoop continues to accept retryCh items until ctx is canceled so existing
retry logic (retryCh, remaining, j.attempt) remains unchanged.
- Around line 1730-1732: The exported method Runner.Process currently accepts an
unexported type retryJob which prevents callers in other packages from using it;
either make the method internal by renaming Process to process (call the
existing r.process wrapper accordingly) or export the job type by renaming
retryJob to RetryJob and updating all references (including the parameter name
in Runner.Process and any functions/structs that construct/send those jobs) so
the public API uses only exported types; choose the option that matches intended
visibility and apply the renaming consistently across Runner.Process, r.process
call sites, and any creation/usage of retryJob.

---

Nitpick comments:
In `@runner/runner_test.go`:
- Around line 423-510: Add a new test (e.g., TestRunner_RetryLoop_NoRetries)
that sets up an httptest server which always returns 200, creates a Runner via
New with RetryRounds > 0, wires retryCh and output into r.retryLoop to obtain
drainedCh, calls r.process for the server URL, and then waits for drainedCh to
be closed using a short timeout (select on drainedCh vs time.After) to fail the
test if it hangs; also assert that output contains only 200 responses and no
429s. This will exercise retryLoop and process (symbols: New, retryLoop,
process, drainedCh, retryCh, output) and catch the case where drainedCh is never
closed when no 429s occur.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 60a92a89-58bc-49d7-a340-51c996bf8c49

📥 Commits

Reviewing files that changed from the base of the PR and between 7fe6a48 and b75d4ee.

📒 Files selected for processing (4)
  • runner/options.go
  • runner/runner.go
  • runner/runner_test.go
  • runner/types.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • runner/options.go


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (4)
runner/runner.go (2)

1579-1581: ⚠️ Potential issue | 🔴 Critical

Critical: drainedCh will never close if no 429 responses occur, causing an indefinite hang.

When RetryRounds > 0 but no requests return HTTP 429, no jobs are enqueued to retryCh. The atomic counter remaining stays at 0, and drainedCh is never closed because the close condition (remaining.Add(-1) == 0) is never triggered. This causes <-drainedCh to block forever.

The issue identified in a previous review remains unaddressed. A mechanism is needed to signal when initial work completes so the retry loop can close drainedCh if no retries are pending.


🐛 One possible fix: Signal completion after initial work
 func (r *Runner) retryLoop(
 	parent context.Context,
 	retryCh chan retryJob,
 	output chan<- Result,
 	analyze analyzeFunc,
-) (stop func(), drained <-chan struct{}) {
+) (stop func(), signalDone func(), drained <-chan struct{}) {
 	var remaining atomic.Int64
 	ctx, cancel := context.WithCancel(parent)
 	drainedCh := make(chan struct{})
+	doneCh := make(chan struct{})
+	var doneOnce sync.Once

 	go func() {
-		defer close(retryCh)
-
 		for {
 			select {
 			case <-ctx.Done():
+				close(drainedCh)
 				return
+			case <-doneCh:
+				if remaining.Load() == 0 {
+					close(drainedCh)
+					return
+				}
 			case job, ok := <-retryCh:
 				// ... existing logic
 			}
 		}
 	}()

-	return func() { cancel() }, drainedCh
+	return func() { cancel() }, func() { doneOnce.Do(func() { close(doneCh) }) }, drainedCh
 }

Then in RunEnumeration, call signalDone() after wg.Wait().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1579 - 1581, The drainedCh can hang if no 429s
occur because remaining never decrements and drainedCh is never closed; fix by
adding an explicit completion signal from the initial work to the retry loop:
introduce (or use) a signalDone function and call it after the initial worker
WaitGroup completes (wg.Wait()) in RunEnumeration so that the retry loop can
observe completion and close drainedCh when no retry jobs were enqueued; ensure
the retry loop logic that references remaining, retryCh and drainedCh treats
this completion signal as triggering the same close behavior as when remaining
reaches zero.

1754-1756: ⚠️ Potential issue | 🟡 Minor

Exported method Process uses unexported type retryJob in its signature.

The Process method is exported (public API), but the retryCh chan retryJob parameter uses the unexported retryJob type. This prevents external packages from calling this method since they cannot construct a chan retryJob.

Either:

  1. Make the method unexported (process) if it's internal-only, or
  2. Export RetryJob if external callers need this functionality


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1754 - 1756, The exported method
Runner.Process currently accepts a parameter retryCh chan retryJob where
retryJob is unexported, preventing external callers from constructing that
channel; either make the method unexported (rename Runner.Process to process and
update callers to use the internal wrapper that calls r.process) if it’s
internal-only, or export the type (rename retryJob to RetryJob and update its
definition and all usages) so external packages can create chan RetryJob; update
the signature in Runner.Process and all call sites (including the internal
r.process wrapper) accordingly to keep signatures consistent.
runner/runner_test.go (2)

458-459: ⚠️ Potential issue | 🟡 Minor

Wire up the retryLoop stop handle to prevent goroutine leaks.

The test discards the stop function returned by retryLoop, leaving the internal goroutine unable to receive a cancellation signal. This can cause goroutine leaks after the test completes.


🛠️ Suggested fix
-    _, drainedCh := r.retryLoop(context.Background(), retryCh, output, r.analyze)
+    ctx, cancel := context.WithCancel(context.Background())
+    stopRetry, drainedCh := r.retryLoop(ctx, retryCh, output, r.analyze)
+    t.Cleanup(func() {
+        stopRetry()
+        cancel()
+    })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner_test.go` around lines 458 - 459, The test is discarding the
stop function returned by r.retryLoop which prevents the internal goroutine from
being signaled to stop; update the test to capture the returned stop function
from r.retryLoop (alongside drainedCh) and ensure it is invoked (e.g., defer
stop() or call stop() when done) so the goroutine is cancelled and drainedCh can
be observed without leaking goroutines.

498-501: ⚠️ Potential issue | 🟡 Minor

Guard against indefinite hangs with a time-bounded wait.

If the retry loop logic regresses, <-drainedCh can block indefinitely and hang the test suite. Use a bounded wait to fail fast.


🛠️ Suggested fix
 	wg.Wait()
-	<-drainedCh
+	select {
+	case <-drainedCh:
+		// drained successfully
+	case <-time.After(10 * time.Second):
+		t.Fatalf("retry loop did not drain within timeout")
+	}
 	close(output)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner_test.go` around lines 498 - 501, The test currently blocks
indefinitely on a plain receive from drainedCh (after wg.Wait()) which can hang
if the retry loop regresses; change the blocking receive to a time-bounded wait
(e.g., use a select with a timeout or context with deadline) so that if
drainedCh is not closed within the timeout the test fails fast (call t.Fatalf or
t.Error) and then still closes output and calls drainWG.Wait(); update the code
paths around drainedCh, wg.Wait(), and drainWG.Wait() to use this bounded wait
to avoid indefinite hangs.
🧹 Nitpick comments (3)
runner/options.go (1)

849-856: Prefer fmt.Errorf over errors.New(fmt.Sprintf(...)).

The idiomatic Go approach is to use fmt.Errorf directly instead of wrapping fmt.Sprintf with errors.New.

♻️ Suggested refactor
 	if options.RetryRounds > 0 {
 		if options.RetryDelay <= 0 {
-			return errors.New(fmt.Sprintf("invalid retry-delay: must be >0 when retry-rounds=%d (got %d)", options.RetryRounds, options.RetryDelay))
+			return fmt.Errorf("invalid retry-delay: must be >0 when retry-rounds=%d (got %d)", options.RetryRounds, options.RetryDelay)
 		}
 		if options.RetryTimeout <= 0 {
-			return errors.New(fmt.Sprintf("invalid retry-timeout: must be >0 when retry-rounds=%d (got %d)", options.RetryRounds, options.RetryTimeout))
+			return fmt.Errorf("invalid retry-timeout: must be >0 when retry-rounds=%d (got %d)", options.RetryRounds, options.RetryTimeout)
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/options.go` around lines 849 - 856, Replace the non-idiomatic
errors.New(fmt.Sprintf(...)) calls in the retry validation block with fmt.Errorf
to construct the formatted error messages; specifically update the checks that
validate options.RetryRounds > 0 and the subsequent checks for
options.RetryDelay and options.RetryTimeout so they return fmt.Errorf with the
same formatted text (refer to symbols options.RetryRounds, options.RetryDelay,
options.RetryTimeout) instead of errors.New(fmt.Sprintf(...)).
runner/runner.go (1)

1649-1680: Potential goroutine accumulation under sustained 429 responses.

Each job spawns a new goroutine (line 1649) that waits for its scheduled time before executing. Under sustained 429 responses with many targets, this could lead to a large number of goroutines waiting on timers simultaneously, potentially impacting memory usage.

Consider using a worker pool or semaphore to limit concurrent retry goroutines if this becomes a concern in production usage with high retry volumes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1649 - 1680, The current pattern launches a
goroutine per retryJob that may sit on a timer (see retryJob handling, analyze
call, retryCh, remaining, drainedCh and r.options.RetryRounds), which can
accumulate under sustained 429s; change this to use a bounded worker pool or
semaphore that pulls jobs from retryCh and performs the timer wait + analyze
logic inside a fixed number of worker goroutines (or gated by a semaphore) so
at-most N timers/goroutines exist concurrently, preserve the retry logic
(increment j.attempt, compute j.when with retryDelay) and ensure
remaining.Add(-1) and closed(drainedCh) semantics remain unchanged.
runner/runner_test.go (1)

593-606: Apply the same cleanup pattern here.

Similar to TestRunner_Process_And_RetryLoop, this test should wire up the stop handle and use a time-bounded wait to prevent potential goroutine leaks and test hangs.

♻️ Suggested fix
-    _, drainedCh := rn.retryLoop(context.Background(), retryCh, output, rn.analyze)
+    ctx, cancel := context.WithCancel(context.Background())
+    stopRetry, drainedCh := rn.retryLoop(ctx, retryCh, output, rn.analyze)
+    t.Cleanup(func() {
+        stopRetry()
+        cancel()
+    })
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner_test.go` around lines 593 - 606, Replace the unbounded context
and blind waits with a cancellable/time-bounded stop flow: create a context with
timeout (e.g., ctx, cancel := context.WithTimeout(context.Background(),
time.Second)) and pass ctx into rn.retryLoop instead of context.Background();
after starting rn.process(..., retryCh, ...) call cancel (or cancel after a
bounded wait) and then wait for drainedCh with a select that times out to avoid
hanging; also replace the unconditional wg.Wait() with a time-bounded wait
(select on a done channel or use the sync util's timeout variant) so that the
goroutine wait for wg, the retryLoop drainedCh and the output close are all
guarded by timeouts. Ensure you update the calls that reference retryLoop,
process, drainedCh, wg, retryCh and output accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@runner/runner_test.go`:
- Around line 458-459: The test is discarding the stop function returned by
r.retryLoop which prevents the internal goroutine from being signaled to stop;
update the test to capture the returned stop function from r.retryLoop
(alongside drainedCh) and ensure it is invoked (e.g., defer stop() or call
stop() when done) so the goroutine is cancelled and drainedCh can be observed
without leaking goroutines.
- Around line 498-501: The test currently blocks indefinitely on a plain receive
from drainedCh (after wg.Wait()) which can hang if the retry loop regresses;
change the blocking receive to a time-bounded wait (e.g., use a select with a
timeout or context with deadline) so that if drainedCh is not closed within the
timeout the test fails fast (call t.Fatalf or t.Error) and then still closes
output and calls drainWG.Wait(); update the code paths around drainedCh,
wg.Wait(), and drainWG.Wait() to use this bounded wait to avoid indefinite
hangs.

In `@runner/runner.go`:
- Around line 1579-1581: The drainedCh can hang if no 429s occur because
remaining never decrements and drainedCh is never closed; fix by adding an
explicit completion signal from the initial work to the retry loop: introduce
(or use) a signalDone function and call it after the initial worker WaitGroup
completes (wg.Wait()) in RunEnumeration so that the retry loop can observe
completion and close drainedCh when no retry jobs were enqueued; ensure the
retry loop logic that references remaining, retryCh and drainedCh treats this
completion signal as triggering the same close behavior as when remaining
reaches zero.
- Around line 1754-1756: The exported method Runner.Process currently accepts a
parameter retryCh chan retryJob where retryJob is unexported, preventing
external callers from constructing that channel; either make the method
unexported (rename Runner.Process to process and update callers to use the
internal wrapper that calls r.process) if it’s internal-only, or export the type
(rename retryJob to RetryJob and update its definition and all usages) so
external packages can create chan RetryJob; update the signature in
Runner.Process and all call sites (including the internal r.process wrapper)
accordingly to keep signatures consistent.

---

Nitpick comments:
In `@runner/options.go`:
- Around line 849-856: Replace the non-idiomatic errors.New(fmt.Sprintf(...))
calls in the retry validation block with fmt.Errorf to construct the formatted
error messages; specifically update the checks that validate options.RetryRounds
> 0 and the subsequent checks for options.RetryDelay and options.RetryTimeout so
they return fmt.Errorf with the same formatted text (refer to symbols
options.RetryRounds, options.RetryDelay, options.RetryTimeout) instead of
errors.New(fmt.Sprintf(...)).

In `@runner/runner_test.go`:
- Around line 593-606: Replace the unbounded context and blind waits with a
cancellable/time-bounded stop flow: create a context with timeout (e.g., ctx,
cancel := context.WithTimeout(context.Background(), time.Second)) and pass ctx
into rn.retryLoop instead of context.Background(); after starting
rn.process(..., retryCh, ...) call cancel (or cancel after a bounded wait) and
then wait for drainedCh with a select that times out to avoid hanging; also
replace the unconditional wg.Wait() with a time-bounded wait (select on a done
channel or use the sync util's timeout variant) so that the goroutine wait for
wg, the retryLoop drainedCh and the output close are all guarded by timeouts.
Ensure you update the calls that reference retryLoop, process, drainedCh, wg,
retryCh and output accordingly.

In `@runner/runner.go`:
- Around line 1649-1680: The current pattern launches a goroutine per retryJob
that may sit on a timer (see retryJob handling, analyze call, retryCh,
remaining, drainedCh and r.options.RetryRounds), which can accumulate under
sustained 429s; change this to use a bounded worker pool or semaphore that pulls
jobs from retryCh and performs the timer wait + analyze logic inside a fixed
number of worker goroutines (or gated by a semaphore) so at-most N
timers/goroutines exist concurrently, preserve the retry logic (increment
j.attempt, compute j.when with retryDelay) and ensure remaining.Add(-1) and
closed(drainedCh) semantics remain unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bf44b34c-55f5-48b2-86e3-d918d9115213

📥 Commits

Reviewing files that changed from the base of the PR and between b75d4ee and 8d424be.

📒 Files selected for processing (5)
  • README.md
  • cmd/integration-test/http.go
  • runner/options.go
  • runner/runner.go
  • runner/runner_test.go


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
runner/runner.go (1)

1743-1745: ⚠️ Potential issue | 🟠 Major

Keep retryQueue out of the exported Process API.

This is still a breaking public-surface change: existing callers now need an extra argument, and external packages cannot construct a non-nil queue value anyway. Preserve the old Process signature and keep the retry plumbing behind the internal process helper.

Minimal API-preserving shape
-func (r *Runner) Process(t string, wg *syncutil.AdaptiveWaitGroup, protocol string, scanopts *ScanOptions, output chan Result, rq *retryQueue) {
-	r.process(t, wg, r.hp, protocol, scanopts, output, rq)
+func (r *Runner) Process(t string, wg *syncutil.AdaptiveWaitGroup, protocol string, scanopts *ScanOptions, output chan Result) {
+	r.process(t, wg, r.hp, protocol, scanopts, output, nil)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1743 - 1745, Public API change: restore the
original exported Runner.Process signature by removing the retryQueue parameter
and keep retry plumbing inside the unexported r.process helper. Change the
exported method Runner.Process(t string, wg *syncutil.AdaptiveWaitGroup,
protocol string, scanopts *ScanOptions, output chan Result) to call r.process
with the appropriate internal retryQueue (e.g., a nil or r-owned retryQueue) so
external callers do not need to construct retryQueue; keep any retryQueue type
and logic confined to the unexported process implementation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@runner/runner.go`:
- Around line 1650-1673: The retry path in processRetries currently sends
analyze() results directly to output and requeues 429s, which skips the TLS/CSP
expansion logic used in the normal process flow; extract the post-analyze
handling (the logic that inspects analyze() result, emits to output, triggers
SAN/CSP-derived target discovery and requeues on 429) into a shared helper
(e.g., handleAnalyzeResult(result Result, rq *retryQueue, output chan Result,
originalItem retryItem, opts options) or similar) and call that helper from both
process() and processRetries() instead of duplicating logic; ensure the helper
receives the original retryItem data so it can requeue with retryDelay(result,
r.options.RetryDelay) when result.StatusCode == http.StatusTooManyRequests and
also performs the TLS/CSP expansion side-effects before sending to output.
- Around line 1571-1579: The retry context is created from context.Background()
so retries outlive runner shutdown; instead derive retryCtx from the runner's
shutdown/cancellation context (e.g., r.shutdownCtx or r.ctx used by Interrupt())
and then apply the optional timeout via context.WithTimeout; update the block
that builds retryCtx (used by r.processRetries) to use that base context so
queued retries are cancelled when Interrupt() is invoked and RetryTimeout starts
relative to shutdown rather than after wg.Wait().

---

Duplicate comments:
In `@runner/runner.go`:
- Around line 1743-1745: Public API change: restore the original exported
Runner.Process signature by removing the retryQueue parameter and keep retry
plumbing inside the unexported r.process helper. Change the exported method
Runner.Process(t string, wg *syncutil.AdaptiveWaitGroup, protocol string,
scanopts *ScanOptions, output chan Result) to call r.process with the
appropriate internal retryQueue (e.g., a nil or r-owned retryQueue) so external
callers do not need to construct retryQueue; keep any retryQueue type and logic
confined to the unexported process implementation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 13c63f75-ec3e-4eb8-a62d-63c900fa59f5

📥 Commits

Reviewing files that changed from the base of the PR and between 8d424be and c4d5a29.

📒 Files selected for processing (2)
  • runner/runner.go
  • runner/runner_test.go

Comment on lines +1571 to +1579
if r.options.RetryRounds > 0 {
retryCtx := context.Background()
if r.options.RetryTimeout > 0 {
var cancel context.CancelFunc
retryCtx, cancel = context.WithTimeout(retryCtx, time.Duration(r.options.RetryTimeout)*time.Second)
defer cancel()
}
r.processRetries(retryCtx, &rq, output)
}

⚠️ Potential issue | 🟠 Major

Tie the retry context to enumeration cancellation.

retryCtx is created from context.Background() only after wg.Wait(), so queued 429s still run after Interrupt(), and the RetryTimeout window does not even start until the initial scan has fully drained. Please derive this context from the runner shutdown path instead of starting a fresh background context here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1571 - 1579, The retry context is created from
context.Background() so retries outlive runner shutdown; instead derive retryCtx
from the runner's shutdown/cancellation context (e.g., r.shutdownCtx or r.ctx
used by Interrupt()) and then apply the optional timeout via
context.WithTimeout; update the block that builds retryCtx (used by
r.processRetries) to use that base context so queued retries are cancelled when
Interrupt() is invoked and RetryTimeout starts relative to shutdown rather than
after wg.Wait().

Comment on lines +1650 to +1673
func (r *Runner) processRetries(ctx context.Context, rq *retryQueue, output chan Result) {
for round := 0; round < r.options.RetryRounds; round++ {
items := rq.drain()
if len(items) == 0 {
return
}
for _, item := range items {
timer := time.NewTimer(item.delay)
select {
case <-ctx.Done():
timer.Stop()
return
case <-timer.C:
}
result := r.analyze(item.hp, item.protocol, item.target, item.method, item.input, item.scanopts)
output <- result
if result.StatusCode == http.StatusTooManyRequests {
rq.push(retryItem{
hp: item.hp, protocol: item.protocol, target: item.target,
method: item.method, input: item.input, scanopts: item.scanopts,
delay: retryDelay(result, r.options.RetryDelay),
})
}
}

⚠️ Potential issue | 🟠 Major

Retried successes bypass TLS/CSP expansion.

processRetries stops after `output <- result` and an optional requeue. Unlike the normal process flow in Line 1779-Line 1800 and Line 1840-Line 1849, it never feeds successful retry results back through the TLS/CSP discovery path, so SAN/CSP-derived targets are lost when the first response was a 429. Pull the post-analyze handling into a shared helper and call it from both paths.

Also applies to: 1779-1800, 1840-1849
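One possible shape for such a helper, with hypothetical simplified types (`result`, `retryItem`, `queue`, `handleResult`) standing in for the PR's real ones; the point is that emission, 429 requeueing, and target expansion live in one place called from both paths:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

type result struct {
	statusCode int
	target     string
}

type retryItem struct {
	target string
	delay  time.Duration
}

type queue struct{ items []retryItem }

func (q *queue) push(it retryItem) { q.items = append(q.items, it) }

// handleResult centralizes the post-analyze logic: emit the result,
// requeue on 429, and otherwise run the expansion hook (where the real
// code would do TLS SAN / CSP-derived target discovery).
func handleResult(res result, rq *queue, out chan<- result, expand func(result)) {
	out <- res
	if res.statusCode == http.StatusTooManyRequests {
		rq.push(retryItem{target: res.target, delay: time.Second})
		return
	}
	expand(res)
}

func main() {
	out := make(chan result, 2)
	rq := &queue{}
	expanded := 0
	handleResult(result{statusCode: 429, target: "a"}, rq, out, func(result) { expanded++ })
	handleResult(result{statusCode: 200, target: "b"}, rq, out, func(result) { expanded++ })
	fmt.Println(len(rq.items), expanded) // 1 1
}
```

Both `process()` and `processRetries()` would call `handleResult` after `analyze()`, so a target that first answered 429 and then succeeded still goes through expansion.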

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1650 - 1673, The retry path in processRetries
currently sends analyze() results directly to output and requeues 429s, which
skips the TLS/CSP expansion logic used in the normal process flow; extract the
post-analyze handling (the logic that inspects analyze() result, emits to
output, triggers SAN/CSP-derived target discovery and requeues on 429) into a
shared helper (e.g., handleAnalyzeResult(result Result, rq *retryQueue, output
chan Result, originalItem retryItem, opts options) or similar) and call that
helper from both process() and processRetries() instead of duplicating logic;
ensure the helper receives the original retryItem data so it can requeue with
retryDelay(result, r.options.RetryDelay) when result.StatusCode ==
http.StatusTooManyRequests and also performs the TLS/CSP expansion side-effects
before sending to output.

}
}

func retryDelay(res Result, fallbackMs int) time.Duration {


🟡 Unbounded Retry-After header enables sleep amplification DoS (CWE-400) — The retryDelay function parses the Retry-After header from server responses without validating upper bounds. An attacker-controlled server can send extremely large values (e.g., 'Retry-After: 999999999' or a date far in the future) causing httpx to create timers with multi-year durations.

Attack Example
Attacker's server returns:
HTTP/1.1 429 Too Many Requests
Retry-After: 999999999

The retryDelay function parses this as 999999999 seconds without bounds checking, creating a timer that would fire in ~31 years. While RetryTimeout caps actual sleep duration, this still enables maximum-duration sleeps on every retry.
Suggested Fix
Add maximum bounds check in retryDelay function: cap parsed Retry-After values to a reasonable maximum (e.g., 300 seconds). Change lines 1608-1609 to: 'if seconds, err := strconv.Atoi(ra[0]); err == nil && seconds > 0 && seconds <= 300 { return time.Duration(seconds) * time.Second }' and similarly cap time.Until(t) results at line 1612-1613.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@runner/runner.go` around lines 1605-1619, the retryDelay function parses
the Retry-After header without upper bounds validation; add a constant 'const
maxRetryAfterSeconds = 300' at the top of the file, then at line 1608 change the
condition to 'if seconds, err := strconv.Atoi(ra[0]); err == nil && seconds > 0
&& seconds <= maxRetryAfterSeconds' and at line 1612 add a bounds check: 'if d
:= time.Until(t); d > 0 && d <= maxRetryAfterSeconds*time.Second { return d }'
to prevent sleep amplification attacks from malicious servers.



Development

Successfully merging this pull request may close these issues.

round robin system for 429 status code
