Make retry behavior configurable #50

@blindzero

Problem Statement

IdLE already has centralized retry logic (Invoke-IdleWithRetry) with transient-error handling. However, the retry parameters are effectively fixed to the engine defaults during execution. This is insufficient for real-world environments, because different target systems (e.g., Entra ID / Graph, Exchange Online, Active Directory, on-prem APIs) have different throttling, latency, and transient failure patterns.

We need a host-owned way to configure retry behavior per step, without introducing provider-specific logic into steps and without allowing unsafe/agent-generated logic (e.g., ScriptBlocks).

Proposed Solution

Summary

Introduce RetryProfiles as part of -ExecutionOptions (host-owned). Steps may optionally reference a profile via a string key RetryProfile. The engine resolves the effective policy at execution time:

  • If Step.RetryProfile is set → look up the profile in ExecutionOptions.RetryProfiles[<key>]
  • If Step.RetryProfile is not set → use ExecutionOptions.DefaultRetryProfile
  • If ExecutionOptions is not provided → keep current engine defaults (no behavior change)
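
The resolution order above can be sketched roughly as follows. This is an illustration only, not the engine's implementation: the function name Resolve-IdleRetryProfileSketch and the -EngineDefaults parameter are hypothetical, and the sketch assumes that a step naming a profile while no RetryProfiles are configured simply gets the engine defaults.

```powershell
# Hypothetical helper for illustration; not part of IdLE.
function Resolve-IdleRetryProfileSketch {
    param(
        [System.Collections.IDictionary] $Step,
        [System.Collections.IDictionary] $ExecutionOptions,
        [System.Collections.IDictionary] $EngineDefaults
    )

    # No ExecutionOptions (or no RetryProfiles): keep the current engine defaults.
    if ($null -eq $ExecutionOptions -or -not $ExecutionOptions.Contains('RetryProfiles')) {
        return $EngineDefaults
    }

    $profiles = $ExecutionOptions['RetryProfiles']

    # A key set on the step wins over the host-level default key.
    $key = $null
    if ($Step.Contains('RetryProfile') -and $Step['RetryProfile']) {
        $key = [string] $Step['RetryProfile']
    }
    elseif ($ExecutionOptions.Contains('DefaultRetryProfile')) {
        $key = [string] $ExecutionOptions['DefaultRetryProfile']
    }

    # Neither the step nor the host named a profile: engine defaults again.
    if (-not $key) {
        return $EngineDefaults
    }

    # A named profile that does not exist is an error that points at step and key.
    if (-not $profiles.Contains($key)) {
        throw "Step '$($Step['Name'])' references unknown RetryProfile '$key'."
    }

    return $profiles[$key]
}
```

With the example options from this issue, a step carrying RetryProfile = 'ExchangeOnline' would resolve to the ExchangeOnline entry, and a step without the property would resolve to the entry named by DefaultRetryProfile.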

Public API surface (host-owned)

Add -ExecutionOptions to the public invocation surface (e.g., Invoke-IdlePlan), passed through to the core execution entrypoint.

Shape (PowerShell hashtable / IDictionary only):

-ExecutionOptions @{
  RetryProfiles = @{
    Default         = @{ MaxAttempts = 3; InitialDelayMilliseconds = 200; BackoffFactor = 2.0; MaxDelayMilliseconds = 5000;  JitterRatio = 0.2 }
    ExchangeOnline  = @{ MaxAttempts = 6; InitialDelayMilliseconds = 500; BackoffFactor = 2.0; MaxDelayMilliseconds = 30000; JitterRatio = 0.3 }
    ActiveDirectory = @{ MaxAttempts = 2; InitialDelayMilliseconds = 200; BackoffFactor = 2.0; MaxDelayMilliseconds = 2000;  JitterRatio = 0.1 }
  }
  DefaultRetryProfile = 'Default'
}
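
To make the interplay of the profile fields concrete, here is a minimal sketch of how a delay could be derived from them. The issue does not define the formula, so this is an assumption: exponential backoff from InitialDelayMilliseconds, capped at MaxDelayMilliseconds, with symmetric jitter scaled by JitterRatio. The function name Get-IdleRetryDelaySketch and the -JitterSample parameter are hypothetical; Invoke-IdleWithRetry may compute delays differently.

```powershell
# Hypothetical helper for illustration; the formula is an assumption, not IdLE's.
function Get-IdleRetryDelaySketch {
    param(
        [System.Collections.IDictionary] $RetryProfile,
        # 1-based number of the attempt that just failed.
        [int] $Attempt,
        # Value in [-1, 1]; a real caller would draw it randomly per attempt.
        [double] $JitterSample = 0.0
    )

    $initial = [double] $RetryProfile['InitialDelayMilliseconds']
    $factor  = [double] $RetryProfile['BackoffFactor']
    $max     = [double] $RetryProfile['MaxDelayMilliseconds']
    $jitter  = [double] $RetryProfile['JitterRatio']

    # Exponential growth: initial, initial * factor, initial * factor^2, ...
    $base = $initial * [Math]::Pow($factor, $Attempt - 1)

    # Cap before jitter so very large attempt numbers stay bounded.
    $capped = [Math]::Min($base, $max)

    # Spread the delay by up to +/- JitterRatio of the capped value.
    $jittered = $capped * (1.0 + $jitter * $JitterSample)

    # Keep the result inside [0, MaxDelayMilliseconds] after jitter as well.
    return [int] [Math]::Max(0.0, [Math]::Min($jittered, $max))
}
```

Under these assumptions, the ExchangeOnline profile above grows from 500 ms by a factor of 2 per attempt until it reaches the 30000 ms cap.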

Step schema extension (declarative only)

Add optional property on steps:

  • RetryProfile (string; optional)

Rules:

  • Pattern: ^[A-Za-z0-9_.-]{1,64}$
  • No inline retry policy blocks in step definitions for v1.0.0
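
A minimal sketch of the key check, assuming it runs during workflow validation. The function name Test-IdleRetryProfileKeySketch is hypothetical. The sketch anchors the pattern with \A and \z instead of ^ and $, because in .NET regular expressions $ also matches just before a trailing newline.

```powershell
# Hypothetical helper for illustration; not part of IdLE.
function Test-IdleRetryProfileKeySketch {
    param(
        [object] $Key
    )

    # Only real strings qualify; numbers or ScriptBlocks are rejected outright.
    # -cmatch compares case-sensitively, which avoids case-folding surprises.
    return (($Key -is [string]) -and ($Key -cmatch '\A[A-Za-z0-9_.-]{1,64}\z'))
}
```

For example, 'ExchangeOnline' passes, while an empty string, a key containing spaces, or a key longer than 64 characters does not.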

Example (schematic JSON/YAML):

{
  "Name": "Ensure mailbox",
  "Type": "EnsureEntitlement",
  "RetryProfile": "ExchangeOnline"
}

Validation (agent-safe)

  • Reject ScriptBlocks anywhere in ExecutionOptions (reuse existing Assert-IdleNoScriptBlock patterns).
  • Only accept hashtable / IDictionary for ExecutionOptions and inner objects.
  • Enforce hard limits to avoid misconfig and retry storms:

    Field                     Type    Limits
    MaxAttempts               int     0..10 (0 = no retry)
    InitialDelayMilliseconds  int     0..60000
    BackoffFactor             double  >= 1.0
    MaxDelayMilliseconds      int     0..300000 and >= InitialDelayMilliseconds
    JitterRatio               double  0.0..1.0
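
The limits in the table above could be enforced along these lines. This is a sketch under assumptions: the function name Assert-IdleRetryProfileLimitsSketch is hypothetical, every field is treated as required, and double fields also accept int values (for example JitterRatio = 0). It does not call Assert-IdleNoScriptBlock; the numeric type check alone already rejects a ScriptBlock used as a field value.

```powershell
# Hypothetical helper for illustration; not part of IdLE.
function Assert-IdleRetryProfileLimitsSketch {
    param(
        [string] $Name,
        [object] $RetryProfile
    )

    if ($RetryProfile -isnot [System.Collections.IDictionary]) {
        throw "RetryProfile '$Name' must be a hashtable / IDictionary."
    }

    # Field -> integer-only flag and inclusive bounds, copied from the limits table.
    $rules = [ordered] @{
        MaxAttempts              = @{ IntOnly = $true;  Min = 0;   Max = 10 }
        InitialDelayMilliseconds = @{ IntOnly = $true;  Min = 0;   Max = 60000 }
        BackoffFactor            = @{ IntOnly = $false; Min = 1.0; Max = [double]::MaxValue }
        MaxDelayMilliseconds     = @{ IntOnly = $true;  Min = 0;   Max = 300000 }
        JitterRatio              = @{ IntOnly = $false; Min = 0.0; Max = 1.0 }
    }

    foreach ($field in $rules.Keys) {
        # Assumption: all five fields must be present in every profile.
        if (-not $RetryProfile.Contains($field)) {
            throw "RetryProfile '$Name' is missing '$field'."
        }

        $value = $RetryProfile[$field]
        $rule  = $rules[$field]

        # Anything that is not a plain number (ScriptBlock, string, ...) fails here.
        $isInt    = $value -is [int]
        $isNumber = $isInt -or ($value -is [double])
        if (-not $isNumber -or ($rule.IntOnly -and -not $isInt)) {
            throw "RetryProfile '$Name': '$field' has an unsupported type."
        }

        if ([double]::IsNaN([double] $value) -or $value -lt $rule.Min -or $value -gt $rule.Max) {
            throw "RetryProfile '$Name': '$field' = $value is out of range."
        }
    }

    # Cross-field rule: the cap must not be lower than the starting delay.
    if ($RetryProfile['MaxDelayMilliseconds'] -lt $RetryProfile['InitialDelayMilliseconds']) {
        throw "RetryProfile '$Name': MaxDelayMilliseconds must be >= InitialDelayMilliseconds."
    }
}
```

Running the check once per entry in RetryProfiles before execution starts would surface a misconfigured profile early instead of during a retry loop.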

Resolution behavior:

  • Unknown RetryProfile key referenced by a step → fail-fast with a clear error pointing at the step and key.
  • Missing RetryProfiles / DefaultRetryProfile → fall back to current defaults.

Execution scope

  • Apply to all executed steps, including OnFailureSteps.
  • Each step resolves its own RetryProfile independently (OnFailure steps can use different profiles if specified).

Alternatives Considered

  1. Inline RetryPolicy per step
    • Rejected for v1.0.0 due to duplication/drift, larger review surface, and higher misconfiguration risk.
  2. Global single retry policy per run
    • Rejected because different target systems within one plan require different retry behavior.
  3. Infer policy from provider/system automatically
    • Rejected (for now) because it becomes implicit/magical and often ambiguous; increases debugging complexity.

Impact

  • Backward compatibility: Existing workflows remain valid. If RetryProfile is not used and ExecutionOptions is not provided, behavior is unchanged.
  • Workflow schema: Adds an optional, non-breaking property RetryProfile on steps.
  • Security posture: Maintains the “no ScriptBlocks in config” boundary; centralized validation prevents excessive retries/delays.

Additional Context

  • Existing retry implementation: IdLE.Core/Private/Invoke-IdleWithRetry.ps1 already supports transient marker-based decisions.
  • This issue extends the execution surface to allow host configuration and adds a minimal declarative selector (RetryProfile) to steps.
