fix: set APScheduler misfire_grace_time=None for reliable job execution#178

Open
sakhnenkoff wants to merge 1 commit into RichardAtCT:main from sakhnenkoff:patch/scheduler-resilience
@sakhnenkoff
Summary

Fixes #175.

The AsyncIOScheduler in JobScheduler.__init__ is created without explicit job_defaults, so APScheduler falls back to its default misfire_grace_time of 1 second. This is too strict for a bot where Claude command execution routinely takes minutes: any job that cannot start within 1 second of its scheduled time is skipped rather than run late.

Changes:

  • Set misfire_grace_time=None: a late job is never treated as misfired, so it still runs no matter how late it starts
  • Set coalesce=True: if several runs of the same job were missed, they are merged into a single execution instead of firing once per missed run

The worst case with this configuration is one late run, never a skipped one.
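
For readers without the diff open, the change can be sketched roughly as follows. This is a minimal illustration assuming APScheduler 3.x; `build_scheduler` is a hypothetical helper, and the real code lives in JobScheduler.__init__, which is not reproduced here.

```python
from apscheduler.schedulers.asyncio import AsyncIOScheduler


def build_scheduler() -> AsyncIOScheduler:
    """Hypothetical helper: create a scheduler that never skips late jobs."""
    return AsyncIOScheduler(
        job_defaults={
            # None disables the misfire check, so a late job still runs.
            "misfire_grace_time": None,
            # Collapse a backlog of missed runs of one job into a single run.
            "coalesce": True,
        }
    )
```

Because these are job defaults, an individual job can still override either value when it is added, if a stricter policy is wanted for that job.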

Tests

2 new tests in tests/unit/test_scheduler/test_misfire_config.py:

  • test_misfire_grace_time_is_none — verifies the config value
  • test_coalesce_is_enabled — verifies coalesce is on

All existing tests continue to pass.

With misfire_grace_time=300, heartbeats were still missed when the bot
was busy for >5 minutes (observed: 8m54s miss). Setting to None
guarantees every job fires exactly once, combined with coalesce=True
to prevent duplicate execution.
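
The effect of the two settings on the observed 8m54s delay can be illustrated with a small standard-library model. This is a simplified sketch of the decision, not APScheduler's actual code; `runs_to_execute` and its parameters are hypothetical names introduced for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def runs_to_execute(
    due_run_times: List[datetime],
    now: datetime,
    misfire_grace_time: Optional[float],
    coalesce: bool,
) -> List[datetime]:
    """Return the scheduled run times that would actually be executed.

    Simplified model: coalescing keeps only the most recent due run, and a
    run is dropped when it is later than the grace period (unless the grace
    period is None, which disables the check).
    """
    candidates = due_run_times[-1:] if coalesce else list(due_run_times)
    if misfire_grace_time is None:
        return candidates
    grace = timedelta(seconds=misfire_grace_time)
    return [t for t in candidates if now - t <= grace]


scheduled = datetime(2024, 1, 1, 12, 0, 0)
late_by = timedelta(minutes=8, seconds=54)  # the delay reported above

# A 300-second grace period drops the run; None keeps it.
skipped = runs_to_execute([scheduled], scheduled + late_by, 300, True)
kept = runs_to_execute([scheduled], scheduled + late_by, None, True)
```

In this model, `skipped` is empty and `kept` contains the single scheduled run, which matches the behaviour described above: with no grace limit the heartbeat runs late instead of being dropped.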
Development

Successfully merging this pull request may close these issues.

APScheduler default misfire_grace_time is too strict for long-running Claude commands