go-again provides:

- `again`: a thread-safe retry helper with exponential backoff, jitter, timeout, cancellation, hooks, and a temporary-error registry.
- `pkg/scheduler`: a lightweight HTTP scheduler with pluggable state storage (in-memory by default, SQLite built-in) that reuses the retrier for retryable requests and optional callbacks.
As of February 27, 2026, the core retrier hardening work and the scheduler extension described in PRD.md are implemented and covered by tests, including race checks.
Retrier:

- Configurable `MaxRetries`, `Interval`, `Jitter`, `BackoffFactor`, and `Timeout`
- `Do` and `DoWithContext` retry APIs
- Temporary error filtering via explicit error list and/or `Registry`
- Retry-all behavior when no temporary errors are supplied and the registry is empty
- Cancellation via caller context and `Retrier.Cancel()` / `Retrier.Stop()`
- `Errors` trace (`Attempts`, `Last`) plus `Errors.Join()`
- `DoWithResult[T]` helper
- Optional `slog` logger and retry hooks

Scheduler:

- Interval scheduling with `StartAt`, `EndAt`, and `MaxRuns`
- HTTP request execution (`GET`, `POST`, `PUT`)
- Retry integration via `RetryPolicy`
- Optional callback with bounded response-body capture
- URL validation by default (via `sectools`) with override/disable support
- Custom HTTP client, logger, concurrency limit, and scheduler-state storage backend
Install:

```sh
go get github.com/hyp3rd/go-again
```

Requires Go 1.26+ (see go.mod).
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"

	again "github.com/hyp3rd/go-again"
)

func main() {
	retrier, err := again.NewRetrier(
		context.Background(),
		again.WithMaxRetries(3), // retries after the first attempt
		again.WithInterval(100*time.Millisecond),
		again.WithJitter(50*time.Millisecond),
		again.WithTimeout(2*time.Second),
	)
	if err != nil {
		panic(err)
	}

	retrier.Registry.LoadDefaults()
	retrier.Registry.RegisterTemporaryError(http.ErrAbortHandler)

	var attempts int
	errs := retrier.Do(context.Background(), func() error {
		attempts++
		if attempts < 3 {
			return http.ErrAbortHandler
		}
		return nil
	})
	defer retrier.PutErrors(errs)

	if errs.Last != nil {
		fmt.Println("failed:", errs.Last)
		return
	}

	fmt.Println("success after attempts:", attempts)
	_ = errors.Join(errs.Attempts...) // equivalent to errs.Join()
}
```

- `MaxRetries` counts retries after the first attempt (total attempts = `MaxRetries + 1`).
- If `temporaryErrors` is omitted and `Registry` has entries, the registry is used as the retry filter.
- If `temporaryErrors` is omitted and the registry is empty, all errors are retried until success/timeout/cancel/max-retries.
- `Do` checks cancellation between attempts. For long-running work, use `DoWithContext`.
- `Cancel()` and `Stop()` cancel the retrier's internal lifecycle context; they are terminal for that retrier instance.
Use DoWithContext when the operation itself accepts a context and should stop promptly on cancellation:
```go
// assuming `retrier` was created as in the previous example
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

errs := retrier.DoWithContext(ctx, func(ctx context.Context) error {
	select {
	case <-time.After(250 * time.Millisecond):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
})
defer retrier.PutErrors(errs)
```

The retryable function should observe `ctx.Done()`; if it ignores context cancellation, the work may continue running after the retrier returns.
The scheduler runs jobs immediately when scheduled (or at StartAt if set), then continues every Schedule.Every until MaxRuns, EndAt, removal, or Stop().
Request and callback URLs are validated by default using sectools (HTTPS only, no userinfo, and no private/localhost hosts unless configured otherwise).
```go
package main

import (
	"context"
	"net/http"
	"time"

	again "github.com/hyp3rd/go-again"
	"github.com/hyp3rd/go-again/pkg/scheduler"
)

func main() {
	retrier, _ := again.NewRetrier(
		context.Background(),
		again.WithMaxRetries(5),
		again.WithInterval(10*time.Millisecond),
		again.WithJitter(10*time.Millisecond),
		again.WithTimeout(5*time.Second),
	)

	s := scheduler.NewScheduler(
		scheduler.WithConcurrency(8),
	)
	defer s.Stop()

	_, _ = s.Schedule(scheduler.Job{
		Schedule: scheduler.Schedule{
			Every:   1 * time.Minute,
			MaxRuns: 1,
		},
		Request: scheduler.Request{
			Method: http.MethodPost,
			URL:    "https://example.com/endpoint",
			Body:   []byte(`{"ping":"pong"}`),
		},
		Callback: scheduler.Callback{
			URL: "https://example.com/callback",
		},
		RetryPolicy: scheduler.RetryPolicy{
			Retrier:          retrier,
			RetryStatusCodes: []int{http.StatusTooManyRequests, http.StatusInternalServerError},
		},
	})
}
```

Runnable version:

```sh
go run ./__examples/scheduler
```

Source: `__examples/scheduler/scheduler.go`
```go
// excerpt: assumes an httptest `server` and a `callbackCh` channel
// populated by the server's /callback handler
s := scheduler.NewScheduler(
	scheduler.WithHTTPClient(server.Client()),
	scheduler.WithURLValidator(nil), // allow local endpoints for example usage
)
defer s.Stop()

jobID, err := s.Schedule(scheduler.Job{
	Schedule: scheduler.Schedule{Every: 10 * time.Millisecond, MaxRuns: 1},
	Request:  scheduler.Request{Method: http.MethodGet, URL: server.URL + "/target"},
	Callback: scheduler.Callback{URL: server.URL + "/callback"},
})
if err != nil {
	panic(err)
}

payload := <-callbackCh
fmt.Println("job:", jobID, "success:", payload.Success, "status:", payload.StatusCode)
```

```go
// after Schedule(...)
status, ok := s.JobStatus(jobID)
if ok {
	fmt.Println("state:", status.State, "runs:", status.Runs, "active:", status.ActiveRuns)
}

history, ok := s.JobHistory(jobID)
if ok {
	for _, run := range history {
		fmt.Println("run#", run.Sequence, "status:", run.Payload.StatusCode, "success:", run.Payload.Success)
	}
}

filtered := s.QueryJobStatuses(scheduler.JobStatusQuery{
	States: []scheduler.JobState{scheduler.JobStateRunning, scheduler.JobStateScheduled},
	Offset: 0,
	Limit:  50,
})
fmt.Println("filtered statuses:", len(filtered))

recentRuns, ok := s.QueryJobHistory(jobID, scheduler.JobHistoryQuery{
	FromSequence: 10,
	Limit:        5,
})
if ok {
	fmt.Println("recent retained runs:", len(recentRuns))
}
```

Runnable version:

```sh
go run ./__examples/scheduler_sqlite
```

Source: `__examples/scheduler_sqlite/scheduler_sqlite.go`
```go
dbPath := filepath.Join(os.TempDir(), "go-again-scheduler-example.db")

storage, err := scheduler.NewSQLiteJobsStorageWithOptions(
	dbPath,
	scheduler.WithSQLiteHistoryMaxAge(24*time.Hour),
	scheduler.WithSQLiteHistoryMaxRowsPerJob(100),
)
if err != nil {
	panic(err)
}
defer storage.Close()

s := scheduler.NewScheduler(
	scheduler.WithJobsStorage(storage),
	scheduler.WithURLValidator(nil),
)
defer s.Stop()

jobID, err := s.Schedule(scheduler.Job{
	Schedule: scheduler.Schedule{Every: 20 * time.Millisecond, MaxRuns: 1},
	Request:  scheduler.Request{Method: http.MethodGet, URL: target.URL},
})
if err != nil {
	panic(err)
}
fmt.Println("scheduled job:", jobID)

pruned, err := storage.PruneHistory()
if err != nil {
	panic(err)
}
fmt.Println("pruned rows:", pruned)
```

Use `NewSchedulerWithError(...)` when constructor-time URL validator initialization errors must fail startup.
```go
s, err := scheduler.NewSchedulerWithError(
	scheduler.WithConcurrency(8),
)
if err != nil {
	// fail startup instead of warning + degraded mode
	return err
}
defer s.Stop()
```

- `WithHTTPClient(client)` sets the HTTP client used for requests and callbacks.
- `WithLogger(logger)` sets the scheduler logger.
- `WithConcurrency(n)` limits concurrent executions when `n > 0`.
- `WithJobsStorage(storage)` sets pluggable scheduler state storage (active jobs plus status/history; default: in-memory).
- `WithHistoryLimit(limit)` sets the retained per-job history length (default `20`).
- `WithURLValidator(validator)` overrides URL validation. Pass `nil` to disable validation.
- `NewSchedulerWithError(...)` returns constructor errors (including startup state reconciliation failures and default URL validator initialization failure).
- Supported methods for requests and callbacks: `GET`, `POST`, `PUT`.
- Callbacks are skipped when `Callback.URL` is empty.
- Callback method defaults to `POST`.
- `Callback.MaxBodyBytes` defaults to `4096`.
- `Request.Timeout` and `Callback.Timeout` apply per HTTP request/callback (not the schedule lifetime).
- If `RetryPolicy.Retrier` is nil, the scheduler creates a default retrier and loads registry defaults.
- Calling `Schedule` after `Stop()` returns `scheduler.ErrSchedulerStopped`.
- `Schedule` returns `scheduler.ErrStorageOperation` when required scheduler-state persistence fails.
- `NewSchedulerWithError(...)` should be preferred for fail-closed startup behavior in security-sensitive paths.
- `JobCount()` and `JobIDs()` provide lightweight read-only scheduler introspection.
- `JobStatus(id)`, `JobStatuses()`, and `JobHistory(id)` provide status and retained run-history snapshots.
- `QueryJobStatuses(JobStatusQuery)` adds ID/state filters with pagination (`Offset`, `Limit`) over status snapshots.
- `QueryJobHistory(id, JobHistoryQuery)` adds history filtering (`FromSequence`) and tail limiting (`Limit`) while preserving ascending sequence order.
- The default `InMemoryJobsStorage` is process-local; use `WithJobsStorage(...)` for custom durable/backed storage.
- `NewSQLiteJobsStorage(path)` provides a built-in durable storage implementation for `WithJobsStorage(...)`; call `Close()` when finished.
- `NewSQLiteJobsStorageWithOptions(path, ...)` configures SQLite retention controls: `WithSQLiteHistoryMaxAge(duration)`, `WithSQLiteHistoryMaxRowsPerJob(n)`, and `WithSQLiteHistoryRetention(...)`.
- `SQLiteJobsStorage.PruneHistory()` and `PruneHistoryWithRetention(...)` provide manual pruning for periodic cleanup jobs.
- SQLite retention is also applied on write for new history records (age-based and max-rows-per-job), in addition to the scheduler's `WithHistoryLimit`.
- On scheduler startup, recovered active-job registrations from storage are reconciled: `scheduled`/`running` states are marked `canceled`, then active-job IDs are cleared. Jobs are not auto-resumed.
- Non-fatal storage write failures during runtime transitions are logged (warn) and execution continues.
- Non-fatal request/callback response-body read/close failures are logged (warn) and execution continues.
- `NewSchedulerWithError(...)` fails on constructor-time reconciliation errors; `NewScheduler()` logs a warning and continues.
- `NewScheduler()` logs a warning and continues if default URL validator initialization fails; use `NewSchedulerWithError()` to fail closed.
Allow private/localhost endpoints with a custom validator:

```go
validator, _ := validate.NewURLValidator(
	validate.WithURLAllowPrivateIP(true),
	validate.WithURLAllowLocalhost(true),
	validate.WithURLAllowIPLiteral(true),
)

s := scheduler.NewScheduler(
	scheduler.WithURLValidator(validator),
)
```

Disable validation entirely:

```go
s := scheduler.NewScheduler(
	scheduler.WithURLValidator(nil),
)
```

Run the example programs directly:
```sh
go run ./__examples/chan
go run ./__examples/context
go run ./__examples/scheduler
go run ./__examples/scheduler_sqlite
go run ./__examples/timeout
go run ./__examples/validate
```

Test and lint:

```sh
make test
make test-race
make lint
make sec
```

Benchmark (direct Go command):

```sh
go test -bench=. -benchtime=3s -benchmem -run=^$ -memprofile=mem.out ./...
```

- `Scheduler.Stop()` cancels the scheduler lifecycle; the same instance is not intended to be reused afterward.
- `Retrier.Cancel()` / `Retrier.Stop()` are terminal for the retrier instance.
- `DoWithContext` can only stop work promptly if the retryable function respects the provided context.
- `NewScheduler()` (the non-error constructor) intentionally degrades to warning-only behavior if default URL validator initialization fails; use `NewSchedulerWithError()` when you need constructor-time failure.
go-again adds retry orchestration overhead but is designed to keep allocations low. See the benchmark in tests/retrier_test.go and run the benchmark command above in your environment for current numbers.
- API docs: https://pkg.go.dev/github.com/hyp3rd/go-again
- Product/status notes: `PRD.md`
The code and documentation in this project are released under Mozilla Public License 2.0.
I'm a surfer, a crypto trader, and a software architect with 15 years of experience designing highly available distributed production environments and developing cloud-native apps in public and private clouds. Feel free to connect with me on LinkedIn.