Problem
The shape of the API conflates two unrelated concepts:
-
LockTimeout is a wait duration — exactly what context.Context
exists for. Callers already pass a ctx to Save, Get, etc. Adding
WithLockTimeout on top means callers now have two ways to express
"how long am I willing to wait" — one via the ctx they pass per-call,
and one via a store-level option configured at construction. This
becomes ambiguous when both are set (the implementation picks
whichever fires first), and it forces callers to think about lock
waits separately from operation deadlines.
-
StaleLockTimeout is a file-age threshold, not a wait duration —
it controls when a .posixage.lock file is considered abandoned and
may be removed. It's a semantically distinct concept that genuinely
belongs as a store-level option.
Bundling both into a flock.Config struct passed to every TryLock call
obscures this distinction and adds API surface (a struct, two options,
two new validation paths) for capability that is already expressible via
context.Context.
Proposal
Drop WithLockTimeout and the flock.Config struct entirely. Lock
acquisition retries until the caller context is canceled — callers that
want a bounded wait pass context.WithTimeout(ctx, …); callers that want
no timeout (the disable-timeout use case the PR was motivated by) pass a
context without a deadline.
Keep WithStaleLockTimeout as the only locking knob, since it controls
a file-age threshold rather than a wait duration. The single-attempt
stale-recovery trigger moves from "after LockTimeout expires" to "after
the first failed acquisition attempt" — semantically the same, but no
longer dependent on a separately-configured wait duration.
Resulting API surface
// flock package
func TryLock(ctx context.Context, root *os.Root, staleAfter time.Duration) (UnlockFunc, error)
func TryRLock(ctx context.Context, root *os.Root, staleAfter time.Duration) (UnlockFunc, error)
// posixage store
posixage.WithStaleLockTimeout(d time.Duration) Options // unchanged in spirit
// posixage.WithLockTimeout — removed
Trade-off
Callers lose the ability to set a short lock-retry while keeping a long
overall operation deadline (e.g. "this Save can run for 30s but fail
the lock acquisition after 100ms"). This isn't a real use case in
practice: if the lock is held by a live process for longer than 100ms,
stale recovery won't help (the file isn't actually stale), so the caller
would just fail. The previous 100ms default was effectively a fail-fast
heuristic, which the new first-attempt-then-recover flow expresses more
directly.
Why this matters now
Doing this refactor before #521 merges (or as a follow-up before it
ships externally) avoids landing an API we already know we want to
remove. WithLockTimeout would become a deprecation-and-removal cycle
otherwise.
Problem
The shape of the API conflates two unrelated concepts:
LockTimeoutis a wait duration — exactly whatcontext.Contextexists for. Callers already pass a
ctxtoSave,Get, etc. AddingWithLockTimeouton top means callers now have two ways to express"how long am I willing to wait" — one via the ctx they pass per-call,
and one via a store-level option configured at construction. This
becomes ambiguous when both are set (the implementation picks
whichever fires first), and it forces callers to think about lock
waits separately from operation deadlines.
StaleLockTimeoutis a file-age threshold, not a wait duration —it controls when a
.posixage.lockfile is considered abandoned andmay be removed. It's a semantically distinct concept that genuinely
belongs as a store-level option.
Bundling both into a
flock.Configstruct passed to everyTryLockcallobscures this distinction and adds API surface (a struct, two options,
two new validation paths) for capability that is already expressible via
context.Context.Proposal
Drop
WithLockTimeoutand theflock.Configstruct entirely. Lockacquisition retries until the caller context is canceled — callers that
want a bounded wait pass
context.WithTimeout(ctx, …); callers that wantno timeout (the disable-timeout use case the PR was motivated by) pass a
context without a deadline.
Keep
WithStaleLockTimeoutas the only locking knob, since it controlsa file-age threshold rather than a wait duration. The single-attempt
stale-recovery trigger moves from "after
LockTimeoutexpires" to "afterthe first failed acquisition attempt" — semantically the same, but no
longer dependent on a separately-configured wait duration.
Resulting API surface