Add OpenTelemetry observability to custom background tasks by 2chanhaeng · Pull Request #812 · fedify-dev/fedify

2chanhaeng · 2026-06-20T17:12:13Z

Resolves #799, the third and final sub-issue of #206 (custom background tasks). Once this lands, #206 is fully resolved.

Background

The core task API (#797/#803) shipped task dispatch behavior and structured logging, but the task worker carries no span and no metrics: of the message variants handled in processQueuedTask, every other branch (fanout/outbox/inbox) is dispatched with instrumentation, but the task branch is not.

This PR closes that gap by layering task-specific telemetry onto the decision points the core already established. It reuses the queue-task metric pattern introduced in #759 and mirrors the existing http_signatures.failure_reason enum in metrics.ts. It changes no drop/retry behavior: telemetry is observed, never enforced.

What changes

Span

Each dequeued task now runs inside a fedify.task consumer span, paralleling the existing activitypub.inbox/outbox/fanout spans. The name is namespaced under fedify. rather than activitypub. because tasks are not part of ActivityPub. The span:

Inherits the enqueue site's trace context, so a task is a child of whatever requested it.
Carries fedify.task.name and fedify.task.attempt (the zero-based attempt number).
Carries fedify.task.failure_reason and sets its status to ERROR on a terminal failure, so trace backends surface failed tasks without re-deriving the reason from logs.
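The span shape listed above can be modeled without any OpenTelemetry dependency. The following is only a minimal sketch: `describeTaskSpan` and `TaskSpanRecord` are hypothetical names introduced for illustration and are not part of Fedify or of the OpenTelemetry API. Only the span name and the three attribute keys come from this PR.

```typescript
// Hypothetical, dependency-free model of the span described above.
// Only the span name and the attribute keys are taken from the PR text.
type TaskFailureReason =
  | "deserialization"
  | "validation"
  | "unknown_task"
  | "handler";

interface TaskSpanRecord {
  name: string;
  kind: "consumer";
  attributes: Record<string, string | number>;
  status: "unset" | "error";
}

function describeTaskSpan(
  taskName: string,
  attempt: number, // zero-based attempt number
  failureReason?: TaskFailureReason, // set only on a terminal failure
): TaskSpanRecord {
  const attributes: Record<string, string | number> = {
    "fedify.task.name": taskName,
    "fedify.task.attempt": attempt,
  };
  if (failureReason !== undefined) {
    attributes["fedify.task.failure_reason"] = failureReason;
  }
  return {
    name: "fedify.task",
    kind: "consumer",
    attributes,
    // A terminal failure marks the span as ERROR; otherwise status is untouched.
    status: failureReason === undefined ? "unset" : "error",
  };
}
```

Under this model, a third attempt that ends in a handler error would yield a record with `fedify.task.attempt` set to 2, `fedify.task.failure_reason` set to `handler`, and an `error` status.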

Failure attribution

#listenTaskMessage now returns the failure reason (or undefined on success) so the span/metric wrapper can attribute it. To distinguish a deserialization failure from a validation failure, the former combined codec.decode(...) call is split into its existing deserialize then validate phases. This is behavior-preserving—decode is literally validate(schema, await deserialize(raw))—and TaskCodec gains a thin instance validate() wrapper so the dispatch site can split the two phases without importing the class.

The four bounded fedify.task.failure_reason values map one-to-one to the worker's dispatch decision points:

deserialization — the wire payload could not be deserialized.
validation — the deserialized payload failed schema validation.
unknown_task — the task name has no registered handler.
handler — the registered handler threw.

A worker shutdown is the one exception: an interrupted attempt is reported as an aborted outcome with no fedify.task.failure_reason, never as a handler failure.
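To make the mapping between decision points and failure reasons concrete, here is a hedged sketch of a worker step that returns the reason instead of throwing. `SketchCodec`, `runTask`, and `jsonCodec` are hypothetical stand-ins, not the actual `TaskCodec` or `#listenTaskMessage`. The order of the checks (handler lookup before decoding) is an assumption, and the abort case described above is omitted.

```typescript
type QueueTaskFailureReason =
  | "deserialization"
  | "validation"
  | "unknown_task"
  | "handler";

// Hypothetical codec with the two phases split, as the PR describes.
interface SketchCodec {
  deserialize(raw: string): Promise<unknown>;
  validate(taskName: string, value: unknown): Promise<unknown>;
}

type Handler = (payload: unknown) => Promise<void>;

// Returns the failure reason, or undefined on success.
async function runTask(
  codec: SketchCodec,
  handlers: Map<string, Handler>,
  message: { name: string; payload: string },
): Promise<QueueTaskFailureReason | undefined> {
  const handler = handlers.get(message.name);
  if (handler === undefined) return "unknown_task";

  let value: unknown;
  try {
    value = await codec.deserialize(message.payload);
  } catch {
    return "deserialization";
  }

  let payload: unknown;
  try {
    payload = await codec.validate(message.name, value);
  } catch {
    return "validation";
  }

  try {
    await handler(payload);
  } catch {
    return "handler";
  }
  return undefined;
}

// Illustrative codec: JSON on the wire, payload must carry a string `to`.
const jsonCodec: SketchCodec = {
  deserialize: async (raw) => JSON.parse(raw),
  validate: async (_name, value) => {
    const to = (value as { to?: unknown } | null)?.to;
    if (typeof to !== "string") throw new TypeError("invalid payload");
    return value;
  },
};
```

Because each phase has its own `try` block, a malformed wire payload and a schema mismatch can no longer be confused, which is the point of splitting `decode` into its two phases.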

Metrics surface

Tasks reuse the fedify.queue.task.* metric family under a new task role:

QueueTaskRole gains "task".
QueueTaskCommonAttributes gains taskName, emitted as fedify.task.name.
New bounded QueueTaskFailureReason type, mirroring HttpSignatureMetricFailureReason.
recordQueueTaskOutcome() gains an optional trailing failureReason parameter (non-breaking); it is emitted as fedify.task.failure_reason only on a failed result.
recordQueueTaskEnqueued records role: "task" at both the enqueue site (after a genuine dispatch, never on a dedup skip or a failed enqueue) and the retry re-enqueue site.

fedify.queue.backend reports the resolved queue—the one actually used after routing, which may be the outbox queue under the fallback mode—so the metric stays accurate regardless of routing.
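The attribute rules above can be illustrated with a small pure function. This is a sketch under assumptions, not the real `recordQueueTaskOutcome()`: the function name `outcomeAttributes`, the result values other than `failed` and `aborted`, and the `fedify.queue.task.result` key are hypothetical. The `fedify.queue.role`, `fedify.queue.backend`, `fedify.task.name`, and `fedify.task.failure_reason` keys are the ones named in this PR.

```typescript
type QueueTaskFailureReason =
  | "deserialization"
  | "validation"
  | "unknown_task"
  | "handler";

// "completed" is an assumed name for the success outcome.
type QueueTaskResult = "completed" | "failed" | "aborted";

interface TaskCommonAttributes {
  role: "task";
  taskName: string;
  backend: string; // the resolved queue, after routing
}

function outcomeAttributes(
  common: TaskCommonAttributes,
  result: QueueTaskResult,
  failureReason?: QueueTaskFailureReason,
): Record<string, string> {
  const attrs: Record<string, string> = {
    "fedify.queue.role": common.role,
    "fedify.queue.backend": common.backend,
    "fedify.task.name": common.taskName,
    "fedify.queue.task.result": result, // hypothetical attribute key
  };
  // The reason is attached only to failed results, never to aborted ones.
  if (result === "failed" && failureReason !== undefined) {
    attrs["fedify.task.failure_reason"] = failureReason;
  }
  return attrs;
}
```

Keeping the reason off non-failed results means an interrupted attempt never inflates the failure series, matching the shutdown exception described earlier.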

Cardinality

Bounded by construction: task names are a registered, known-at-startup set (never derived from message content), and failure_reason is a four-value bounded enum. Combined cardinality is at most |taskName| × |failure_reason| × |queue.backend|, which stays within safe limits for OTel attributes. The process-local in_flight UpDownCounter omits fedify.task.name so its series stays drained.
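As a worked example of the bound above, the upper limit on distinct attribute combinations for failed results is a simple product. The helper name `maxFailureSeries` and the sample inputs are illustrative assumptions only.

```typescript
const FAILURE_REASONS = [
  "deserialization",
  "validation",
  "unknown_task",
  "handler",
] as const;

// Upper bound on attribute combinations for failed task outcomes:
// |task names| x |failure reasons| x |queue backends|.
function maxFailureSeries(
  taskNames: readonly string[],
  backends: readonly string[],
): number {
  const names = new Set(taskNames).size; // registered at startup, deduplicated
  const queues = new Set(backends).size;
  return names * FAILURE_REASONS.length * queues;
}

// Example: 12 registered tasks routed across 2 backends.
const exampleTasks = Array.from({ length: 12 }, (_, i) => `task-${i}`);
const exampleBound = maxFailureSeries(exampleTasks, ["task", "outbox"]);
// exampleBound is 12 * 4 * 2 = 96
```

The bound grows only with the number of registered tasks and queues, never with message content, which is why it stays fixed for a given deployment.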

Out of scope

A management UI / inspection RPC.
Per-task custom metric attributes beyond taskName (would risk unbounded cardinality).
Refining the four-value QueueTaskFailureReason set—explicitly open to later refinement as long as it stays a small bounded set.
Any change to drop/retry semantics.

Tests

packages/fedify/src/federation/tasks/tasks.test.ts gains a telemetry block with one assertion per acceptance criterion, using TestSpanExporter / createTestTracerProvider / createTestMeterProvider from @fedify/fixture. Coverage:

A fedify.task span exists with fedify.task.name and fedify.task.attempt.
Parent context is inherited from the enqueue site.
Each failure path records the correct fedify.task.failure_reason.
fedify.queue.backend reflects the resolved queue, including the outbox fallback.
recordQueueTaskEnqueued / recordQueueTaskOutcome carry role: "task".

Verified across Deno, Node.js, and Bun.

Documentation

docs/manual/tasks.md: a new "Observability" section covering the span, its attributes, the metric family, and the bounded failure-reason set; the stale "ships without OpenTelemetry spans and metrics" note removed from "Limitations".
docs/manual/opentelemetry.md: the fedify.task span row, the task value added to the fedify.queue.role enumeration, a widened failed-result definition covering acked task drops, and the fedify.task.name / fedify.task.attempt / fedify.task.failure_reason attribute rows.
CHANGES.md: the existing task-feature entry extended with the observability additions and the Custom background tasks: observability #799 reference link.

AI disclosure

Assisted-by: Claude Code:claude-opus-4-8

coderabbitai · 2026-06-20T17:12:23Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


gemini-code-assist

Code Review

This pull request introduces a custom background task API to Fedify, allowing developers to define, enqueue, and process arbitrary background jobs with type-safe payload validation via Standard Schema. The implementation supports robust serialization of complex types and Activity Vocabulary objects using devalue, customizable retry policies, queue routing, best-effort or native deduplication, and OpenTelemetry instrumentation. Feedback on the changes highlights a compatibility issue with Node.js 20 due to the use of Array.fromAsync in codec.ts, suggesting standard for...of loops instead, and recommends implementing a recursion depth limit during deserialization to prevent potential Denial of Service (DoS) attacks from deeply nested payloads.


gemini-code-assist · 2026-06-20T17:14:09Z

+    classReviver(
+      isInstanceOf(Array),
+      (): unknown[] => [],
+      async (revive, node, arr) => {
+        for (const item of await Array.fromAsync(node, revive)) arr.push(item);
+      },
+    ),


Using Array.fromAsync introduces compatibility issues with Node.js 20 (which is an active LTS version of Node.js) as it was introduced in ES2024 and is only natively supported in Node.js 22+. To maintain compatibility across all target environments (including Node.js 20), replace it with a standard for...of loop with await.

    classReviver(
      isInstanceOf(Array),
      (): unknown[] => [],
      async (revive, node, arr) => {
        for (const item of node) {
          arr.push(await revive(item));
        }
      },
    ),

gemini-code-assist · 2026-06-20T17:14:09Z

+      isInstanceOf(Set),
+      () => new Set<unknown>(),
+      async (revive, node, set) => {
+        for (const v of await Array.fromAsync(node, revive)) set.add(v);
+      },
+    ),


Using Array.fromAsync introduces compatibility issues with Node.js 20 (which is an active LTS version of Node.js) as it was introduced in ES2024 and is only natively supported in Node.js 22+. To maintain compatibility across all target environments (including Node.js 20), replace it with a standard for...of loop with await.

Suggested change

-      isInstanceOf(Set),
-      () => new Set<unknown>(),
-      async (revive, node, set) => {
-        for (const v of await Array.fromAsync(node, revive)) set.add(v);
-      },
-    ),
+    classReviver(
+      isInstanceOf(Set),
+      () => new Set<unknown>(),
+      async (revive, node, set) => {
+        for (const v of node) {
+          set.add(await revive(v));
+        }
+      },
+    ),

gemini-code-assist · 2026-06-20T17:14:09Z

+  #revive = (seen: Seen): Revive => {
+    const inner: Revive = async (node) => {
+      if (node === null || typeof node !== "object") return node;
+      if (seen.has(node)) return seen.get(node);
+      for (const reviver of this.#classRevivers) {
+        const out = reviver(seen, inner, node);
+        if (out !== undefined) return await out;
+      }
+      // devalue can handle non-container objects.
+      return node;
+    };
+    return inner;
+  };


To prevent stack overflow or resource exhaustion from maliciously crafted deep JSON payloads, implement a depth limit in recursive traversal functions. The limit should be high enough for legitimate data (e.g., 1000) but low enough to prevent DoS attacks.

  #revive = (seen: Seen, maxDepth = 1000): Revive => {
    const inner = async (node: unknown, depth = 0): Promise<unknown> => {
      if (node === null || typeof node !== "object") return node;
      if (seen.has(node)) return seen.get(node);
      if (depth >= maxDepth) {
        throw new TypeError("Maximum depth limit exceeded during deserialization.");
      }
      const nextRevive = (n: unknown) => inner(n, depth + 1);
      for (const reviver of this.#classRevivers) {
        const out = reviver(seen, nextRevive, node);
        if (out !== undefined) return await out;
      }
      // devalue can handle non-container objects.
      return node;
    };
    return inner;
  };

References

To prevent stack overflow from maliciously crafted deep JSON, implement a depth limit in recursive traversal functions. The limit should be high enough for legitimate data but low enough to prevent DoS attacks.
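The depth-limit idea from this review comment can be tried in isolation. The sketch below is a simplified, synchronous stand-in, not the `#revive` method from codec.ts: `walk` and `nest` are hypothetical helpers, there is no `seen` map for shared references, and only arrays and plain objects are treated as containers.

```typescript
// Visit every primitive leaf, refusing to descend past maxDepth containers.
function walk(
  node: unknown,
  visit: (leaf: unknown) => void,
  maxDepth = 1000,
  depth = 0,
): void {
  if (node === null || typeof node !== "object") {
    visit(node);
    return;
  }
  if (depth >= maxDepth) {
    throw new TypeError("Maximum depth limit exceeded during deserialization.");
  }
  const children = Array.isArray(node) ? node : Object.values(node);
  for (const child of children) walk(child, visit, maxDepth, depth + 1);
}

// Build a value nested `levels` arrays deep around a single leaf.
function nest(levels: number): unknown {
  let value: unknown = "leaf";
  for (let i = 0; i < levels; i++) value = [value];
  return value;
}
```

With `maxDepth` set to 10, `walk(nest(10), ...)` reaches the leaf, while `walk(nest(11), ...)` throws a `TypeError` at the eleventh container. The primitive check runs before the depth check, as in the suggested change, so leaves at the boundary are still accepted.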

codecov · 2026-06-20T17:21:47Z

Codecov Report

❌ Patch coverage is 98.40764% with 5 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/fedify/src/federation/tasks/enqueue.ts | 97.17% | 4 Missing and 1 partial ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| packages/fedify/src/federation/metrics.ts | `99.35% <100.00%> (+<0.01%)` ⬆️ |
| packages/fedify/src/federation/middleware.ts | `90.99% <100.00%> (+0.06%)` ⬆️ |
| packages/fedify/src/federation/mq.ts | `86.55% <100.00%> (+0.65%)` ⬆️ |
| packages/fedify/src/federation/tasks/codec.ts | `99.13% <100.00%> (+0.03%)` ⬆️ |
| packages/fedify/src/federation/tasks/mod.ts | `100.00% <100.00%> (ø)` |
| packages/fedify/src/testing/mod.ts | `100.00% <100.00%> (ø)` |
| packages/fedify/src/testing/tasks.ts | `100.00% <100.00%> (ø)` |
| packages/fedify/src/federation/tasks/enqueue.ts | `97.17% <97.17%> (ø)` |

... and 2 files with indirect coverage changes


Context.enqueueTask() and enqueueTaskMany() now accept a deduplicationKey requesting at-most-once enqueue for tasks that share it (new TaskEnqueueOptions.deduplicationKey). Resolution follows the queue and key-value store capabilities:

- A queue declaring the new MessageQueue.nativeDeduplication owns the check; the key is forwarded through the new MessageQueueEnqueueOptions.deduplicationKey.
- Otherwise Fedify applies a best-effort guard through the optional KvStore.cas primitive under a new taskDeduplication key prefix, tunable with the new FederationOptions.taskDeduplicationTtl and taskDeduplicationFallback options.

For enqueueTaskMany(), a single key governs the whole batch. A native queue that does not implement enqueueMany() cannot express batch-level at-most-once with a per-message key, so such a multi-item enqueue is rejected with a TypeError instead of silently leaking duplicates.

Configuration errors that are decidable without a payload (a native queue lacking enqueueMany, or a closed fallback without cas) are checked before payloads are validated and encoded, so they reject before any user schema runs or any key is reserved.

fedify-dev#798

Assisted-by: Claude Code:claude-opus-4-8

The #enqueueTasks and #encodeTaskMessage methods made ContextImpl oversized, so move the handle validation, deduplication planning, payload encoding, and queue dispatch into a new tasks/enqueue.ts module. ContextImpl now delegates to enqueueTasks(), passing only the small slice of itself (federation, codec, origin, data) the pipeline needs.

Pull the shared task-test helpers (the schema factory, stock schemas, base federation options, and the recording MockQueue) into a new testing/mq-tasks.ts module, and split the enqueue-specific cases out of tasks.test.ts into enqueue.test.ts.

Teach the fixture-usage check to expand glob patterns in its allowlist so the whole testing/ directory is covered by a single entry instead of one path per file.

Assisted-by: Claude Code:claude-opus-4-8

Two branches both touched the task testing utilities and diverged: one split MockQueue and the shared schemas/options out into mq-tasks.ts, while the other kept evolving them in tasks.ts. After rebasing the common edits, consolidate everything back into a single tasks.ts and drop the now-redundant mq-tasks.ts.

Assisted-by: Claude Code:claude-opus-4-8

The key-value deduplication path reserved a marker before dispatching to the queue but never undid it when the dispatch failed. A transient backend failure therefore left the marker behind, so the retry was silently deduplicated against a task that had never reached the queue.

The cas claim now stores a unique token instead of a bare `true`, and a failed dispatch conditionally clears it (cas succeeds only while the stored value is still our token). The conditional clear keeps a stale rollback from deleting a marker that another concurrent enqueue has already re-claimed. A rollback that itself fails is logged and swallowed so the original enqueue error still reaches the caller.

The enqueueMany requirement for deduplicated multi-item batches now keys on whether deduplication is actually applied (by a native queue or the cas fallback) rather than on nativeDeduplication alone. Under the "open" fallback (no native dedup, no cas) no marker is taken, so the batch fans out without deduplication instead of throwing. ParallelMessageQueue likewise rejects a deduplicated batch when the wrapped queue lacks enqueueMany, since fanning out cannot carry one key atomically.

fedify-dev#798

Assisted-by: Claude Code:claude-opus-4-8
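The token-based claim and conditional rollback described in this commit message can be sketched against an in-memory store. Everything here is a hypothetical stand-in: `MemoryKv` only imitates a compare-and-swap primitive and is not Fedify's `KvStore`, `enqueueOnce` is not the real enqueue pipeline, the counter-based token replaces what would be a random unique value, and TTL handling and logging are omitted.

```typescript
// Hypothetical in-memory store with a compare-and-swap operation.
class MemoryKv {
  private data = new Map<string, string>();

  // Writes `next` only if the stored value equals `expected`.
  // `undefined` means "absent" on either side.
  async cas(
    key: string,
    expected: string | undefined,
    next: string | undefined,
  ): Promise<boolean> {
    if (this.data.get(key) !== expected) return false;
    if (next === undefined) this.data.delete(key);
    else this.data.set(key, next);
    return true;
  }

  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }
}

let tokenCounter = 0; // stand-in for a random unique token

async function enqueueOnce(
  kv: MemoryKv,
  key: string,
  dispatch: () => Promise<void>,
): Promise<"enqueued" | "deduplicated"> {
  const token = `token-${++tokenCounter}`;
  // Claim the key; a failed claim means another enqueue already holds it.
  if (!(await kv.cas(key, undefined, token))) return "deduplicated";
  try {
    await dispatch();
    return "enqueued";
  } catch (error) {
    try {
      // Clear the marker only while it still holds our own token.
      await kv.cas(key, token, undefined);
    } catch {
      // A failed rollback is swallowed so the original error surfaces.
    }
    throw error;
  }
}
```

Because the rollback compares against the token, a late rollback cannot delete a marker that a concurrent enqueue has since re-claimed under a different token.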

Layer task-specific telemetry onto the custom background task dispatch path, reusing the queue-task metric pattern and mirroring the existing `http_signatures.failure_reason` enum in metrics.ts.

Each dequeued task now runs in a `fedify.task` span that inherits the enqueue site's trace context and carries `fedify.task.name`, `fedify.task.attempt`, and, on a terminal failure, `fedify.task.failure_reason`. The `fedify.queue.task.*` metrics report task runs under the new `"task"` role with the task name and, on failure, a bounded `fedify.task.failure_reason`.

To tell the failure reasons apart, `#listenTaskMessage` splits the former `decode()` call into its deserialize and validate phases and returns the decision point that failed: `deserialization`, `validation`, `unknown_task`, or `handler`. A swallowed abort is reported as a graceful interruption, not a failure. The reported `fedify.queue.backend` reflects the resolved queue so it stays accurate under the outbox fallback.

Public surface: `QueueTaskRole` gains `"task"`, `QueueTaskCommonAttributes` gains `taskName`, and a new `QueueTaskFailureReason` type plus an optional trailing `failureReason` parameter on `recordQueueTaskOutcome()` carry the reason. `TaskCodec` exposes an instance `validate()` wrapper so the dispatch site can split decoding without importing the class.

fedify-dev#799

Assisted-by: Claude Code:claude-opus-4-8

gemini-code-assist Bot reviewed Jun 20, 2026


2chanhaeng added 5 commits June 22, 2026 00:23

2chanhaeng force-pushed the issue/799 branch from 624d8cb to 049c8c8 Compare June 22, 2026 00:44

2chanhaeng wants to merge 5 commits into fedify-dev:feat/custom-worker from 2chanhaeng:issue/799
