Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,19 @@ To be released.
[#771]: https://github.com/fedify-dev/fedify/pull/771
[#772]: https://github.com/fedify-dev/fedify/pull/772

### @fedify/backfill

- Added *@fedify/backfill* for reconstructing ActivityPub conversations.
It supports FEP-f228 context collections containing post-like objects or
`Create` activities, optional reply-tree traversal, ordered hybrid
strategies, shared safety budgets, deduplication, and traversal-local
document caching. [[#275], [#779], [#801], [#807] by Jiwon Kwon]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a link to this pull request as well!


[#275]: https://github.com/fedify-dev/fedify/issues/275
[#779]: https://github.com/fedify-dev/fedify/pull/779
[#801]: https://github.com/fedify-dev/fedify/pull/801
[#807]: https://github.com/fedify-dev/fedify/pull/807

### @fedify/fixture

- Added `createTestMeterProvider()` and `TestMetricRecorder` helpers for
Expand Down
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,8 @@ The repository is organized as a monorepo with the following packages:
creating new Fedify projects. Wraps @fedify/init.
- *packages/amqp/*: AMQP/RabbitMQ driver (@fedify/amqp) for Fedify.
- *packages/astro/*: Astro integration (@fedify/astro) for Fedify.
- *packages/backfill/*: ActivityPub conversation backfill support
(@fedify/backfill) for Fedify.
- *packages/cfworkers/*: Cloudflare Workers integration (@fedify/cfworkers)
for Fedify.
- *packages/debugger/*: Embedded ActivityPub debug dashboard
Expand Down
1 change: 1 addition & 0 deletions docs/.vitepress/config.mts
Original file line number Diff line number Diff line change
Expand Up @@ -137,6 +137,7 @@ const MANUAL = {
{ text: "Outbox listeners", link: "/manual/outbox.md" },
{ text: "Sending activities", link: "/manual/send.md" },
{ text: "Collections", link: "/manual/collections.md" },
{ text: "Conversation backfill", link: "/manual/backfill.md" },
{ text: "Object dispatcher", link: "/manual/object.md" },
{ text: "Access control", link: "/manual/access-control.md" },
{ text: "WebFinger", link: "/manual/webfinger.md" },
Expand Down
203 changes: 203 additions & 0 deletions docs/manual/backfill.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,203 @@
---
description: >-
Reconstruct ActivityPub conversations from FEP-f228 context collections or
reply relationships using the @fedify/backfill package.
---

Conversation backfill
=====================

*This API is available since Fedify 2.3.0.*

Fedify provides the *@fedify/backfill* package for reconstructing ActivityPub
conversations that may be incomplete on the local server. It can retrieve
post-like objects from [FEP-f228] context collections and optionally crawl
`inReplyTo` ancestors and `replies` descendants.

[FEP-f228]: https://w3id.org/fep/f228


Installation
------------

::: code-group

~~~~ sh [Deno]
deno add jsr:@fedify/backfill
~~~~

~~~~ sh [npm]
npm add @fedify/backfill
~~~~

~~~~ sh [pnpm]
pnpm add @fedify/backfill
~~~~

~~~~ sh [Yarn]
yarn add @fedify/backfill
~~~~

~~~~ sh [Bun]
bun add @fedify/backfill
~~~~

:::


Backfilling a conversation
--------------------------

The `backfill()` function accepts a backfill context, a seed object, and
traversal options. The context supplies a `documentLoader` for dereferencing
context collections, collection items, reply targets, and replies collections:

~~~~ typescript twoslash
import { backfill, type BackfillDocumentLoader } from "@fedify/backfill";
import { lookupObject, Note } from "@fedify/vocab";

declare const note: Note;
// ---cut-before---
const documentLoader: BackfillDocumentLoader = (iri, options) =>
lookupObject(iri, { signal: options?.signal });

for await (
const item of backfill({ documentLoader }, note, {
maxItems: 20,
maxRequests: 50,
})
) {
console.log(item.id?.href);
}
~~~~

The seed object itself is not yielded. If the same object appears in a
discovered collection, it is skipped by ID.

By default, `backfill()` uses the `"context-auto"` strategy. It expects the
seed's `context` to dereference to a `Collection`, `OrderedCollection`,
`CollectionPage`, or `OrderedCollectionPage`. Ordinary post-like items are
yielded directly, while supported `Create` activities are unwrapped and their
objects are yielded.

If the seed has no context, or its context resolves to a non-collection,
context strategies yield nothing.


Strategies
----------

Strategies run in the configured order. They share request and item budgets,
abort state, document caching, and object ID deduplication. If multiple
strategies discover the same object, the first one keeps its `BackfillItem`
metadata.

`"context-auto"`
: Handles both direct post-like objects and supported `Create` activities
from a context collection. This is the default strategy.

`"context-objects"`
: Accepts only post-like objects contained directly in a context collection:

~~~~ typescript twoslash
import { backfill, type BackfillContext } from "@fedify/backfill";
import { Note } from "@fedify/vocab";

declare const context: BackfillContext;
declare const note: Note;
// ---cut-before---
for await (
const item of backfill(context, note, {
strategies: ["context-objects"],
})
) {
console.log(item.object);
}
~~~~

`"context-activities"`
: Accepts supported activities from a context collection. It currently
supports `Create` and yields the activity's object rather than the activity
itself:

~~~~ typescript twoslash
import { backfill, type BackfillContext } from "@fedify/backfill";
import { Note } from "@fedify/vocab";

declare const context: BackfillContext;
declare const note: Note;
// ---cut-before---
for await (
const item of backfill(context, note, {
strategies: ["context-activities"],
})
) {
console.log(item.object);
}
~~~~

`"reply-tree"`
: Walks `inReplyTo` ancestors and `replies` descendants. It yields
post-like objects only and does not unwrap Activity objects. This strategy
is opt-in because it can require substantially more network requests than
a context collection.

For hybrid coverage, run the FEP-f228 path first and use reply-tree traversal
after it:

~~~~ typescript twoslash
import { backfill, type BackfillContext } from "@fedify/backfill";
import { Note } from "@fedify/vocab";

declare const context: BackfillContext;
declare const note: Note;
// ---cut-before---
for await (
const item of backfill(context, note, {
strategies: ["context-auto", "reply-tree"],
maxDepth: 4,
})
) {
console.log(item.origin, item.depth, item.object);
}
~~~~


Traversal controls
------------------

`maxItems`
: Limits the number of yielded objects. Skipped duplicates do not count.

`maxRequests`
: Limits calls to `documentLoader`. Embedded objects and collections do not
count as requests.

`maxDepth`
: Limits reply-tree traversal and defaults to 10. Immediate parents and
direct replies have depth 1; their next-level parents or replies have depth
2, and so on. Context collection items have depth 0 and are not limited by
this option.

`interval`
: Adds a delay between `documentLoader` requests. A callback receives the
zero-based request index. String durations require the global `Temporal`
API or a polyfill; `Temporal.DurationLike` objects work without the global
API.

`signal`
: Cancels traversal before requests and yields. The signal is also passed to
`documentLoader`.


Caching and failures
--------------------

Dereferenced documents are cached in memory for one `backfill()` traversal.
Applications that need persistent or shared caching can implement it in the
provided `documentLoader`.

Failed external dereferences are skipped so other conversation items can still
be discovered. Failed loads are not retained in the traversal cache, allowing
the same IRI to be retried if another traversal path reaches it. Aborting the
provided signal stops traversal instead of skipping the request.
1 change: 1 addition & 0 deletions docs/package.json
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
"@deno/kv": "^0.8.4",
"@fedify/amqp": "workspace:^",
"@fedify/astro": "workspace:^",
"@fedify/backfill": "workspace:^",
"@fedify/cfworkers": "workspace:^",
"@fedify/debugger": "workspace:^",
"@fedify/express": "workspace:^",
Expand Down
43 changes: 42 additions & 1 deletion packages/backfill/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@

This package provides ActivityPub conversation backfill support for the
[Fedify] ecosystem. It can retrieve post-like objects from a seed object's
context collection, following the direct FEP-f228-style path where the
context collection, following the direct [FEP-f228] path where the
context dereferences to a `Collection`, `OrderedCollection`, `CollectionPage`,
or `OrderedCollectionPage`. It can also use an opt-in reply-tree strategy to
walk `inReplyTo` ancestors and `replies` descendants when context collections
Expand All @@ -24,6 +24,7 @@ are unavailable or incomplete.
[@fedify@hollo.social badge]: https://fedi-badge.deno.dev/@fedify@hollo.social/followers.svg
[@fedify@hollo.social]: https://hollo.social/@fedify
[Fedify]: https://fedify.dev/
[FEP-f228]: https://w3id.org/fep/f228


Installation
Expand Down Expand Up @@ -73,6 +74,19 @@ collection items are treated as backfillable objects by default. If an item is
recognized as a supported `Create` activity, `backfill()` extracts the
activity's object instead.

To accept only post-like objects directly contained in the context collection,
use the `context-objects` strategy:

~~~~ typescript
for await (
const item of backfill({ documentLoader }, note, {
strategies: ["context-objects"],
})
) {
console.log(item.object);
}
~~~~

To read only FEP-f228 activity collections, enable the `context-activities`
strategy:

Expand Down Expand Up @@ -109,3 +123,30 @@ objects from Activity wrappers. Immediate parents and direct replies have
depth 1, their next-level parents or replies have depth 2, and so on.
Reply-tree traversal defaults to a maximum depth of 10; set `maxDepth` to use a
different limit.


Traversal controls
------------------

All configured strategies share the same traversal controls:

- `maxItems` limits the number of yielded objects. Skipped duplicates do
not count.
- `maxRequests` limits calls to `documentLoader`. Embedded objects and
collections do not count.
- `maxDepth` limits reply-tree traversal and defaults to 10. It does not
limit context collection items.
- `interval` adds a delay between loader requests. Its callback receives
the zero-based request index.
- `signal` cancels traversal and is forwarded to `documentLoader`.

An `interval` string requires the global `Temporal` API or a polyfill.
`Temporal.DurationLike` objects work without the global API.

If the seed has no context, or its context resolves to a non-collection,
context strategies yield nothing. Loader failures are skipped unless
traversal is aborted.

Dereferenced documents are cached in memory for one `backfill()` traversal.
Applications that need persistent or shared caching can provide it through
the `documentLoader`.
4 changes: 2 additions & 2 deletions packages/backfill/src/backfill.ts
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ const DEFAULT_MAX_DEPTH = 10;
/**
* Thrown when backfill traversal exceeds the configured request budget.
*
* @since 2.x.0
* @since 2.3.0
*/
export class MaxRequestsExceeded extends Error {}

Expand Down Expand Up @@ -57,7 +57,7 @@ type ReplyTreeTraversal = {
* The seed object is not yielded by default, but its ID is treated as already
* seen so it will not be yielded again if the collection contains it.
*
* @since 2.x.0
* @since 2.3.0
*/
export async function* backfill<
TObject extends APObject = APObject,
Expand Down
Loading