78 changes: 75 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# abx-plugins

ArchiveBox-compatible plugin suite (hooks, config schemas, binaries manifests).
ArchiveBox-compatible plugin suite (hooks and config schemas).

This package contains only plugin assets and a tiny helper to locate them.
It does **not** depend on Django or ArchiveBox.
@@ -11,7 +11,7 @@ It does **not** depend on Django or ArchiveBox.
from abx_plugins import get_plugins_dir

plugins_dir = get_plugins_dir()
# scan plugins_dir for plugins/*/config.json, binaries.jsonl, on_* hooks
# scan plugins_dir for plugins/*/config.json and on_* hooks
```

Tools like `abx-dl` and ArchiveBox can discover plugins from this package
@@ -24,7 +24,7 @@ without symlinks or environment-variable tricks.
Each plugin lives under `plugins/<name>/` and may include:

- `config.json` (optional) - config schema
- `binaries.jsonl` (optional) - binary manifests
- `on_Crawl*install*` hooks (optional) - dependency/binary install records
- `on_*` hook scripts (required to do work)
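
The layout above can be scanned with a few lines of standard-library Python. The following is a minimal sketch, not the loader that `abx-dl` or ArchiveBox actually uses: the function name `discover_plugins` and the shape of the returned dictionary are hypothetical, and it assumes the directory passed in directly contains the `<name>/` plugin folders.

```python
import json
from pathlib import Path


def discover_plugins(plugins_dir: Path) -> dict:
    """Map each plugin name to its optional config schema and its hook filenames.

    Hypothetical helper for illustration; the return shape is an assumption.
    """
    plugins = {}
    for plugin_dir in sorted(p for p in plugins_dir.iterdir() if p.is_dir()):
        config_path = plugin_dir / "config.json"
        # Hook scripts are any files whose name starts with "on_".
        hooks = sorted(p.name for p in plugin_dir.glob("on_*") if p.is_file())
        if not hooks and not config_path.is_file():
            continue  # neither a config schema nor hooks: not a plugin folder
        plugins[plugin_dir.name] = {
            "config": json.loads(config_path.read_text()) if config_path.is_file() else None,
            "hooks": hooks,
        }
    return plugins
```

Sorting the hook filenames is only a convenience for stable output here; the sketch does not claim anything about the order in which hooks are run.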

Hooks run with:
@@ -43,6 +43,78 @@ Hooks run with:
- `PERSONAS_DIR` - persona profiles root (default: `~/.config/abx/personas`)
- `ACTIVE_PERSONA` - persona name (default: `Default`)

### Install hook contract (concise)

Lifecycle:

1. `on_Crawl__*install*` declares crawl dependencies.
2. `on_Binary__*install*` resolves/installs one binary with one provider.

`on_Crawl` output (dependency declaration):

```json
{"type":"Binary","name":"yt-dlp","binproviders":"pip,brew,apt,env","overrides":{"pip":{"packages":["yt-dlp[default]"]}},"machine_id":"<optional>"}
```

`on_Binary` input/output:

- CLI input should accept `--binary-id`, `--machine-id`, `--name` (plus optional provider args).
- Output should emit installed facts like:

```json
{"type":"Binary","name":"yt-dlp","abspath":"/abs/path","version":"2025.01.01","sha256":"<optional>","binprovider":"pip","machine_id":"<recommended>","binary_id":"<recommended>"}
```

Optional machine patch record:

```json
{"type":"Machine","config":{"PATH":"...","NODE_MODULES_DIR":"...","CHROME_BINARY":"..."}}
```

Semantics:

- `stdout`: JSONL records only
- `stderr`: human logs/debug
- exit `0`: success or intentional skip
- exit non-zero: hard failure
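
To illustrate these semantics, here is a minimal sketch of an `on_Binary`-style hook that resolves a binary from `PATH` (the `env` provider) and reports it as one JSONL record. It uses `argparse` rather than `rich_click` to stay dependency-free, omits version and `sha256` detection, and treats "not found" as a hard failure; all three are assumptions for illustration, and `build_binary_record` is a hypothetical helper name.

```python
import argparse
import json
import shutil
import sys


def build_binary_record(name, abspath, binary_id=None, machine_id=None):
    """Build the installed-binary record; optional IDs are echoed back when given."""
    record = {"type": "Binary", "name": name, "abspath": abspath, "binprovider": "env"}
    if binary_id:
        record["binary_id"] = binary_id
    if machine_id:
        record["machine_id"] = machine_id
    return record


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Illustrative on_Binary hook (env provider)")
    parser.add_argument("--name", required=True)
    parser.add_argument("--binary-id")
    parser.add_argument("--machine-id")
    args = parser.parse_args(argv)

    abspath = shutil.which(args.name)
    if abspath is None:
        # Human-readable diagnostics go to stderr; stdout stays JSONL-only.
        print(f"[env] {args.name} not found on PATH", file=sys.stderr)
        return 1  # assumption: a missing binary counts as a hard failure

    # Exactly one JSON object per line on stdout.
    print(json.dumps(build_binary_record(args.name, abspath, args.binary_id, args.machine_id)))
    return 0
```

A real hook script would end with `sys.exit(main())` so that the return value becomes the process exit code.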

State/OS:

- working dir: `CRAWL_DIR/<plugin>/`
- durable install root: `LIB_DIR` (e.g. npm prefix, pip venv, puppeteer cache)
- providers: `apt` (Debian/Ubuntu), `brew` (macOS/Linux); many hooks currently assume POSIX paths

### Snapshot hook contract (concise)

Lifecycle:

- runs once per snapshot, typically after crawl setup
- common Chrome flow: crawl browser/session -> `chrome_tab` -> `chrome_navigate` -> downstream extractors

State:

- output cwd is usually `SNAP_DIR/<plugin>/`
- hooks may read sibling outputs via `../<plugin>/...`

Output records:

- terminal record is usually:

```json
{"type":"ArchiveResult","status":"succeeded|skipped|failed","output_str":"path-or-message"}
```

- discovery hooks may also emit `Snapshot` and `Tag` records before `ArchiveResult`
- search indexing hooks are a known exception and may use exit code + stderr without `ArchiveResult`

Semantics:

- `stdout`: JSONL records
- `stderr`: diagnostics/logging
- exit `0`: succeeded or skipped
- exit non-zero: failed
- current nuance: some skip/transient paths emit no JSONL and rely only on exit code
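
A caller consuming these hooks has to combine the JSONL records with the exit code. The sketch below shows one way to do that; it is not the ArchiveBox implementation. The function name `parse_hook_output` and the `default_ok_status` parameter are hypothetical, and two policies are assumptions: a non-zero exit code overrides any emitted status, and an exit of `0` with no `ArchiveResult` falls back to a caller-chosen status, since the contract leaves that case ambiguous.

```python
import json


def parse_hook_output(stdout: str, exit_code: int, default_ok_status: str = "succeeded"):
    """Return (status, records) for one snapshot hook run."""
    records = []
    for line in stdout.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # tolerate stray non-JSONL lines instead of failing
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(record, dict) and "type" in record:
            records.append(record)

    results = [r for r in records if r["type"] == "ArchiveResult"]
    if exit_code != 0:
        status = "failed"  # assumption: a non-zero exit wins over any record
    elif results:
        # The terminal ArchiveResult is the last one emitted.
        status = results[-1].get("status", default_ok_status)
    else:
        # No ArchiveResult (e.g. search indexing hooks or silent skips).
        status = default_ok_status
    return status, records
```

Passing `default_ok_status="skipped"` may suit hooks that are known to exit `0` silently when they have nothing to do.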

### Event JSONL interface (bbus-style, no dependency)

Hooks emit JSONL events to stdout. They do **not** need to import `bbus`.
3 changes: 1 addition & 2 deletions abx_plugins/__init__.py
@@ -3,12 +3,11 @@
from __future__ import annotations

from pathlib import Path
from importlib import resources


def get_plugins_dir() -> Path:
    """Return the filesystem path to the bundled plugins directory."""
    return Path(resources.files(__name__) / "plugins")
    return Path(__file__).resolve().parent / "plugins"


__all__ = ["get_plugins_dir"]
@@ -13,12 +13,13 @@

import pytest

pytestmark = pytest.mark.usefixtures("ensure_chrome_test_prereqs")

from abx_plugins.plugins.chrome.tests.chrome_test_helpers import (
    chrome_session,
    get_test_env,
    get_plugin_dir,
    get_hook_script,
    chrome_test_url,
)


5 changes: 1 addition & 4 deletions abx_plugins/plugins/apt/on_Binary__13_apt_install.py
@@ -16,10 +16,7 @@
import sys

import rich_click as click
from abx_pkg import Binary, AptProvider, BinProviderOverrides

# Fix pydantic forward reference issue
AptProvider.model_rebuild()
from abx_pkg import AptProvider, Binary


@click.command()
1 change: 0 additions & 1 deletion abx_plugins/plugins/apt/tests/test_apt_provider.py
@@ -8,7 +8,6 @@
"""

import json
import os
import shutil
import subprocess
import sys
@@ -15,7 +15,9 @@
import json
import os
import sys
from importlib import import_module
from pathlib import Path
from typing import Any

import rich_click as click

@@ -51,8 +53,8 @@ def log(message: str) -> None:
    print(f'[archivedotorg] {message}', file=sys.stderr)

    try:
        import requests
    except ImportError:
        requests: Any = import_module('requests')
    except ModuleNotFoundError:
        return False, None, 'requests library not installed'

    timeout = get_env_int('ARCHIVEDOTORG_TIMEOUT') or get_env_int('TIMEOUT', 60)
@@ -12,7 +12,10 @@
import pytest

PLUGIN_DIR = Path(__file__).parent.parent
ARCHIVEDOTORG_HOOK = next(PLUGIN_DIR.glob('on_Snapshot__*_archivedotorg.*'), None)
_ARCHIVEDOTORG_HOOK = next(PLUGIN_DIR.glob('on_Snapshot__*_archivedotorg.*'), None)
if _ARCHIVEDOTORG_HOOK is None:
    raise FileNotFoundError(f"Hook not found in {PLUGIN_DIR}")
ARCHIVEDOTORG_HOOK = _ARCHIVEDOTORG_HOOK
TEST_URL = 'https://example.com'

def test_hook_script_exists():
5 changes: 1 addition & 4 deletions abx_plugins/plugins/brew/on_Binary__12_brew_install.py
@@ -18,10 +18,7 @@
import sys

import rich_click as click
from abx_pkg import Binary, BrewProvider, BinProviderOverrides

# Fix pydantic forward reference issue
BrewProvider.model_rebuild()
from abx_pkg import Binary, BrewProvider


@click.command()