Merged
Changes from 13 commits
Commits
50 commits
e743369
lots of fixes
pirate Feb 26, 2026
9c4caf5
cleanup readme
pirate Feb 26, 2026
f2a5e1e
more chrome util deduping
pirate Feb 26, 2026
007c5ac
fix papersdl assertions
pirate Feb 26, 2026
532baa2
cleanup model_rebuilds
pirate Feb 26, 2026
fe96c9a
cleanup model_rebuilds
pirate Feb 26, 2026
9fdfc71
more test fixes
pirate Feb 26, 2026
57b4c74
more chrome utils and test improvements
pirate Feb 26, 2026
35e552d
more chrome utils and test improvements
pirate Feb 26, 2026
5cb0866
cleanup fixtures for pytest
pirate Feb 26, 2026
94b748d
explicitly add fixtures to tests that need them
pirate Feb 26, 2026
b0a99f2
use real urls for dns test
pirate Feb 26, 2026
2f09cbf
captcha test tweaks
pirate Feb 26, 2026
54f3b11
test fixes
pirate Feb 28, 2026
2167523
format
pirate Feb 28, 2026
75218bc
Update abx_plugins/plugins/gallerydl/tests/test_gallerydl.py
pirate Feb 28, 2026
170a39f
Update abx_plugins/plugins/singlefile/on_Snapshot__50_singlefile.py
pirate Feb 28, 2026
45cb68b
Update abx_plugins/plugins/singlefile/singlefile_extension_save.js
pirate Feb 28, 2026
1048604
Merge branch 'main' into refactors
pirate Feb 28, 2026
b38fefc
cubic fixes
pirate Feb 28, 2026
617333b
fix parallel tests
pirate Feb 28, 2026
80bebe0
fix missing dir and replace requests with stdlib
pirate Feb 28, 2026
bf20563
fix hooks and abx-pkg version
pirate Feb 28, 2026
7c32880
fix python version
pirate Feb 28, 2026
2a335cd
bump python version
pirate Feb 28, 2026
55415ca
bump plugins version
pirate Feb 28, 2026
59758bc
env fixes for tests
pirate Feb 28, 2026
16154c0
more test fixes
pirate Feb 28, 2026
399ab47
test fixes
pirate Feb 28, 2026
d1f3f29
env var fixes
pirate Feb 28, 2026
80bacc4
make more tests static
pirate Feb 28, 2026
843ae52
more fixes
pirate Feb 28, 2026
558fc30
mercury improvement
pirate Feb 28, 2026
1baa20b
formatting
pirate Feb 28, 2026
729c0a5
fix wget and headers
pirate Feb 28, 2026
8596571
fix seo test determinism
pirate Feb 28, 2026
092fbc6
fix tests
pirate Feb 28, 2026
a5c0360
more consolidation of plugin chrome uitls
pirate Feb 28, 2026
b6e1fbf
test fixes
pirate Feb 28, 2026
eab1f72
more consolidation of plugin chrome uitls
pirate Mar 1, 2026
78f0285
fix timeout
pirate Mar 1, 2026
b1538c1
more extension fixes
pirate Mar 1, 2026
4566301
fix timeout probe
pirate Mar 1, 2026
ac85528
make ytdlp test deterministic
pirate Mar 1, 2026
d69d969
Update abx_plugins/plugins/favicon/on_Snapshot__11_favicon.bg.py
pirate Mar 1, 2026
ccdbe3f
cubic comments
pirate Mar 1, 2026
91548aa
lint fixes
pirate Mar 1, 2026
f47ab41
fix missing import
pirate Mar 1, 2026
95839b3
fix race on chrome tab setup
pirate Mar 1, 2026
0cff700
allow env provider for wget test
pirate Mar 1, 2026
78 changes: 75 additions & 3 deletions README.md
@@ -1,6 +1,6 @@
# abx-plugins

ArchiveBox-compatible plugin suite (hooks, config schemas, binaries manifests).
ArchiveBox-compatible plugin suite (hooks and config schemas).

This package contains only plugin assets and a tiny helper to locate them.
It does **not** depend on Django or ArchiveBox.
@@ -11,7 +11,7 @@ It does **not** depend on Django or ArchiveBox.
from abx_plugins import get_plugins_dir

plugins_dir = get_plugins_dir()
# scan plugins_dir for plugins/*/config.json, binaries.jsonl, on_* hooks
# scan plugins_dir for plugins/*/config.json and on_* hooks
```

Tools like `abx-dl` and ArchiveBox can discover plugins from this package
@@ -24,7 +24,7 @@ without symlinks or environment-variable tricks.
Each plugin lives under `plugins/<name>/` and may include:

- `config.json` (optional) - config schema
- `binaries.jsonl` (optional) - binary manifests
- `on_Crawl*install*` hooks (optional) - dependency/binary install records
- `on_*` hook scripts (required to do work)
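
A minimal sketch of how a consumer might scan this layout, assuming only the file names listed above. `discover_plugins` and its return shape are hypothetical and are not part of `abx_plugins`; the directory passed in would typically be the result of `get_plugins_dir()`.

```python
from pathlib import Path


def discover_plugins(plugins_dir: Path) -> dict[str, dict]:
    """Map each plugin name to its optional config path and its hook scripts.

    Hypothetical helper: the real discovery logic in abx-dl / ArchiveBox may differ.
    """
    found: dict[str, dict] = {}
    for plugin_dir in sorted(p for p in plugins_dir.iterdir() if p.is_dir()):
        config = plugin_dir / "config.json"
        # Hooks are any files whose name starts with "on_"
        hooks = sorted(h.name for h in plugin_dir.glob("on_*") if h.is_file())
        found[plugin_dir.name] = {
            "config": config if config.is_file() else None,  # config.json is optional
            "hooks": hooks,
        }
    return found
```

A plugin directory with no `on_*` files shows up with an empty `hooks` list, which matches the note that hooks are required for a plugin to do any work.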

Hooks run with:
@@ -43,6 +43,78 @@ Hooks run with:
- `PERSONAS_DIR` - persona profiles root (default: `~/.config/abx/personas`)
- `ACTIVE_PERSONA` - persona name (default: `Default`)

### Install hook contract (concise)

Lifecycle:

1. `on_Crawl__*install*` declares crawl dependencies.
2. `on_Binary__*install*` resolves/installs one binary with one provider.

`on_Crawl` output (dependency declaration):

```json
{"type":"Binary","name":"yt-dlp","binproviders":"pip,brew,apt,env","overrides":{"pip":{"packages":["yt-dlp[default]"]}},"machine_id":"<optional>"}
```
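
As a rough illustration, an `on_Crawl` install hook could build and print such a record as follows. The function name `emit_binary_dependency` is hypothetical; only the record fields are taken from the example above.

```python
from __future__ import annotations

import json


def emit_binary_dependency(
    name: str,
    binproviders: list[str],
    overrides: dict | None = None,
) -> str:
    """Print one Binary dependency record as a single JSONL line and return it."""
    record: dict = {
        "type": "Binary",
        "name": name,
        # Providers are serialized as a comma-separated string, as in the example record
        "binproviders": ",".join(binproviders),
    }
    if overrides:
        record["overrides"] = overrides
    line = json.dumps(record, separators=(",", ":"))
    print(line)  # stdout is reserved for JSONL records
    return line
```

Calling `emit_binary_dependency("yt-dlp", ["pip", "brew", "apt", "env"], {"pip": {"packages": ["yt-dlp[default]"]}})` yields a record with the same fields as the example, minus the optional `machine_id`.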

`on_Binary` input/output:

- CLI input should accept `--binary-id`, `--machine-id`, `--name` (plus optional provider args).
- Output should emit installed facts like:

```json
{"type":"Binary","name":"yt-dlp","abspath":"/abs/path","version":"2025.01.01","sha256":"<optional>","binprovider":"pip","machine_id":"<recommended>","binary_id":"<recommended>"}
```
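
A sketch of the `on_Binary` side, under the assumption that the three flags above are all required. The hooks in this PR use `rich_click`; `argparse` is substituted here only to keep the example stdlib-only, and `build_parser` / `installed_record` are hypothetical names.

```python
import argparse
import json


def build_parser() -> argparse.ArgumentParser:
    """CLI accepting the flags named in the contract (provider args omitted)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--binary-id", required=True)
    parser.add_argument("--machine-id", required=True)
    parser.add_argument("--name", required=True)
    return parser


def installed_record(
    args: argparse.Namespace, abspath: str, version: str, binprovider: str
) -> dict:
    """Build the installed-facts record for one binary resolved by one provider."""
    return {
        "type": "Binary",
        "name": args.name,
        "abspath": abspath,
        "version": version,
        "binprovider": binprovider,
        "machine_id": args.machine_id,  # recommended
        "binary_id": args.binary_id,    # recommended
    }


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    # Placeholder values: a real hook would resolve these via its provider
    print(json.dumps(installed_record(cli_args, "/abs/path", "2025.01.01", "pip")))
```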

Optional machine patch record:

```json
{"type":"Machine","config":{"PATH":"...","NODE_MODULES_DIR":"...","CHROME_BINARY":"..."}}
```

Semantics:

- `stdout`: JSONL records only
- `stderr`: human logs/debug
- exit `0`: success or intentional skip
- exit non-zero: hard failure
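
On the consuming side, these semantics could be applied roughly as below. `run_hook` is a hypothetical runner, not the one used by `abx-dl` or ArchiveBox, and it assumes every non-blank stdout line is valid JSON.

```python
import json
import subprocess


def run_hook(cmd: list[str]) -> tuple[bool, list[dict], str]:
    """Run a hook and split its result into (ok, records, logs)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # stdout carries JSONL records only; blank lines are ignored
    records = [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
    # exit 0 means success or intentional skip; stderr is returned untouched as logs
    return proc.returncode == 0, records, proc.stderr
```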

State/OS:

- working dir: `CRAWL_DIR/<plugin>/`
- durable install root: `LIB_DIR` (e.g. npm prefix, pip venv, puppeteer cache)
- providers: `apt` (Debian/Ubuntu), `brew` (macOS/Linux), many hooks currently assume POSIX paths

### Snapshot hook contract (concise)

Lifecycle:

- runs once per snapshot, typically after crawl setup
- common Chrome flow: crawl browser/session -> `chrome_tab` -> `chrome_navigate` -> downstream extractors

State:

- output cwd is usually `SNAP_DIR/<plugin>/`
- hooks may read sibling outputs via `../<plugin>/...`

Output records:

- terminal record is usually:

```json
{"type":"ArchiveResult","status":"succeeded|skipped|failed","output_str":"path-or-message"}
```

- discovery hooks may also emit `Snapshot` and `Tag` records before `ArchiveResult`
- search indexing hooks are a known exception and may use exit code + stderr without `ArchiveResult`

Semantics:

- `stdout`: JSONL records
- `stderr`: diagnostics/logging
- exit `0`: succeeded or skipped
- exit non-zero: failed
- current nuance: some skip/transient paths emit no JSONL and rely only on exit code
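
One way a caller might reconcile the terminal record with the exit code, including the no-JSONL case noted above. `snapshot_status` is hypothetical, and treating "exit `0` with no `ArchiveResult`" as `skipped` is an assumption of this sketch rather than documented behavior.

```python
def snapshot_status(records: list[dict], exit_code: int) -> str:
    """Derive a final status from a hook's JSONL records and exit code."""
    if exit_code != 0:
        return "failed"  # non-zero exit always means failure
    results = [r for r in records if r.get("type") == "ArchiveResult"]
    if results:
        # The last ArchiveResult is taken as the terminal record
        return results[-1]["status"]
    # Assumption: a clean exit without any ArchiveResult is a silent skip
    return "skipped"
```

`Snapshot` and `Tag` records emitted by discovery hooks are ignored here because only the `ArchiveResult` carries a status.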

### Event JSONL interface (bbus-style, no dependency)

Hooks emit JSONL events to stdout. They do **not** need to import `bbus`.
3 changes: 1 addition & 2 deletions abx_plugins/__init__.py
@@ -3,12 +3,11 @@
from __future__ import annotations

from pathlib import Path
from importlib import resources


def get_plugins_dir() -> Path:
    """Return the filesystem path to the bundled plugins directory."""
    return Path(resources.files(__name__) / "plugins")
    return Path(__file__).resolve().parent / "plugins"


__all__ = ["get_plugins_dir"]
@@ -13,12 +13,13 @@

import pytest

pytestmark = pytest.mark.usefixtures("ensure_chrome_test_prereqs")

from abx_plugins.plugins.chrome.tests.chrome_test_helpers import (
    chrome_session,
    get_test_env,
    get_plugin_dir,
    get_hook_script,
    chrome_test_url,
)


5 changes: 1 addition & 4 deletions abx_plugins/plugins/apt/on_Binary__13_apt_install.py
@@ -16,10 +16,7 @@
import sys

import rich_click as click
from abx_pkg import Binary, AptProvider, BinProviderOverrides

# Fix pydantic forward reference issue
AptProvider.model_rebuild()
from abx_pkg import AptProvider, Binary


@click.command()
1 change: 0 additions & 1 deletion abx_plugins/plugins/apt/tests/test_apt_provider.py
@@ -8,7 +8,6 @@
"""

import json
import os
import shutil
import subprocess
import sys
@@ -15,7 +15,9 @@
import json
import os
import sys
from importlib import import_module
from pathlib import Path
from typing import Any

import rich_click as click

Expand Down Expand Up @@ -51,8 +53,8 @@ def log(message: str) -> None:
print(f'[archivedotorg] {message}', file=sys.stderr)

try:
import requests
except ImportError:
requests: Any = import_module('requests')
except ModuleNotFoundError:
return False, None, 'requests library not installed'

timeout = get_env_int('ARCHIVEDOTORG_TIMEOUT') or get_env_int('TIMEOUT', 60)
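
The hunk above replaces a static `import requests` with `import_module('requests')` and narrows the caught exception to `ModuleNotFoundError` (a subclass of `ImportError`). A minimal standalone sketch of that optional-dependency pattern follows; `load_optional` is a hypothetical helper and the error string simply mirrors the one in the hook.

```python
from __future__ import annotations

from importlib import import_module
from typing import Any


def load_optional(module_name: str) -> tuple[Any | None, str | None]:
    """Return (module, None) if importable, else (None, error message)."""
    try:
        # Typed as Any so static checkers do not require the package to be installed
        module: Any = import_module(module_name)
    except ModuleNotFoundError:
        return None, f"{module_name} library not installed"
    return module, None
```

Catching `ModuleNotFoundError` rather than `ImportError` lets other import-time failures inside an installed package surface instead of being reported as "not installed".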
@@ -12,7 +12,10 @@
import pytest

PLUGIN_DIR = Path(__file__).parent.parent
ARCHIVEDOTORG_HOOK = next(PLUGIN_DIR.glob('on_Snapshot__*_archivedotorg.*'), None)
_ARCHIVEDOTORG_HOOK = next(PLUGIN_DIR.glob('on_Snapshot__*_archivedotorg.*'), None)
if _ARCHIVEDOTORG_HOOK is None:
    raise FileNotFoundError(f"Hook not found in {PLUGIN_DIR}")
ARCHIVEDOTORG_HOOK = _ARCHIVEDOTORG_HOOK
TEST_URL = 'https://example.com'

def test_hook_script_exists():
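
The module-level change above fails fast when no hook matches, so `ARCHIVEDOTORG_HOOK` is never `None`. The same lookup could be wrapped in a function, sketched below; `find_hook` is a hypothetical name and is not a helper from this repository.

```python
from pathlib import Path


def find_hook(plugin_dir: Path, pattern: str) -> Path:
    """Return the first hook matching pattern, or raise if none exists."""
    hook = next(plugin_dir.glob(pattern), None)
    if hook is None:
        # Failing here gives a clear error instead of a later None dereference
        raise FileNotFoundError(f"Hook not found in {plugin_dir}")
    return hook
```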
5 changes: 1 addition & 4 deletions abx_plugins/plugins/brew/on_Binary__12_brew_install.py
@@ -18,10 +18,7 @@
import sys

import rich_click as click
from abx_pkg import Binary, BrewProvider, BinProviderOverrides

# Fix pydantic forward reference issue
BrewProvider.model_rebuild()
from abx_pkg import Binary, BrewProvider


@click.command()