Test fixes for many plugins by pirate · Pull Request #2 · ArchiveBox/abx-plugins

pirate · 2026-02-26T17:34:01Z

Summary by cubic

Stabilizes Chrome-driven plugins by consolidating session helpers and smoothing tab/navigate/screenshot/cookie flows. Test suite now opts into Chrome prerequisites, adds clear hook checks, and fixes flaky cases (including DNS tests using real URLs).

Bug Fixes
- Tests: ensure_chrome_test_prereqs is opt-in; modules explicitly use pytestmark/usefixtures, add small require_chrome_runtime fixtures, and raise FileNotFoundError when hooks are missing. TwoCaptcha tests now target the Recaptcha v2 demo, read API_KEY_2CAPTCHA as an alias, and warn when API keys are absent. DNS tests use a real URL.
- Chrome: tab/navigate/screenshot/cookies share helpers (readCdpUrl/waitForChromeSession/connectToPage/getCookiesViaCdp), with better waits/cleanup and extension target detection.
- Optional deps: requests/feedparser/sonic loaded via import_module with clearer errors; favicon imports requests directly. Papers-dl honors PAPERSDL_TIMEOUT, defaults to "fetch", and extracts arXiv IDs from DOIs. Misc trims ripgrep error text and stabilizes wget/sqlite/gallery-dl/ytdlp outputs and parse_rss expectations.
Refactors
- Chrome utils: helpers for session files, opening/closing tabs, extension ID parsing; screenshot waits on navigation state.
- SingleFile: extension save supports outputPath, snapshot hook uses cwd, and session dir resolution is fixed.
- Puppeteer install: reuses CHROME_BINARY when present and caches downloads via PUPPETEER_CACHE_DIR.
- ForumDL: remove wrapper and Pydantic monkeypatch; rely on install hook to pin deps; tests assert binary presence with clearer errors.
- Discovery/deps: README clarifies install hooks and removes binaries.jsonl; get_plugins_dir uses file; pyproject adds feedparser/requests and test tooling (pytest, pytest-httpserver, ruff, pyright).

^{Written for commit 2f09cbf. Summary will update on new commits.}

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 9 additional findings.

devin-ai-integration

Devin Review found 1 new potential issue.

View 23 additional findings in Devin Review.

devin-ai-integration · 2026-02-26T18:41:47Z

abx_plugins/plugins/singlefile/singlefile_extension_save.js

            chromeSessionDir: CHROME_SESSION_DIR,
            timeoutMs: 60000,
+            requireTargetId: false,
            puppeteer,


🔴 SingleFile save can target the wrong Chrome tab when target_id marker is missing

The SingleFile extension helper explicitly disables target-id enforcement when attaching to the shared browser session, so it can proceed even if the snapshot’s target_id.txt is absent or stale.

Root Cause

singlefile_extension_save.js now calls connectToPage(..., requireTargetId: false) (abx_plugins/plugins/singlefile/singlefile_extension_save.js:106). In connectToPage, if no target ID is available, it falls back to pages[pages.length - 1] (abx_plugins/plugins/chrome/chrome_utils.js:2008-2011).

That means a snapshot can attach to an arbitrary last-opened tab in the shared crawl browser instead of its own tab. The helper then may navigate that page and trigger SingleFile save on the wrong context.

Actual: missing/stale target marker can still produce a “successful” save from another tab.

Expected: fail fast unless the snapshot’s target id is present and matched.

Impact: incorrect/cross-snapshot capture (data integrity issue) in concurrent/shared Chrome runs.

Suggested change

puppeteer,

requireTargetId: true,

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration

Devin Review found 2 new potential issues.

View 26 additional findings in Devin Review.

devin-ai-integration · 2026-02-26T18:54:50Z

abx_plugins/plugins/forumdl/on_Snapshot__04_forumdl.bg.py

+    if forumdl_python:
+        # Inline compatibility shim so this hook stays self-contained.
+        inline_entrypoint = textwrap.dedent(
+            """
+            import sys
+            try:
+                from forum_dl.writers.jsonl import JsonlWriter
+                from pydantic import BaseModel
+                if hasattr(BaseModel, "model_dump_json"):
+                    def _patched_serialize_entry(self, entry):
+                        return entry.model_dump_json()
+                    JsonlWriter._serialize_entry = _patched_serialize_entry
+            except Exception:
+                pass
+            from forum_dl import main
+            raise SystemExit(main())
+            """
+        ).strip()
+        cmd = [forumdl_python, '-c', inline_entrypoint, *forumdl_args, '-f', output_format, '-o', str(output_file)]
    else:
        cmd = [resolved_binary, *forumdl_args, '-f', output_format, '-o', str(output_file)]

-    if not check_ssl:
-        cmd.append('--no-check-certificate')
-
    if forumdl_args_extra:
        cmd.extend(forumdl_args_extra)



🟡 Pydantic v2 compatibility shim dropped when binary shebang is unreadable

When get_binary_shebang(resolved_binary) returns None, the forum-dl Pydantic v2 compatibility patch is no longer applied. The old code fell back to sys.executable and always applied the wrapper shim:

Root Cause

The old code was:

forumdl_python = get_binary_shebang(resolved_binary) or sys.executable cmd = [forumdl_python, str(wrapper_path), *forumdl_args, ...]

The or sys.executable fallback ensured the Pydantic v2 monkey-patch was always applied (via the wrapper script) even when the shebang couldn't be extracted.

The new code at on_Snapshot__04_forumdl.bg.py:123-149 is:

forumdl_python = get_binary_shebang(resolved_binary) if forumdl_python: # ... inline shim with Pydantic v2 patch else: cmd = [resolved_binary, *forumdl_args, ...]

When forumdl_python is None (e.g., binary lacks read permissions, is a compiled executable, or has no shebang line), the else branch runs the binary directly without the Pydantic v2 compatibility shim. This means forum_dl.writers.jsonl.JsonlWriter._serialize_entry is never patched, and forum-dl 0.3.0 will crash with TypeError when serializing entries with Pydantic v2.

Impact: forum-dl extraction fails silently or with cryptic Pydantic errors when the binary shebang is unreadable, whereas before it always worked via the sys.executable fallback.

(Refers to lines 123-149)

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-02-26T18:54:52Z

abx_plugins/plugins/singlefile/on_Snapshot__50_singlefile.py

+OUTPUT_DIR = Path.cwd().resolve()
 OUTPUT_DIR.mkdir(parents=True, exist_ok=True)


🔴 singlefile OUTPUT_DIR no longer self-computes from SNAP_DIR, silently writes to wrong directory if cwd is wrong

The singlefile snapshot hook changed from computing its output directory from SNAP_DIR to trusting cwd(). If the caller doesn't set cwd to SNAP_DIR/singlefile/, output files go to the wrong place and the Chrome session directory resolution breaks.

Detailed Explanation

The old code at on_Snapshot__50_singlefile.py was self-contained:

SNAP_DIR = Path(os.environ.get('SNAP_DIR', '.')).resolve() OUTPUT_DIR = SNAP_DIR / PLUGIN_DIR OUTPUT_DIR.mkdir(parents=True, exist_ok=True) os.chdir(OUTPUT_DIR)

This always computed the correct output directory from the SNAP_DIR environment variable and changed into it, regardless of the invocation cwd.

The new code is:

OUTPUT_DIR = Path.cwd().resolve() OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

This trusts the caller to set cwd to the plugin output directory. If the caller invokes from a different directory (e.g., SNAP_DIR instead of SNAP_DIR/singlefile/), then:

OUTPUT_FILE = 'singlefile.html' is written to the wrong location

STATICFILE_DIR = '../staticfile' resolves to the wrong sibling directory

CHROME_SESSION_DIR = '../chrome' resolves to the wrong directory, causing Chrome session lookup to fail

The singlefile_extension_save.js subprocess (invoked with cwd=str(OUTPUT_DIR)) will also have the wrong Chrome session path

Impact: If any caller doesn't set cwd to SNAP_DIR/<plugin>/ before invoking this hook, output silently goes to the wrong location and Chrome session connection fails.

Suggested change

OUTPUT_DIR = Path.cwd().resolve()

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

SNAP_DIR = Path(os.environ.get('SNAP_DIR', '.')).resolve()

OUTPUT_DIR = SNAP_DIR / PLUGIN_DIR

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

os.chdir(OUTPUT_DIR)

Was this helpful? React with 👍 or 👎 to provide feedback.

This comment was marked as resolved.

Sign in to view

devin-ai-integration bot reviewed Feb 26, 2026

View reviewed changes

lots of fixes

e743369

pirate force-pushed the refactors branch from c63db64 to e743369 Compare February 26, 2026 17:44

pirate added 6 commits February 26, 2026 09:47

cleanup readme

9c4caf5

more chrome util deduping

f2a5e1e

fix papersdl assertions

007c5ac

cleanup model_rebuilds

532baa2

cleanup model_rebuilds

fe96c9a

more test fixes

9fdfc71

This comment was marked as resolved.

Sign in to view

pirate added 3 commits February 26, 2026 10:21

more chrome utils and test improvements

57b4c74

more chrome utils and test improvements

35e552d

cleanup fixtures for pytest

5cb0866

This comment was marked as resolved.

Sign in to view

pirate added 2 commits February 26, 2026 10:27

explicitly add fixtures to tests that need them

94b748d

use real urls for dns test

b0a99f2

devin-ai-integration bot reviewed Feb 26, 2026

View reviewed changes

captcha test tweaks

2f09cbf

devin-ai-integration bot reviewed Feb 26, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test fixes for many plugins#2

Test fixes for many plugins#2
pirate wants to merge 13 commits intomainfrom
refactors

pirate commented Feb 26, 2026 •

edited by cubic-dev-ai bot

Loading

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

devin-ai-integration bot Feb 26, 2026

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

devin-ai-integration bot Feb 26, 2026

Uh oh!

devin-ai-integration bot Feb 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		OUTPUT_DIR = Path.cwd().resolve()
		OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

Conversation

pirate commented Feb 26, 2026 • edited by cubic-dev-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by cubic

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

✅ Devin Review: No Issues Found

Uh oh!

This comment was marked as resolved.

Uh oh!

This comment was marked as resolved.

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pirate commented Feb 26, 2026 •

edited by cubic-dev-ai bot

Loading