    chromeSessionDir: CHROME_SESSION_DIR,
    timeoutMs: 60000,
    requireTargetId: false,
    puppeteer,
🔴 SingleFile save can target the wrong Chrome tab when target_id marker is missing
The SingleFile extension helper explicitly disables target-id enforcement when attaching to the shared browser session, so it can proceed even if the snapshot’s target_id.txt is absent or stale.
Root Cause
singlefile_extension_save.js now calls connectToPage(..., requireTargetId: false) (abx_plugins/plugins/singlefile/singlefile_extension_save.js:106). In connectToPage, if no target ID is available, it falls back to pages[pages.length - 1] (abx_plugins/plugins/chrome/chrome_utils.js:2008-2011).
That means a snapshot can attach to an arbitrary last-opened tab in the shared crawl browser instead of its own tab. The helper then may navigate that page and trigger SingleFile save on the wrong context.
Actual: missing/stale target marker can still produce a “successful” save from another tab.
Expected: fail fast unless the snapshot’s target id is present and matched.
Impact: incorrect/cross-snapshot capture (data integrity issue) in concurrent/shared Chrome runs.
    puppeteer,
    requireTargetId: true,
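The fail-fast behavior requested above can be sketched independently of Puppeteer. The snippet below is a language-neutral illustration written in Python, not the actual connectToPage implementation (which is JavaScript). The names select_page and TargetNotFoundError, and the dict-based page records, are hypothetical and exist only for this example.

```python
from typing import Optional, Sequence


class TargetNotFoundError(RuntimeError):
    """Raised when the snapshot's own tab cannot be identified."""


def select_page(pages: Sequence[dict], target_id: Optional[str],
                require_target_id: bool = True) -> dict:
    """Pick the page whose 'target_id' matches, or fail fast.

    Hypothetical sketch: `pages` stands in for the list of open tabs and
    `target_id` for the contents of target_id.txt (None if it is missing).
    """
    if target_id:
        for page in pages:
            if page.get("target_id") == target_id:
                return page
        if require_target_id:
            raise TargetNotFoundError(f"no open tab matches target id {target_id!r}")
    elif require_target_id:
        raise TargetNotFoundError("target id marker is missing")

    if not pages:
        raise TargetNotFoundError("no open tabs to attach to")
    # Permissive fallback (the behavior the review flags as unsafe):
    # attach to whichever tab was opened last.
    return pages[-1]
```

With require_target_id left at True, a missing or stale marker raises instead of silently attaching to another snapshot's tab.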
    if forumdl_python:
        # Inline compatibility shim so this hook stays self-contained.
        inline_entrypoint = textwrap.dedent(
            """
            import sys
            try:
                from forum_dl.writers.jsonl import JsonlWriter
                from pydantic import BaseModel
                if hasattr(BaseModel, "model_dump_json"):
                    def _patched_serialize_entry(self, entry):
                        return entry.model_dump_json()
                    JsonlWriter._serialize_entry = _patched_serialize_entry
            except Exception:
                pass
            from forum_dl import main
            raise SystemExit(main())
            """
        ).strip()
        cmd = [forumdl_python, '-c', inline_entrypoint, *forumdl_args, '-f', output_format, '-o', str(output_file)]
    else:
        cmd = [resolved_binary, *forumdl_args, '-f', output_format, '-o', str(output_file)]

    if not check_ssl:
        cmd.append('--no-check-certificate')

    if forumdl_args_extra:
        cmd.extend(forumdl_args_extra)
🟡 Pydantic v2 compatibility shim dropped when binary shebang is unreadable
When get_binary_shebang(resolved_binary) returns None, the forum-dl Pydantic v2 compatibility patch is no longer applied. The old code fell back to sys.executable and always applied the wrapper shim.
Root Cause
The old code was:
    forumdl_python = get_binary_shebang(resolved_binary) or sys.executable
    cmd = [forumdl_python, str(wrapper_path), *forumdl_args, ...]
The or sys.executable fallback ensured the Pydantic v2 monkey-patch was always applied (via the wrapper script) even when the shebang couldn't be extracted.
The new code at on_Snapshot__04_forumdl.bg.py:123-149 is:
    forumdl_python = get_binary_shebang(resolved_binary)
    if forumdl_python:
        # ... inline shim with Pydantic v2 patch
    else:
        cmd = [resolved_binary, *forumdl_args, ...]
When forumdl_python is None (e.g., binary lacks read permissions, is a compiled executable, or has no shebang line), the else branch runs the binary directly without the Pydantic v2 compatibility shim. This means forum_dl.writers.jsonl.JsonlWriter._serialize_entry is never patched, and forum-dl 0.3.0 will crash with TypeError when serializing entries with Pydantic v2.
Impact: forum-dl extraction fails silently or with cryptic Pydantic errors when the binary shebang is unreadable, whereas before it always worked via the sys.executable fallback.
(Refers to lines 123-149)
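One way to restore the old guarantee while keeping the inline shim is to reuse the sys.executable fallback for the interpreter. The sketch below is a minimal illustration, not the hook's actual code: build_forumdl_cmd is a hypothetical helper, and it assumes forum_dl is importable from the fallback interpreter (the same assumption the old wrapper-based code made implicitly).

```python
import sys
import textwrap
from typing import List, Optional

# Inline shim copied from the hook under review.
INLINE_ENTRYPOINT = textwrap.dedent(
    """
    import sys
    try:
        from forum_dl.writers.jsonl import JsonlWriter
        from pydantic import BaseModel
        if hasattr(BaseModel, "model_dump_json"):
            def _patched_serialize_entry(self, entry):
                return entry.model_dump_json()
            JsonlWriter._serialize_entry = _patched_serialize_entry
    except Exception:
        pass
    from forum_dl import main
    raise SystemExit(main())
    """
).strip()


def build_forumdl_cmd(shebang_python: Optional[str], forumdl_args: List[str],
                      output_format: str, output_file: str) -> List[str]:
    """Build the forum-dl command so the shim is applied on every path."""
    # Fall back to the current interpreter when no shebang could be read,
    # mirroring the old `get_binary_shebang(...) or sys.executable` logic.
    interpreter = shebang_python or sys.executable
    return [interpreter, "-c", INLINE_ENTRYPOINT, *forumdl_args,
            "-f", output_format, "-o", output_file]
```

Because there is no longer an else branch that runs the binary directly, the patch is attempted whether or not the shebang is readable.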
    OUTPUT_DIR = Path.cwd().resolve()
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
🔴 singlefile OUTPUT_DIR no longer self-computes from SNAP_DIR, silently writes to wrong directory if cwd is wrong
The singlefile snapshot hook changed from computing its output directory from SNAP_DIR to trusting cwd(). If the caller doesn't set cwd to SNAP_DIR/singlefile/, output files go to the wrong place and the Chrome session directory resolution breaks.
Detailed Explanation
The old code at on_Snapshot__50_singlefile.py was self-contained:
    SNAP_DIR = Path(os.environ.get('SNAP_DIR', '.')).resolve()
    OUTPUT_DIR = SNAP_DIR / PLUGIN_DIR
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    os.chdir(OUTPUT_DIR)
This always computed the correct output directory from the SNAP_DIR environment variable and changed into it, regardless of the invocation cwd.
The new code is:
    OUTPUT_DIR = Path.cwd().resolve()
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
This trusts the caller to set cwd to the plugin output directory. If the caller invokes from a different directory (e.g., SNAP_DIR instead of SNAP_DIR/singlefile/), then:
- OUTPUT_FILE = 'singlefile.html' is written to the wrong location
- STATICFILE_DIR = '../staticfile' resolves to the wrong sibling directory
- CHROME_SESSION_DIR = '../chrome' resolves to the wrong directory, causing Chrome session lookup to fail
- The singlefile_extension_save.js subprocess (invoked with cwd=str(OUTPUT_DIR)) will also have the wrong Chrome session path
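The effect on the relative paths can be checked with plain path arithmetic. The sketch below is illustrative only: /data/snap123 is a made-up snapshot directory and resolve_relative is a hypothetical helper that normalizes paths lexically, without touching the filesystem.

```python
import posixpath


def resolve_relative(cwd: str, relative: str) -> str:
    """Resolve `relative` against `cwd` lexically (no filesystem access)."""
    return posixpath.normpath(posixpath.join(cwd, relative))


snap_dir = "/data/snap123"  # hypothetical snapshot directory

# cwd set correctly to SNAP_DIR/singlefile/: '../chrome' is the sibling plugin dir.
expected = resolve_relative(f"{snap_dir}/singlefile", "../chrome")
# cwd mistakenly left at SNAP_DIR: '../chrome' escapes the snapshot entirely.
actual = resolve_relative(snap_dir, "../chrome")

print(expected)  # /data/snap123/chrome
print(actual)    # /data/chrome
```

The same one-level shift applies to '../staticfile' and to the location of singlefile.html.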
Impact: If any caller doesn't set cwd to SNAP_DIR/<plugin>/ before invoking this hook, output silently goes to the wrong location and Chrome session connection fails.
    - OUTPUT_DIR = Path.cwd().resolve()
    - OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    + SNAP_DIR = Path(os.environ.get('SNAP_DIR', '.')).resolve()
    + OUTPUT_DIR = SNAP_DIR / PLUGIN_DIR
    + OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    + os.chdir(OUTPUT_DIR)
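If the cwd-based contract is kept instead, the silent failure could at least be made loud. The following is a minimal sketch of that alternative, not code from the PR: checked_output_dir is a hypothetical helper, and it assumes callers that export SNAP_DIR expect the hook to run in SNAP_DIR/<plugin>/.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Union


def checked_output_dir(plugin_dir: str,
                       env: Optional[Mapping[str, str]] = None,
                       cwd: Optional[Union[str, Path]] = None) -> Path:
    """Return the cwd as the output dir, but refuse a cwd that contradicts SNAP_DIR."""
    env = os.environ if env is None else env
    current = (Path.cwd() if cwd is None else Path(cwd)).resolve()
    snap_dir = env.get("SNAP_DIR")
    if snap_dir is not None:
        expected = (Path(snap_dir) / plugin_dir).resolve()
        if current != expected:
            # Fail fast rather than writing output to the wrong directory.
            raise RuntimeError(f"cwd is {current}, expected {expected}")
    return current
```

When SNAP_DIR is unset the helper falls back to trusting cwd, which matches the new code's behavior.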
Summary by cubic
Stabilizes Chrome-driven plugins by consolidating session helpers and smoothing tab/navigate/screenshot/cookie flows. Test suite now opts into Chrome prerequisites, adds clear hook checks, and fixes flaky cases (including DNS tests using real URLs).
Written for commit 2f09cbf. Summary will update on new commits.