Skip to content

v2.3.0#318

Open
rtuszik wants to merge 16 commits into
mainfrom
dev
Open

v2.3.0#318
rtuszik wants to merge 16 commits into
mainfrom
dev

Conversation

@rtuszik

@rtuszik rtuszik commented Apr 27, 2026

Copy link
Copy Markdown
Owner

TODO: Update README.md

@rtuszik rtuszik added this to the v2.3.0 milestone Apr 27, 2026
@github-actions

github-actions Bot commented May 10, 2026

Copy link
Copy Markdown
Contributor

Coverage

Coverage Report
FileStmtsMissCoverMissing
src
   downloader.py3424387%102–103, 170–176, 193–198, 231–237, 247–248, 251–252, 373–374, 391, 401–412, 415–416, 420–422, 460–461, 509
   entrypoint.py80298%94–95
   filesystem.py1602485%36, 41–56, 120–127, 194–195
   process_manager.py2261195%157–158, 243–244, 251, 279–281, 320–322
   updater.py27293%36–37
src/utils
   config.py30390%34–36
   regions.py37295%93, 106
TOTAL10718792% 

Tests Skipped Failures Errors Time
200 0 💤 0 ❌ 0 🔥 1.446s ⏱️

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3a5ad1b6-2159-4ccf-9ace-d53083f0661a

📥 Commits

Reviewing files that changed from the base of the PR and between 90bcc71 and 4223bd8.

📒 Files selected for processing (8)
  • .github/workflows/full-test-jsonl.yml
  • .last_release
  • src/importer.py
  • src/process_manager.py
  • src/utils/regions.py
  • src/utils/validate_config.py
  • tests/test_process_manager.py
  • tests/utils/test_validate_config.py
✅ Files skipped from review due to trivial changes (1)
  • .last_release
🚧 Files skipped from review as they are similar to previous changes (7)
  • .github/workflows/full-test-jsonl.yml
  • src/utils/validate_config.py
  • tests/utils/test_validate_config.py
  • src/importer.py
  • src/process_manager.py
  • tests/test_process_manager.py
  • src/utils/regions.py

📝 Walkthrough

Walkthrough

Adds JSONL import mode, a filesystem-backed index lifecycle module, and a consolidated update pipeline; refactors entrypoint, process manager, downloader, and Docker build; updates CI workflows; and adds extensive tests covering new and refactored behaviours.

Changes

JSONL Import Mode and Index Management

Layer / File(s) Summary
Configuration and region metadata for JSONL support
src/utils/config.py, src/utils/regions.py
New env vars (IMPORT_MODE, LANGUAGES, EXTRA_TAGS, IMPORT_GEOMETRIES), JSONL constants, getters, region metadata extended with db_available/jsonl_available/country_codes, and JSONL helpers (get_jsonl_filename, get_jsonl_url_path, get_regions_for_jsonl, get_country_codes_for_regions, get_jsonl_parent_region).
Index lifecycle management
src/index.py, tests/test_index.py
New module providing markers, presence/timestamp queries, reconcile of interrupted imports, activate() with staging/backup and rollback, and drop_backup().
JSONL download and import orchestration
src/importer.py, src/jsonl/downloader.py, src/jsonl/decompressor.py, src/jsonl/__init__.py, tests/test_importer.py
Downloader builds region-specific JSONL URLs and downloads; decompressor streams zstd chunks; importer computes parent region/country codes, starts Photon import subprocess, streams decompressed JSONL to stdin, and tracks import markers.
Update pipeline extraction and refactoring
src/update.py, tests/test_update.py
New update module: download URL selection, remote-size lookup, disk-space check, index & MD5 download, lbzip2→tar extraction, checksum verification, activation, and an exception hierarchy (UpdateError, InsufficientSpaceError, DownloadError, ExtractionError, ChecksumMismatchError).
Configuration validation and remote check refactoring
src/utils/validate_config.py, src/check_remote.py
Mode-aware validation (db vs jsonl) and check_remote using index module for local timestamps/markers.
Entrypoint refactoring with unified setup and import
src/entrypoint.py, tests/test_entrypoint.py
New run_setup(), log_config(), and run_update_or_import() centralise startup, reconciliation, and conditional JSONL import or update flows.
Process manager integration with direct module calls
src/process_manager.py, tests/test_process_manager.py
Process manager now calls run_setup(), invokes update.run_update() directly, skips scheduling in JSONL mode, and adds resilient scheduler helper.
Downloader cleanup and supporting utilities
src/downloader.py, src/updater.py, src/utils/sanitize.py
Downloader drops high-level orchestration helpers, adds prepare/clear temp-dir functions, makes is_parallel keyword-only, re-raises local OSError, and updater/sanitize receive small control/format tweaks.

Infrastructure and Configuration

Layer / File(s) Summary
Docker multi-stage build and GitHub Actions
Dockerfile, .github/workflows/build-and-push.yml, .github/workflows/full-test.yml, .github/workflows/full-test-jsonl.yml, .github/workflows/lint.yml
Dockerfile converted to builder+runtime multi-stage (prebuilt venv), GitHub Actions docker actions bumped to v4, lint workflow runs per-job uv sync, and new full-test-jsonl workflow runs container health-checked tests.
Tool configuration and dependencies
pyproject.toml
Adds zstandard>=0.23.0, extends coverage run omit, and expands Ruff configuration and per-file ignores.
Utility test coverage
tests/utils/*, tests/jsonl/*, various module tests
Adds many pytest modules and test cases covering downloader, update, index, importer, process_manager, entrypoint, regions, notify, logger and JSONL downloader behaviour.

🎯 4 (Complex) | ⏱️ ~60 minutes

✨ Finishing Touches
⚔️ Resolve merge conflicts
  • Resolve merge conflict in branch dev

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
.github/workflows/lint.yml (1)

1-13: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Declare least-privilege GITHUB_TOKEN permissions for this workflow.

The workflow currently inherits default token permissions. These jobs only require read access, so explicitly scope permissions to reduce blast radius.

Suggested patch
 name: checks
+permissions:
+  contents: read
 on:

Source: Linters/SAST tools

src/process_manager.py (1)

82-86: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Add timeouts for updater/setup subprocess calls to avoid indefinite hangs.

Line 82 and Line 225 invoke long-running external processes without timeout and without TimeoutExpired handling. A stuck child can block initialisation or update orchestration indefinitely.

Suggested patch
-        result = subprocess.run([sys.executable, "-m", "src.entrypoint", "setup"], check=False, cwd="/photon")  # noqa S603
+        try:
+            result = subprocess.run(
+                [sys.executable, "-m", "src.entrypoint", "setup"],
+                check=False,
+                cwd="/photon",
+                timeout=60 * 60,
+            )  # noqa S603
+        except subprocess.TimeoutExpired:
+            logger.error("Initial setup timed out")
+            sys.exit(1)
-        result = subprocess.run([sys.executable, "-m", "src.updater"], check=False, cwd="/photon")  # noqa S603
+        try:
+            result = subprocess.run(
+                [sys.executable, "-m", "src.updater"],
+                check=False,
+                cwd="/photon",
+                timeout=60 * 60,
+            )  # noqa S603
+        except subprocess.TimeoutExpired:
+            update_duration = time.time() - update_start
+            logger.error(f"Update process timed out ({update_duration:.1f}s)")
+            self.state = AppState.RUNNING
+            return

Also applies to: 225-256

🧹 Nitpick comments (3)
tests/test_updater.py (1)

46-58: 📐 Maintainability & Code Quality | ⚡ Quick win

Add an explicit JSONL-mode control-flow test in updater.main().

Current tests validate strategy branching and failure handling, but they do not exercise the JSONL early-exit path described for this PR. Please add a focused test to lock that branch behaviour and prevent regressions.

tests/jsonl/test_downloader.py (1)

5-6: 📐 Maintainability & Code Quality | ⚡ Quick win

Avoid hard-coding release-channel values in this URL assertion.

Line 6 hard-codes master and .jsonl.zst, so this test can fail after a valid config change. Build the expected filename from config constants instead.

Suggested patch
 def test_get_jsonl_url_uses_base_url():
-    assert get_jsonl_url("germany") == f"{config.BASE_URL}/europe/germany/photon-dump-germany-master-latest.jsonl.zst"
+    expected = (
+        f"{config.BASE_URL}/europe/germany/"
+        f"photon-dump-germany-{config.JSONL_RELEASE_CHANNEL}-latest.{config.JSONL_FILE_EXTENSION}"
+    )
+    assert get_jsonl_url("germany") == expected
tests/test_process_manager.py (1)

243-359: 📐 Maintainability & Code Quality | ⚡ Quick win

Add explicit tests for the newly added JSONL/reconcile control-flow branches.

Please add focused cases for:

  • run_update() returning early when IMPORT_MODE == "jsonl"
  • schedule_updates() skipping registration in JSONL mode
  • run() invoking reconcile_interrupted_import() before setup checks

This closes the coverage gap for newly introduced behaviour.

Also applies to: 400-442


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d20f2656-2a4f-435f-ba3b-cf349cca9f97

📥 Commits

Reviewing files that changed from the base of the PR and between 423b3cc and a476b80.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (29)
  • .github/workflows/build-and-push.yml
  • .github/workflows/full-test-jsonl.yml
  • .github/workflows/full-test.yml
  • .github/workflows/lint.yml
  • Dockerfile
  • pyproject.toml
  • src/entrypoint.py
  • src/filesystem.py
  • src/importer.py
  • src/jsonl/__init__.py
  • src/jsonl/decompressor.py
  • src/jsonl/downloader.py
  • src/process_manager.py
  • src/updater.py
  • src/utils/config.py
  • src/utils/regions.py
  • src/utils/validate_config.py
  • tests/jsonl/test_downloader.py
  • tests/test_check_remote.py
  • tests/test_downloader.py
  • tests/test_entrypoint.py
  • tests/test_filesystem.py
  • tests/test_importer.py
  • tests/test_process_manager.py
  • tests/test_updater.py
  • tests/utils/test_logger.py
  • tests/utils/test_notify.py
  • tests/utils/test_regions.py
  • tests/utils/test_validate_config.py

Comment thread .github/workflows/full-test-jsonl.yml Outdated
Comment thread Dockerfile
Comment on lines 69 to +70
ENTRYPOINT ["/bin/sh", "entrypoint.sh"]
CMD ["uv", "run", "-m", "src.process_manager"]
CMD ["/photon/.venv/bin/python", "-m", "src.process_manager"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Run the final container as the non-root user.

The runtime image still executes as root because the final stage never switches user. Add USER photon before startup to reduce container breakout impact.

Suggested patch
 RUN chmod 644 /photon/photon.jar && \
     chown -R photon:photon /photon
+
+USER photon
 
 LABEL org.opencontainers.image.title="photon-docker" \

Source: Linters/SAST tools

Comment thread src/utils/regions.py Outdated
Comment thread src/utils/validate_config.py
Comment thread tests/test_entrypoint.py
Comment on lines +26 to +27
def _patch_common():
return (patch("src.entrypoint.send_notification"), patch("src.entrypoint.validate_config"))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Stub reconcile_interrupted_import() in the common entrypoint test patch set.

Line 26 currently leaves reconcile_interrupted_import() live, so entrypoint.main() can read/remove real marker/index paths during tests. Patch it in _patch_common() to keep tests hermetic.

Suggested patch
 def _patch_common():
-    return (patch("src.entrypoint.send_notification"), patch("src.entrypoint.validate_config"))
+    return (
+        patch("src.entrypoint.send_notification"),
+        patch("src.entrypoint.validate_config"),
+        patch("src.entrypoint.reconcile_interrupted_import"),
+    )

Also applies to: 30-244

Comment thread tests/test_importer.py
Comment on lines +99 to +116
def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch):
process = RecordingProcess()
download_args = []

def fake_download(region):
download_args.append(region)
return "/photon/data/temp/europe.jsonl.zst"

monkeypatch.setattr(config, "REGION", "andorra,luxemburg")
monkeypatch.setattr(importer, "download_jsonl", fake_download)
monkeypatch.setattr(importer, "stream_decompress", lambda path: [b'{"type":"Place"}\n'])
monkeypatch.setattr(importer, "_start_photon_import", lambda input_source, country_codes=None: process)
monkeypatch.setattr(importer, "clear_temp_dir", lambda: None)

importer.run_jsonl_import()

assert download_args == ["europe"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

This multi-region test does not currently verify country_codes forwarding.

The test name says country-code behaviour is validated, but it only asserts the parent download region. Capture _start_photon_import args and assert the expected country codes are passed.

Suggested patch
 def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch):
     process = RecordingProcess()
     download_args = []
+    import_args = []
@@
-    monkeypatch.setattr(importer, "_start_photon_import", lambda input_source, country_codes=None: process)
+    def fake_start_import(input_source, country_codes=None):
+        import_args.append({"input_source": input_source, "country_codes": country_codes})
+        return process
+
+    monkeypatch.setattr(importer, "_start_photon_import", fake_start_import)
@@
     importer.run_jsonl_import()
 
     assert download_args == ["europe"]
+    assert set(import_args[0]["country_codes"]) == {"AD", "LU"}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch):
process = RecordingProcess()
download_args = []
def fake_download(region):
download_args.append(region)
return "/photon/data/temp/europe.jsonl.zst"
monkeypatch.setattr(config, "REGION", "andorra,luxemburg")
monkeypatch.setattr(importer, "download_jsonl", fake_download)
monkeypatch.setattr(importer, "stream_decompress", lambda path: [b'{"type":"Place"}\n'])
monkeypatch.setattr(importer, "_start_photon_import", lambda input_source, country_codes=None: process)
monkeypatch.setattr(importer, "clear_temp_dir", lambda: None)
importer.run_jsonl_import()
assert download_args == ["europe"]
def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch):
process = RecordingProcess()
download_args = []
import_args = []
def fake_download(region):
download_args.append(region)
return "/photon/data/temp/europe.jsonl.zst"
def fake_start_import(input_source, country_codes=None):
import_args.append({"input_source": input_source, "country_codes": country_codes})
return process
monkeypatch.setattr(config, "REGION", "andorra,luxemburg")
monkeypatch.setattr(importer, "download_jsonl", fake_download)
monkeypatch.setattr(importer, "stream_decompress", lambda path: [b'{"type":"Place"}\n'])
monkeypatch.setattr(importer, "_start_photon_import", fake_start_import)
monkeypatch.setattr(importer, "clear_temp_dir", lambda: None)
importer.run_jsonl_import()
assert download_args == ["europe"]
assert set(import_args[0]["country_codes"]) == {"AD", "LU"}

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_process_manager.py (1)

31-33: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Pin IMPORT_MODE in shared test setup to keep these tests hermetic.

Several tests here expect DB update/scheduling paths, but IMPORT_MODE is not normalised in the common fixture. If the runner environment sets JSONL mode, those tests can short-circuit and fail for the wrong reason.

Set config.IMPORT_MODE = "db" in an autouse/common fixture to remove env-coupling.

🧹 Nitpick comments (2)
src/update.py (2)

77-97: 💤 Low value

Consider using shlex.quote() to protect against paths with special characters.

While index_file and config.TEMP_DIR come from internal configuration rather than user input, paths containing spaces or shell metacharacters could cause unexpected behaviour or command parsing issues.

🔧 Suggested improvement
+import shlex
+
 def extract_index(index_file: str):
     logging.info("Extracting Index")
     logging.debug(f"Index file: {index_file}")

     if not os.path.exists(config.TEMP_DIR):
         os.makedirs(config.TEMP_DIR, exist_ok=True)

-    install_command = f"lbzip2 -d -c {index_file} | tar x -o -C {config.TEMP_DIR}"
+    install_command = f"lbzip2 -d -c {shlex.quote(index_file)} | tar x -o -C {shlex.quote(config.TEMP_DIR)}"
     logging.debug(f"Extraction command: {install_command}")

146-168: 💤 Low value

Temp directory not cleaned on pipeline failure.

If checksum verification or extraction fails, clear_temp_dir() is not called because the exception propagates before reaching line 166. This leaves potentially large files in the temp directory. Consider using a try/finally block or explicit cleanup in error paths if disk space is a concern.

🔧 Suggested improvement
 def run_update(strategy: str):
     logging.info(f"Starting {strategy.lower()} update pipeline...")

     prepare_temp_dir()

-    download_url = get_download_url()
-    _ensure_disk_space(download_url, parallel=strategy == "PARALLEL")
-
-    logging.info("Downloading index")
-    index_file = download_index()
-
-    extract_index(index_file)
-
-    if not config.SKIP_MD5_CHECK:
-        md5_file = download_md5()
-        logging.info("Verifying checksum...")
-        verify_checksum(md5_file, index_file)
-
-    logging.info("Activating new index")
-    index.activate(os.path.join(config.TEMP_DIR, "photon_data"))
-    clear_temp_dir()
+    try:
+        download_url = get_download_url()
+        _ensure_disk_space(download_url, parallel=strategy == "PARALLEL")
+
+        logging.info("Downloading index")
+        index_file = download_index()
+
+        extract_index(index_file)
+
+        if not config.SKIP_MD5_CHECK:
+            md5_file = download_md5()
+            logging.info("Verifying checksum...")
+            verify_checksum(md5_file, index_file)
+
+        logging.info("Activating new index")
+        index.activate(os.path.join(config.TEMP_DIR, "photon_data"))
+    finally:
+        clear_temp_dir()

     logging.info("Update pipeline completed successfully.")

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6421528-bcf7-4a3b-b052-53bebd7145ed

📥 Commits

Reviewing files that changed from the base of the PR and between a476b80 and 90bcc71.

📒 Files selected for processing (21)
  • pyproject.toml
  • src/__init__.py
  • src/check_remote.py
  • src/downloader.py
  • src/entrypoint.py
  • src/filesystem.py
  • src/importer.py
  • src/index.py
  • src/process_manager.py
  • src/update.py
  • src/updater.py
  • src/utils/sanitize.py
  • tests/jsonl/test_jsonl_downloader.py
  • tests/test_check_remote.py
  • tests/test_downloader.py
  • tests/test_entrypoint.py
  • tests/test_importer.py
  • tests/test_index.py
  • tests/test_process_manager.py
  • tests/test_update.py
  • tests/utils/test_validate_config.py
💤 Files with no reviewable changes (5)
  • tests/jsonl/test_jsonl_downloader.py
  • src/init.py
  • src/updater.py
  • src/filesystem.py
  • tests/test_check_remote.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/utils/test_validate_config.py
  • tests/test_importer.py

Comment thread src/importer.py
Comment thread src/process_manager.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant