Conversation
Coverage Report
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (8)
✅ Files skipped from review due to trivial changes (1)
🚧 Files skipped from review as they are similar to previous changes (7)
📝 WalkthroughWalkthroughAdds JSONL import mode, a filesystem-backed index lifecycle module, and a consolidated update pipeline; refactors entrypoint, process manager, downloader, and Docker build; updates CI workflows; and adds extensive tests covering new and refactored behaviours. ChangesJSONL Import Mode and Index Management
Infrastructure and Configuration
🎯 4 (Complex) | ⏱️ ~60 minutes ✨ Finishing Touches⚔️ Resolve merge conflicts
|
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/lint.yml (1)
1-13: 🔒 Security & Privacy | 🟠 Major | ⚡ Quick winDeclare least-privilege
GITHUB_TOKENpermissions for this workflow.The workflow currently inherits default token permissions. These jobs only require read access, so explicitly scope permissions to reduce blast radius.
Suggested patch
name: checks +permissions: + contents: read on:Source: Linters/SAST tools
src/process_manager.py (1)
82-86: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick winAdd timeouts for updater/setup subprocess calls to avoid indefinite hangs.
Line 82 and Line 225 invoke long-running external processes without
timeoutand withoutTimeoutExpiredhandling. A stuck child can block initialisation or update orchestration indefinitely.Suggested patch
- result = subprocess.run([sys.executable, "-m", "src.entrypoint", "setup"], check=False, cwd="/photon") # noqa S603 + try: + result = subprocess.run( + [sys.executable, "-m", "src.entrypoint", "setup"], + check=False, + cwd="/photon", + timeout=60 * 60, + ) # noqa S603 + except subprocess.TimeoutExpired: + logger.error("Initial setup timed out") + sys.exit(1)- result = subprocess.run([sys.executable, "-m", "src.updater"], check=False, cwd="/photon") # noqa S603 + try: + result = subprocess.run( + [sys.executable, "-m", "src.updater"], + check=False, + cwd="/photon", + timeout=60 * 60, + ) # noqa S603 + except subprocess.TimeoutExpired: + update_duration = time.time() - update_start + logger.error(f"Update process timed out ({update_duration:.1f}s)") + self.state = AppState.RUNNING + returnAlso applies to: 225-256
🧹 Nitpick comments (3)
tests/test_updater.py (1)
46-58: 📐 Maintainability & Code Quality | ⚡ Quick winAdd an explicit JSONL-mode control-flow test in
updater.main().Current tests validate strategy branching and failure handling, but they do not exercise the JSONL early-exit path described for this PR. Please add a focused test to lock that branch behaviour and prevent regressions.
tests/jsonl/test_downloader.py (1)
5-6: 📐 Maintainability & Code Quality | ⚡ Quick winAvoid hard-coding release-channel values in this URL assertion.
Line 6 hard-codes
masterand.jsonl.zst, so this test can fail after a valid config change. Build the expected filename from config constants instead.Suggested patch
def test_get_jsonl_url_uses_base_url(): - assert get_jsonl_url("germany") == f"{config.BASE_URL}/europe/germany/photon-dump-germany-master-latest.jsonl.zst" + expected = ( + f"{config.BASE_URL}/europe/germany/" + f"photon-dump-germany-{config.JSONL_RELEASE_CHANNEL}-latest.{config.JSONL_FILE_EXTENSION}" + ) + assert get_jsonl_url("germany") == expectedtests/test_process_manager.py (1)
243-359: 📐 Maintainability & Code Quality | ⚡ Quick winAdd explicit tests for the newly added JSONL/reconcile control-flow branches.
Please add focused cases for:
run_update()returning early whenIMPORT_MODE == "jsonl"schedule_updates()skipping registration in JSONL moderun()invokingreconcile_interrupted_import()before setup checksThis closes the coverage gap for newly introduced behaviour.
Also applies to: 400-442
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d20f2656-2a4f-435f-ba3b-cf349cca9f97
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (29)
.github/workflows/build-and-push.yml.github/workflows/full-test-jsonl.yml.github/workflows/full-test.yml.github/workflows/lint.ymlDockerfilepyproject.tomlsrc/entrypoint.pysrc/filesystem.pysrc/importer.pysrc/jsonl/__init__.pysrc/jsonl/decompressor.pysrc/jsonl/downloader.pysrc/process_manager.pysrc/updater.pysrc/utils/config.pysrc/utils/regions.pysrc/utils/validate_config.pytests/jsonl/test_downloader.pytests/test_check_remote.pytests/test_downloader.pytests/test_entrypoint.pytests/test_filesystem.pytests/test_importer.pytests/test_process_manager.pytests/test_updater.pytests/utils/test_logger.pytests/utils/test_notify.pytests/utils/test_regions.pytests/utils/test_validate_config.py
| ENTRYPOINT ["/bin/sh", "entrypoint.sh"] | ||
| CMD ["uv", "run", "-m", "src.process_manager"] | ||
| CMD ["/photon/.venv/bin/python", "-m", "src.process_manager"] |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Run the final container as the non-root user.
The runtime image still executes as root because the final stage never switches user. Add USER photon before startup to reduce container breakout impact.
Suggested patch
RUN chmod 644 /photon/photon.jar && \
chown -R photon:photon /photon
+
+USER photon
LABEL org.opencontainers.image.title="photon-docker" \Source: Linters/SAST tools
| def _patch_common(): | ||
| return (patch("src.entrypoint.send_notification"), patch("src.entrypoint.validate_config")) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Stub reconcile_interrupted_import() in the common entrypoint test patch set.
Line 26 currently leaves reconcile_interrupted_import() live, so entrypoint.main() can read/remove real marker/index paths during tests. Patch it in _patch_common() to keep tests hermetic.
Suggested patch
def _patch_common():
- return (patch("src.entrypoint.send_notification"), patch("src.entrypoint.validate_config"))
+ return (
+ patch("src.entrypoint.send_notification"),
+ patch("src.entrypoint.validate_config"),
+ patch("src.entrypoint.reconcile_interrupted_import"),
+ )Also applies to: 30-244
| def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch): | ||
| process = RecordingProcess() | ||
| download_args = [] | ||
|
|
||
| def fake_download(region): | ||
| download_args.append(region) | ||
| return "/photon/data/temp/europe.jsonl.zst" | ||
|
|
||
| monkeypatch.setattr(config, "REGION", "andorra,luxemburg") | ||
| monkeypatch.setattr(importer, "download_jsonl", fake_download) | ||
| monkeypatch.setattr(importer, "stream_decompress", lambda path: [b'{"type":"Place"}\n']) | ||
| monkeypatch.setattr(importer, "_start_photon_import", lambda input_source, country_codes=None: process) | ||
| monkeypatch.setattr(importer, "clear_temp_dir", lambda: None) | ||
|
|
||
| importer.run_jsonl_import() | ||
|
|
||
| assert download_args == ["europe"] | ||
|
|
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
This multi-region test does not currently verify country_codes forwarding.
The test name says country-code behaviour is validated, but it only asserts the parent download region. Capture _start_photon_import args and assert the expected country codes are passed.
Suggested patch
def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch):
process = RecordingProcess()
download_args = []
+ import_args = []
@@
- monkeypatch.setattr(importer, "_start_photon_import", lambda input_source, country_codes=None: process)
+ def fake_start_import(input_source, country_codes=None):
+ import_args.append({"input_source": input_source, "country_codes": country_codes})
+ return process
+
+ monkeypatch.setattr(importer, "_start_photon_import", fake_start_import)
@@
importer.run_jsonl_import()
assert download_args == ["europe"]
+ assert set(import_args[0]["country_codes"]) == {"AD", "LU"}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch): | |
| process = RecordingProcess() | |
| download_args = [] | |
| def fake_download(region): | |
| download_args.append(region) | |
| return "/photon/data/temp/europe.jsonl.zst" | |
| monkeypatch.setattr(config, "REGION", "andorra,luxemburg") | |
| monkeypatch.setattr(importer, "download_jsonl", fake_download) | |
| monkeypatch.setattr(importer, "stream_decompress", lambda path: [b'{"type":"Place"}\n']) | |
| monkeypatch.setattr(importer, "_start_photon_import", lambda input_source, country_codes=None: process) | |
| monkeypatch.setattr(importer, "clear_temp_dir", lambda: None) | |
| importer.run_jsonl_import() | |
| assert download_args == ["europe"] | |
| def test_run_jsonl_import_uses_parent_region_and_country_codes_for_multi_region(monkeypatch): | |
| process = RecordingProcess() | |
| download_args = [] | |
| import_args = [] | |
| def fake_download(region): | |
| download_args.append(region) | |
| return "/photon/data/temp/europe.jsonl.zst" | |
| def fake_start_import(input_source, country_codes=None): | |
| import_args.append({"input_source": input_source, "country_codes": country_codes}) | |
| return process | |
| monkeypatch.setattr(config, "REGION", "andorra,luxemburg") | |
| monkeypatch.setattr(importer, "download_jsonl", fake_download) | |
| monkeypatch.setattr(importer, "stream_decompress", lambda path: [b'{"type":"Place"}\n']) | |
| monkeypatch.setattr(importer, "_start_photon_import", fake_start_import) | |
| monkeypatch.setattr(importer, "clear_temp_dir", lambda: None) | |
| importer.run_jsonl_import() | |
| assert download_args == ["europe"] | |
| assert set(import_args[0]["country_codes"]) == {"AD", "LU"} |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_process_manager.py (1)
31-33:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPin
IMPORT_MODEin shared test setup to keep these tests hermetic.Several tests here expect DB update/scheduling paths, but
IMPORT_MODEis not normalised in the common fixture. If the runner environment sets JSONL mode, those tests can short-circuit and fail for the wrong reason.Set
config.IMPORT_MODE = "db"in an autouse/common fixture to remove env-coupling.
🧹 Nitpick comments (2)
src/update.py (2)
77-97: 💤 Low valueConsider using
shlex.quote()to protect against paths with special characters.While
index_fileandconfig.TEMP_DIRcome from internal configuration rather than user input, paths containing spaces or shell metacharacters could cause unexpected behaviour or command parsing issues.🔧 Suggested improvement
+import shlex + def extract_index(index_file: str): logging.info("Extracting Index") logging.debug(f"Index file: {index_file}") if not os.path.exists(config.TEMP_DIR): os.makedirs(config.TEMP_DIR, exist_ok=True) - install_command = f"lbzip2 -d -c {index_file} | tar x -o -C {config.TEMP_DIR}" + install_command = f"lbzip2 -d -c {shlex.quote(index_file)} | tar x -o -C {shlex.quote(config.TEMP_DIR)}" logging.debug(f"Extraction command: {install_command}")
146-168: 💤 Low valueTemp directory not cleaned on pipeline failure.
If checksum verification or extraction fails,
clear_temp_dir()is not called because the exception propagates before reaching line 166. This leaves potentially large files in the temp directory. Consider using atry/finallyblock or explicit cleanup in error paths if disk space is a concern.🔧 Suggested improvement
def run_update(strategy: str): logging.info(f"Starting {strategy.lower()} update pipeline...") prepare_temp_dir() - download_url = get_download_url() - _ensure_disk_space(download_url, parallel=strategy == "PARALLEL") - - logging.info("Downloading index") - index_file = download_index() - - extract_index(index_file) - - if not config.SKIP_MD5_CHECK: - md5_file = download_md5() - logging.info("Verifying checksum...") - verify_checksum(md5_file, index_file) - - logging.info("Activating new index") - index.activate(os.path.join(config.TEMP_DIR, "photon_data")) - clear_temp_dir() + try: + download_url = get_download_url() + _ensure_disk_space(download_url, parallel=strategy == "PARALLEL") + + logging.info("Downloading index") + index_file = download_index() + + extract_index(index_file) + + if not config.SKIP_MD5_CHECK: + md5_file = download_md5() + logging.info("Verifying checksum...") + verify_checksum(md5_file, index_file) + + logging.info("Activating new index") + index.activate(os.path.join(config.TEMP_DIR, "photon_data")) + finally: + clear_temp_dir() logging.info("Update pipeline completed successfully.")
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a6421528-bcf7-4a3b-b052-53bebd7145ed
📒 Files selected for processing (21)
pyproject.tomlsrc/__init__.pysrc/check_remote.pysrc/downloader.pysrc/entrypoint.pysrc/filesystem.pysrc/importer.pysrc/index.pysrc/process_manager.pysrc/update.pysrc/updater.pysrc/utils/sanitize.pytests/jsonl/test_jsonl_downloader.pytests/test_check_remote.pytests/test_downloader.pytests/test_entrypoint.pytests/test_importer.pytests/test_index.pytests/test_process_manager.pytests/test_update.pytests/utils/test_validate_config.py
💤 Files with no reviewable changes (5)
- tests/jsonl/test_jsonl_downloader.py
- src/init.py
- src/updater.py
- src/filesystem.py
- tests/test_check_remote.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/utils/test_validate_config.py
- tests/test_importer.py
TODO: Update README.md