All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Integrity scan hung at "5000 active" forever when Celery workers died mid-scan. The producer loop in `_run_file_changes_check` had no per-task timeout, so any `calculate_file_hash_task` whose worker disappeared remained `PENDING` in Redis indefinitely. `safe_task_ready()` was deliberately built to return `False` on Redis errors, which kept stuck tasks in `active_tasks` forever; combined with `MAX_CONCURRENT_SMALL=5000`, that pinned the producer at the cap with no submissions or completions. Each entry in `active_tasks` now carries a `submitted_at` monotonic timestamp; tasks older than `INTEGRITY_TASK_TIMEOUT_SECS` (default `1800`, env-overridable) are revoked, logged, and dropped from the active set.
- Worker post-fork engine disposal could fail silently (v2.6.41 regression). `_setup_worker_process` now logs entry/exit and wraps `db.engine.dispose()` in a try/except, so a fork-time failure no longer leaves a child unable to log task activity. After the next deploy, the worker container should show `_setup_worker_process: starting in worker pid=...` and `_setup_worker_process: complete in worker pid=...` for each fork.
- Misleading "queued" heartbeat label. The integrity-scan heartbeat used to print `files_queued`, a cumulative counter that was only ever incremented, so an idle producer at the concurrency cap looked like a perpetually growing queue. Replaced with explicit `remaining` (unsubmitted files) and `abandoned` counts, e.g.: `Progress: 225881/1167919 processed, 5000 active, 937038 remaining, 0 abandoned`.
- Integrity-scan UI froze at "75,000 of 1,167,919" while backend kept advancing. Two related bugs: (1) the heartbeat block in `_run_file_changes_check` only wrote `last_heartbeat` to the `FileChangesState` row, leaving `phase_current`/`files_processed`/`progress_message` unchanged for multi-file Phase 2; (2) the periodic-update block used `total_files_processed % update_interval == 0`, which only matched when an outer-loop iteration captured `total_files_processed` exactly on a multiple of 100. Once the producer hit steady state at 5000 active tasks, each iteration completed thousands of tasks at once and the modulo check almost never aligned, so the row stayed at the value of the last lucky alignment. Replaced the modulo with a delta check (`total_files_processed - last_progress_update >= update_interval`) and made the heartbeat (now 10 s) write `files_processed`, `phase_current`, `phase_total`, and `progress_message` unconditionally.
- Add Redis-backed real-time progress for the integrity scan. v2.5.67 introduced this for the regular scan (`/api/scan-status` reads `get_scan_progress_redis()` when active) but the integrity-scan path was never updated. Added `get_file_changes_progress_redis`/`update_file_changes_progress_redis`/`clear_file_changes_progress_redis` in `pixelprobe/progress_utils.py` (separate `file_changes_progress:<check_id>` key namespace), wired the producer loop to write on every heartbeat and periodic-update tick, and made `/api/file-changes-status` prefer Redis values when the scan is active. The Redis key is cleared when the scan transitions to `complete` or `cancelled`.
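For illustration, the modulo-vs-delta difference above reduces to two pure functions (names mirror the entry; the surrounding producer loop is omitted, and this is a sketch, not the actual code):

```python
def should_update(total_processed: int, last_progress_update: int,
                  update_interval: int = 100) -> bool:
    """Delta check: fires whenever >= update_interval files have completed
    since the last DB write, regardless of how many finished per iteration."""
    return total_processed - last_progress_update >= update_interval


def should_update_modulo(total_processed: int, update_interval: int = 100) -> bool:
    """Old behavior: only fires when the captured counter happens to land
    exactly on a multiple of the interval."""
    return total_processed % update_interval == 0


# With thousands of tasks completing per iteration, the counter jumps
# e.g. 4980 -> 9973, so the modulo almost never aligns while the delta
# check still fires:
assert should_update(9973, 4980) is True
assert should_update_modulo(9973) is False
```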
- Bump `python-dotenv` from 1.0.0 to 1.2.2 to resolve CVE-2026-28684 (arbitrary file overwrite via symlink follow). MEDIUM severity. The pinned 1.0.0 had no fix available; 1.2.2 is the first release with the patch.
- Single-file rescan UI flips to "done" then resumes minutes later. The Flask route created a `ScanState` row keyed on its generated `scan_id`, then the Celery worker's `scan_service.scan_single_file` created a second `ScanState` with a different `scan_id` and tracked progress on that one. The UI's progress monitor lost track between the two rows and reported the scan complete before the worker had even started hashing the file. `scan_single_file` now accepts an optional `scan_id` and reuses the existing row when one matches; the Celery `scan_media_task` passes `scan_id` through for `scan_type='single'`.
- Silent first-attempt failures on Celery worker scans caused by post-fork PostgreSQL connection sharing. Symptom in logs: `psycopg2.DatabaseError: error with status PGRES_TUPLES_OK and no message from the libpq` on one worker, surfacing as a bare `NotImplementedError` from `sqlalchemy/engine/result.py:_indexes_for_keys` in a sibling worker. Fixed by disposing the SQLAlchemy engine in the `worker_process_init` Celery signal so each forked child builds its own connection pool. The existing log-handler setup in that signal has been merged into a single `_setup_worker_process` handler.
- Scan progress flickers to "failed" during transient Celery retries. `scan_media_task`'s catch-all exception handler used to set `phase='failed'` and `is_active=False` on every error, including ones that were about to be retried. The handler now keeps the row active (`phase='initializing'` with a "Retrying after error" progress message) when more retries remain, and only marks the scan failed once the retry budget is exhausted or the error is the known-fatal `PGRES_TUPLES_OK` connection-corruption case.
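The retry-aware rule above can be distilled into one hypothetical helper (the names and return shape are illustrative, not the actual task code):

```python
def classify_task_failure(retries_done: int, max_retries: int,
                          is_known_fatal: bool) -> dict:
    """Decide what to write to the scan-state row when the task raises.

    Mirrors the rule above: stay active while retries remain, mark the
    scan failed only when the retry budget is exhausted or the error is
    the known-fatal connection-corruption case.
    """
    if is_known_fatal or retries_done >= max_retries:
        return {"phase": "failed", "is_active": False}
    return {"phase": "initializing", "is_active": True,
            "progress_message": "Retrying after error"}


assert classify_task_failure(1, 3, False)["is_active"] is True   # retry pending
assert classify_task_failure(3, 3, False)["phase"] == "failed"   # budget spent
assert classify_task_failure(0, 3, True)["phase"] == "failed"    # fatal error
```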
- Resolve open CodeQL and Dependabot findings. No external behavior changes; exception details that previously leaked into HTTP response bodies are now logged server-side only.
- Stack trace exposure (49 sites): Exception detail (`str(e)`, `traceback`, `details`) no longer appears in API error responses. The `handle_errors` decorator and every route-level `except` across the API blueprints now returns a generic error message; the exception is logged server-side with `exc_info=True` for operators.
- Path injection (7 sites): `validate_file_path` and `validate_directory_path` resolve symlinks via `os.path.realpath` and validate with `os.path.commonpath` against the configured allowlist, defeating symlink-based escapes. Behavior change: `validate_directory_path` now enforces the configured scan-path allowlist by default; `POST /api/scan`, `POST /api/scan-files-parallel`, and `POST /api/parallel/scan` will reject directories outside `SCAN_PATHS` or the active `ScanConfiguration` entries (previously only `../` and `~` tokens were rejected). `POST /api/configurations` is exempt because it is defining a new allowlist entry. The unused `validate_path_exists` decorator was removed.
- Clear-text logging of sensitive data (3 sites): `tools/migrate_to_postgres.py` no longer holds the DB password in the `pg_config` dict while logging connection details; the password is stored in a dedicated parameter. Trusted-host config logging in `security.py` demoted from `info` to `debug`.
- GitHub Actions workflow permissions: `.github/workflows/test.yml` declares `permissions: contents: read` at the workflow level.
- Reflective XSS (17 alerts): Reviewed; every flagged site returns JSON via `jsonify()` with no HTML sink. Dismissed as false positives; no code change required.
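A minimal sketch of the realpath + commonpath pattern named in the path-injection fix (the function name and allowlist handling here are illustrative, not the project's actual validators):

```python
import os

def is_within_allowlist(candidate: str, allowed_roots: list) -> bool:
    """Resolve symlinks first, then test containment path-segment-wise."""
    real = os.path.realpath(candidate)
    for root in allowed_roots:
        real_root = os.path.realpath(root)
        try:
            # commonpath compares whole path components, so a sibling
            # directory that merely shares a string prefix is rejected
            # (a naive startswith() check would accept it).
            if os.path.commonpath([real, real_root]) == real_root:
                return True
        except ValueError:  # mixed absolute/relative paths, or different drives
            continue
    return False


assert is_within_allowlist("/media/library/show/ep1.mkv", ["/media/library"])
assert not is_within_allowlist("/media/library2/x.mkv", ["/media/library"])
# realpath collapses traversal before the containment check:
assert not is_within_allowlist("/media/library/../../etc/passwd", ["/media/library"])
```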
- Dependency bumps covering 6 CVEs:
  - `Pillow` 12.1.1 -> 12.2.0 (FITS GZIP decompression bomb)
  - `requests` 2.32.5 -> 2.33.0 (`extract_zipped_paths` tempfile reuse)
  - `Flask-CORS` 5.0.1 -> 6.0.0 (path-matching CVEs; verified wildcard config is unaffected by the v6 specificity and case-sensitivity changes)
  - `pytest` 8.3.5 -> 9.0.3 (tmpdir handling CVE)
  - `black` removed from `requirements-test.txt`; it was unused dev tooling (no CI gate, no `pyproject.toml`, no pre-commit hook)
- Scheduled scans silently dropped when another scan is running: When a scheduled scan's cron fire landed while a prior scan was still in progress, `MediaScheduler` logged `Scheduled scan {id} skipped - another scan is already running` and returned. APScheduler consumed the cron fire, so the next run advanced to the following cron interval (a weekly Sunday cleanup could go missing for an entire week). Added a queue-on-conflict retry that schedules a one-shot date-trigger retry `SCHEDULE_RETRY_DELAY_MINUTES` later (default 10), up to `SCHEDULE_RETRY_MAX_COUNT` times (default 6), and clears the retry state as soon as a scan actually starts. Applies to `_run_scheduled_scan`, `_run_periodic_scan`, and the `_run_cleanup` HTTP 409 path. Retry state is kept in process memory; if the worker restarts while a retry is pending, the retry is lost and the schedule will fire again on its next regular cron.
- Database log handler stuck after a flush failure: `DatabaseLogHandler._flush_batch` attempted `db.session.rollback()` inside a silent try/except; if the rollback itself failed (stale connection, broken pipe), the scoped session stayed in `InvalidRequestError: Please rollback() fully before proceeding` state for the life of the writer thread, producing a flood of `[log_handler] Failed to flush N log entries to DB` lines. Added `db.session.remove()` after the rollback so the next flush starts with a fresh scoped session, and rate-limited the stderr report to once per 60s so recurring failures still surface instead of being logged exactly once at startup. Also moved `import sys` to the top of the file.
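The once-per-60s stderr throttling can be sketched as a small stand-alone class (an illustrative stand-in for the handler's rate-limited error path; the injectable clock exists only to make the sketch testable):

```python
import sys
import time

class RateLimitedReporter:
    """Emit at most one stderr report per min_interval seconds."""

    def __init__(self, min_interval: float = 60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock            # injectable for testing
        self._last_report = None

    def report(self, message: str) -> bool:
        now = self.clock()
        if self._last_report is not None and now - self._last_report < self.min_interval:
            return False              # suppressed; failures keep counting upstream
        self._last_report = now
        print(message, file=sys.stderr)
        return True


# Drive it with a fake clock: first call and the post-60s call emit.
t = [0.0]
r = RateLimitedReporter(min_interval=60.0, clock=lambda: t[0])
assert r.report("flush failed") is True
t[0] = 30.0
assert r.report("flush failed") is False
t[0] = 61.0
assert r.report("flush failed") is True
```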
- Fix newly discovered files marked completed without being scanned: Parallel chunk scanning (v2.6.32+) passed the parent `PixelProbe` instance to all chunk worker threads. Each `PixelProbe` uses a `StaticPool` with a single DB connection -- when 3+ chunk threads shared it concurrently, `_save_to_cache()` writes were silently lost due to transaction interference. Files ended up with `scan_status='completed'` but `scan_date=NULL` and `scan_tool=NULL`. Fixed by creating a per-thread `PixelProbe` instance in each chunk worker, giving each its own isolated DB connection. The v2.6.36 raw SQL `UPDATE SET scan_status = 'completed'` fix masked the issue by marking unscanned files as done.
- Remove hardcoded 30-chunk limit on scan progress grid: The scan-status API limited chunk results to 30, hiding all other chunks from the UI worker grid. Removed the `.limit(30)` so the grid shows all chunks for the current scan.
- Fix scans completing with ~65% of files still pending: `checker.scan_file()` committed `scan_status='completed'` via PixelProbe's separate StaticPool session (connection B), but batch queries used Flask's `db.session` (connection A) which never saw the change. Added raw SQL `UPDATE scan_results SET scan_status = 'completed'` via Flask's `db.session` after each successful scan, ensuring the next batch query on the same connection sees the status change. Previous fixes (`offset(0)` in v2.6.34, `expire_all()` in v2.6.35) failed because they couldn't bridge the cross-connection visibility gap.
- Fix scan leaving ~65% of files stuck in pending: `_scan_chunk_files()` batch queries used Flask's `db.session` but `checker.scan_file()` committed `scan_status='completed'` via a separate PixelProbe session. SQLAlchemy's identity map in `db.session` retained stale ScanResult objects from previous batches, causing the `offset(0)` batch query to return the same files repeatedly instead of the next pending batch. Added `db.session.expire_all()` before each batch query to force fresh reads from PostgreSQL.
- Fix scan skipping ~25% of pending files per chunk: Batch pagination within `_scan_chunk_files()` used `offset(batch_offset)` on queries filtered by `scan_status == 'pending'`. As each batch was scanned, files changed from 'pending' to 'completed', shrinking the result set. The incrementing offset then skipped files that shifted to lower positions. Fixed by using `offset(0)` for all pending-file queries since scanned files drop out of the result set automatically.
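The offset bug is easy to reproduce without a database; this toy model treats a plain list as the filtered "pending" result set (everything here is illustrative, not the project's query code):

```python
def scan_all(files, batch_size=2, advance_offset=False):
    """Toy model of batch scanning where completed files drop out of the
    filtered 'pending' result set, as with scan_status == 'pending'."""
    status = {f: "pending" for f in files}
    scanned, offset = [], 0
    while True:
        pending = [f for f in files if status[f] == "pending"]
        batch = pending[offset:offset + batch_size]
        if not batch:
            break
        for f in batch:
            status[f] = "completed"
            scanned.append(f)
        if advance_offset:
            offset += batch_size   # buggy: the result set shrank underneath us
    return scanned


files = ["a", "b", "c", "d", "e", "f"]
# Incrementing offset skips items that shifted to lower positions:
assert scan_all(files, advance_offset=True) == ["a", "b", "e", "f"]
# Fixed behavior -- always read from offset 0 -- reaches every file:
assert scan_all(files, advance_offset=False) == ["a", "b", "c", "d", "e", "f"]
```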
- Fix scan startup crash from dead `offset` variable: `_create_scanning_chunks()` crashed every scan with `UnboundLocalError: cannot access local variable 'offset'`. Leftover from the offset-to-keyset pagination migration -- the function uses `last_path` to advance through results, but `offset += chunk_size` was never removed.
- Fix IndexError on scan_state and scan_chunks ORM queries: `db.create_all()` creates new tables but does not add columns to existing ones. Columns added to models (`num_workers`, `files_added`, `files_updated` on scan_state; `files_processed`, `is_complete`, `celery_task_id`, `files_added` on scan_chunks) were missing from the production database, causing `IndexError: tuple index out of range` when SQLAlchemy tried to load rows with fewer columns than the mapper expected. Added startup migration to sync missing columns.
- Per-worker scan progress grid: Collapsible grid below the scan progress bar showing each parallel chunk worker's status, directory path, files scanned/total, and mini progress bar. Collapsed by default, expandable via toggle. Shows active, completed, and errored chunks with real-time updates. Responsive mobile layout hides progress bars and truncates paths.
- Video freeze detection: Detect videos with frozen frames (stuck picture while audio continues) using FFmpeg's `freezedetect` filter. Freeze events are reported as warnings (not corruption) with details including event count, total frozen time, percentage of video frozen, and per-event timestamps. Uses `-60dB` noise tolerance and `5s` minimum freeze duration with black frame filtering to reduce false positives. Configurable via `FREEZE_DETECTION_ENABLED` environment variable (default: `true`).
- JPEG pixel corruption detection: Detect visually corrupted JPEG files that pass PIL verification and ImageMagick validation but contain visible garbage (rainbow bands, solid color fill) in decoded pixel data. Uses row-averaged RGB sampling with two signals: sustained chaos (8+ consecutive rows with inter-row color difference > 100, catching rainbow garbage) and bottom-anchored solid fill (30+ consecutive identical rows extending to the image bottom, catching decoder fill). Designed to avoid false positives on high-contrast images like YouTube thumbnails.
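The sustained-chaos signal can be sketched on plain row averages; note the exact inter-row distance metric is an assumption here (the entry says "color difference > 100" without specifying the formula), so this is only an illustration of the run-counting idea:

```python
def has_sustained_chaos(row_rgb, diff_threshold=100, min_run=8):
    """Flag 8+ consecutive row-to-row jumps above the threshold.

    row_rgb is a list of per-row mean (R, G, B) tuples; the per-channel
    max-difference metric below is assumed for illustration.
    """
    run = 0
    for (r1, g1, b1), (r2, g2, b2) in zip(row_rgb, row_rgb[1:]):
        if max(abs(r1 - r2), abs(g1 - g2), abs(b1 - b2)) > diff_threshold:
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0   # a calm row resets the streak, sparing high-contrast photos
    return False


# Alternating extreme rows (rainbow-style garbage) trip the detector;
# a smooth gradient (normal photo content) does not.
garbage = [(255, 0, 0) if i % 2 else (0, 0, 255) for i in range(20)]
gradient = [(i * 5, i * 5, i * 5) for i in range(20)]
assert has_sustained_chaos(garbage) is True
assert has_sustained_chaos(gradient) is False
```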
- Fix JPEG pixel analysis guards: Added file size guard (skip > 10MB), image dimensions guard (skip > 30MP), 30s timeout, and downscale to ~200px wide before analysis (90KB vs 36MB full-resolution `tobytes`).
- Fix parallel chunk session error: `_parallel_scan_chunks` worker threads sharing Flask's scoped `db.session` caused "concurrent operations are not permitted" (isce) errors. Every chunk immediately errored, making the scan complete with 0 files processed. Fixed by calling `db.session.remove()` at thread start to force a fresh session per thread.
- Fix stuck scan false-positive during long video processing: The `db.session.expire(scan_state)` fix for the row-lock convoy also stopped `last_update` from being written. The UI progress worker that should maintain `last_update` independently fails to launch when Redis is congested from previous crash retries. Added raw SQL `last_update` refresh at chunk completion to maintain scan self-sufficiency.
- Fix scan deadlock from concurrent progress updates: 20 ThreadPoolExecutor worker threads all executed `UPDATE scan_state SET files_processed = files_processed + 1` on the same row simultaneously, creating a PostgreSQL row-level lock convoy that permanently blocked all scanning. Moved progress updates out of the per-future loop to a single batch-level update from the main thread after all futures complete.
- Fix scan worker death on 500K+ pending files: `_create_scanning_chunks()` loaded all file paths into memory at once via `.all()`, consuming ~200MB+ for 600K files. Replaced with `yield_per()` streaming that holds only one chunk (~1000 paths) in memory at a time. Added `cancel_futures=True` to ThreadPoolExecutor shutdown to prevent indefinite hangs. Added composite index `(scan_status, file_path)` for optimal query performance on pending file lookups.
- Fix "0 of 0 files" progress display during scanning phase: The scan status API returned `estimated_total=0` during the gap between phase transition and chunk counting. Added explicit Redis update after DB commit and a fallback to `phase_total` in the API when `estimated_total` is 0.
- Filter black frame false positives from freeze detection: Freeze events that overlap with black sections (scene transitions, studio logos, end credits) are now filtered out. Uses FFmpeg `blackdetect` filter chained with `freezedetect` in a single decode pass.
- Fix scan stuck as active after completion: Two issues -- (1) `_mark_scan_completed()` used raw SQL UPDATE but the ORM identity map still held stale `is_active=True`; when `_create_scan_report()` committed, it overwrote the completed state. Added `db.session.expire_all()` after the raw SQL commit. (2) Recovery endpoint and new scans didn't clear stale Redis progress keys, causing the API to return contradictory state (running + completed). Added `clear_scan_progress_redis()` to recovery and scan start.
- Fix `app_configs` seed data failure on startup: Migration v2.6.0 INSERT failed with `NotNullViolation` on `created_at` column because SQLAlchemy's `create_all()` created the table without PostgreSQL `DEFAULT` clauses on timestamp columns. Added `server_default=func.now()` to AppConfig model and `ALTER TABLE ... SET DEFAULT` in migration to fix both new and existing deployments.
- Fix log download 500 error: The `/api/logs/download` streaming generator lost Flask app context, causing an `Internal Server Error`. Wrapped with `stream_with_context` to preserve the request context during streaming.
- Add rate limit to `/api/logs/runs`: Added missing `30 per minute` rate limit, consistent with other log endpoints.
- Restrict purge endpoint filter scope: Purge now only accepts its documented filters (`scan_id`, `before`, `level`), ignoring undocumented keys like `search` or `start_time` that were previously passed through.
- Fix scan logs missing from Job Run dropdown: The `DatabaseLogHandler` set up in the parent process had a dead `_writer_thread` in forked Celery worker children (threads don't survive fork). Added `worker_process_init` signal handler in `celery_config.py` to create a fresh handler in each worker child process.
- Add test coverage for View Logs feature: 47 new tests covering LogEntry/AppConfig models, log context vars, DatabaseLogHandler, all log API endpoints, path filter, and log cleanup.
- View Logs page (#43): New "View Logs" page in the System section with persistent log storage in PostgreSQL, live log viewing with 3-second polling, filtering by level/time range/job run/search, traceback expand/collapse, and log download as a `.log` file
- Path filter on scan results (#42): New dropdown filter on the scan results page to view results for a single configured scan path, populated from active SCAN_PATHS
- LogEntry model: New database table for persistent log storage with scan_id tagging via Python contextvars
- AppConfig model: New database table for application-level configuration (log retention, excluded loggers)
- DatabaseLogHandler: Background-threaded logging handler that batch-inserts log records to PostgreSQL without blocking app code (10k queue, 100-record batches)
- Log context tagging: Celery scan tasks automatically tag all log entries with scan_id and celery_task_id for filtering
- Log retention cleanup: Configurable retention period (default 30 days) with daily automated cleanup via scheduler
- Log API endpoints: GET /api/logs (filtered/paginated), GET /api/logs/runs, GET /api/logs/download, GET/PUT /api/logs/retention, POST /api/logs/purge
- Path filter API: GET /api/scan-paths returns active configured paths; GET /api/scan-results accepts a `path` query parameter
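The contextvars-based tagging mentioned above can be sketched with a stdlib logging filter (the variable and class names here are illustrative, not the project's actual handler):

```python
import contextvars
import logging

# Ambient id for the scan currently running in this task/context.
scan_id_var = contextvars.ContextVar("scan_id", default=None)

class ScanContextFilter(logging.Filter):
    """Stamp every record with the ambient scan_id so a DB-backed
    handler can persist it and the UI can later filter by job run."""
    def filter(self, record):
        record.scan_id = scan_id_var.get()
        return True


# A task would set the var once at start; every log call in that
# context then carries the id automatically.
scan_id_var.set("scan-123")
record = logging.LogRecord("demo", logging.INFO, __file__, 1, "msg", None, None)
assert ScanContextFilter().filter(record) is True
assert record.scan_id == "scan-123"
```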
- SQLAlchemy SAWarning during chunk scanning caused by autoflush when accessing chunk attributes after commit
- Pillow 10.0.0 -> 12.1.1: CVE-2023-50447 (CRITICAL 9.8, arbitrary code execution), CVE-2024-28219 (HIGH)
- Jinja2 (transitive via Flask 3.1.3): CVE-2025-27516 (CRITICAL, sandbox breakout)
- Werkzeug 2.3.7 -> 3.1.6: CVE-2024-34069 (HIGH, RCE via debugger), CVE-2023-46136 (HIGH, DoS)
- gunicorn 21.2.0 -> 23.0.0: CVE-2024-1135, CVE-2024-6827 (HIGH 7.5, HTTP request smuggling)
- requests 2.31.0 -> 2.32.5: CVE-2024-35195 (MEDIUM, TLS bypass), CVE-2024-47081 (MEDIUM, credential leak)
- RequestsDependencyWarning on worker startup: Removed system `python3-chardet` 7.1.0 from the Docker image that conflicted with requests' version check (fires on every gunicorn/celery worker start)
- Node.js 20 deprecation warnings in CI: Updated `actions/checkout` v3->v4, `actions/setup-python` v4->v5, `codecov/codecov-action` v3->v5
- Flask 2.3.3 -> 3.1.3: Major upgrade required for Werkzeug 3.x compatibility
- Werkzeug safe_join import: Moved from `werkzeug.security` to `werkzeug.utils` (Werkzeug 3.x)
- Pillow transpose API: Updated `Image.FLIP_LEFT_RIGHT` to `Image.Transpose.FLIP_LEFT_RIGHT` (Pillow 12 deprecation)
- Replaced all `Model.query.get(id)` with `db.session.get(Model, id)`: 43 occurrences across 12 files -- `query.get()` is deprecated in SQLAlchemy 2.x / Flask-SQLAlchemy 3.x
- Upgraded Flask ecosystem: Flask-SQLAlchemy 3.1.1, Flask-CORS 5.0.1, Flask-Limiter 3.9.0, Flask-WTF 1.2.2, flask-restx 1.3.2
- Upgraded supporting packages: pillow-heif 0.22.0, SQLAlchemy 2.0.41, psycopg2-binary 2.9.10, celery 5.4.0, redis 5.2.1, reportlab 4.2.5, PyYAML 6.0.2, bcrypt 4.3.0, APScheduler 3.11.0
- Upgraded test dependencies: pytest 8.3.5, pytest-cov 6.1.1, types-requests 2.32.x
- Updated docker-compose.yml image tags to 2.5.69
- Updated CI test matrix from Python 3.9/3.10/3.11 to 3.10/3.11/3.12 (Pillow 12 requires Python >= 3.10; Python 3.9 is EOL)
- Fix stuck scan bug: Scans could get permanently stuck as "active" after a Celery task crash (e.g., `psycopg2.DatabaseError`). The crash recovery handler in `scan_service.py` attempted to mark the scan as crashed but failed because the DB session was in a rolled-back state. Added `db.session.rollback()` before recovery writes and re-queried the scan state with a fresh session.
- Fix stuck scan detection for lost Celery tasks: When Celery task state is `None` (task lost/unreachable), the `is_scan_running()` check previously assumed the scan was still running indefinitely. Now falls through to time-based detection -- if no update for over 1 hour with unknown task state, marks as crashed.
- Fix scheduler stuck scan checker: The `_check_stuck_scans` scheduler job now also verifies Celery task state. If a Celery task is gone AND no progress update for 5+ minutes, the scan is marked as crashed (previously only relied on 30-minute time threshold).
- Fix Phase 3 progress display appearing frozen: The scan-status API endpoint now reads real-time progress from Redis (instead of only PostgreSQL) when a scan is active. Redis is updated by the Celery scan worker on every file, while PostgreSQL lagged behind, causing the UI to show stale values like "97/397" or appear stuck.
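The lost-task fallback above distills to one decision function (an illustrative sketch: the state strings, the 1-hour window, and the function shape are simplified from the entries, not copied from the code):

```python
from datetime import datetime, timedelta, timezone

def is_scan_running(task_state, last_update, now=None,
                    stale_after=timedelta(hours=1)):
    """task_state is the Celery result-state string, or None when the
    task is lost/unreachable."""
    if task_state in ("PENDING", "STARTED", "RETRY"):
        return True
    if task_state is None:
        # Lost task: fall through to time-based detection instead of
        # assuming the scan runs forever.
        now = now or datetime.now(timezone.utc)
        return (now - last_update) < stale_after
    return False  # SUCCESS / FAILURE / REVOKED: not running


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
assert is_scan_running("STARTED", now, now=now) is True
assert is_scan_running(None, now - timedelta(minutes=30), now=now) is True   # recent
assert is_scan_running(None, now - timedelta(hours=2), now=now) is False     # stale -> crashed
```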
- Fix final progress not reflected in DB on scan completion: All scan completion paths now write `files_processed` and `estimated_total` in the same SQL UPDATE that marks the scan as completed. Previously, the completion UPDATE only set `phase` and `is_active`, leaving stale progress values in PostgreSQL.
- Fix UI worker exiting without final sync: The `ui_progress_update_task` now performs a final Redis-to-PostgreSQL sync of progress values before exiting when it detects the scan is complete or inactive.
- Fix ORM staleness in final Redis-to-DB sync: Added `ui_session.refresh(scan_state)` before comparing Redis vs DB progress values in `_final_sync_redis_to_db()`. After `_mark_scan_completed()` writes via raw SQL, the ORM object could have stale values, causing unnecessary or missed updates.
- Fix trailing whitespace in scan_routes.py: Removed trailing whitespace on blank line.
- Move inline import to module level in tasks.py: `get_scan_progress_redis` was imported inline at line 775 despite being available from the existing module-level import.
- Files affected: `pixelprobe/services/scan_service.py`, `pixelprobe/api/scan_routes.py`, `pixelprobe/scheduler.py`, `pixelprobe/tasks.py`, `pixelprobe/progress_utils.py`
- `get_scan_progress_redis()` in `pixelprobe/progress_utils.py`: Read function for fetching real-time scan progress from Redis, used by the scan-status API endpoint.
- `_mark_scan_completed()` in `pixelprobe/services/scan_service.py`: Extracted helper that consolidates all scan completion SQL into a single method, replacing 7 duplicated SQL blocks.
- Unit tests for new functions: Added `tests/unit/test_progress_utils.py` with 12 tests covering `get_scan_progress_redis()`, `_final_sync_redis_to_db()` sync logic, and `_mark_scan_completed()` SQL behavior.
- Move all application modules into `pixelprobe/` package: Moved `models.py`, `auth.py`, `config.py`, `media_checker.py`, `scheduler.py`, `celery_config.py`, `version.py` from root into the `pixelprobe/` package. Root now only contains entry points (`app.py`, `celery_worker.py`) and build/config files. Updated 73+ import statements across 43 files.
- Move root `utils.py` into `pixelprobe/utils/helpers.py`: Consolidated shared utilities (`ProgressTracker`, `create_state_dict`, `batch_process`, etc.) into the package.
- Documentation cleanup: Fixed port 5001 references (should be 5000), updated stale version references (v2.4.48/v2.4.93), fixed SQLite references (PostgreSQL-only since v2.2.0), rewrote PROJECT_STRUCTURE.md, fixed container name references.
- Repository cleanup: Removed legacy files (`database_migrations.py`, `init_db.py`, `operation_handlers.py`, `utils.py`, `migrations/` directory, broken shell scripts, obsolete patches, outdated development docs).
- Updated docker-compose.yml image tags from `pixelprobe:test-v2.4.93` to `ttlequals0/pixelprobe:2.5.67`.
- Updated `.env.example` with correct PostgreSQL defaults and added missing env vars.
- Healthcheck pings blocked by SSRF protection: Since v2.5.64, healthcheck pings for scheduled scans could fail when the healthcheck server resolves to a private IP address. Added `TRUSTED_INTERNAL_HOSTS` environment variable that lets admins allowlist hostnames and/or CIDR ranges that should bypass SSRF private-IP blocking. Accepts comma-separated values (e.g., `myhost.local,192.168.5.0/24`). SSRF protection remains fully active for non-trusted hosts.
- Files affected: `pixelprobe/utils/security.py`, `config.py`, `version.py`, `docker-compose.yml`, `docs/CONFIGURATION.md`, `tests/test_security_fixes.py`
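A minimal sketch of the allowlist semantics described above, matching a host against a comma-separated mix of hostnames and CIDR ranges (the function name and parsing details are assumptions for illustration):

```python
import ipaddress

def host_is_trusted(host_or_ip: str, trusted_internal_hosts: str) -> bool:
    """Return True when host_or_ip matches an allowlist entry: exact
    case-insensitive hostname match, or IP membership in a CIDR range."""
    for entry in (e.strip() for e in trusted_internal_hosts.split(",")):
        if not entry:
            continue
        if "/" in entry:  # CIDR range
            try:
                if ipaddress.ip_address(host_or_ip) in ipaddress.ip_network(entry, strict=False):
                    return True
            except ValueError:
                continue  # the candidate is a hostname, or the entry is malformed
        elif host_or_ip.lower() == entry.lower():
            return True
    return False


trusted = "myhost.local, 192.168.5.0/24"
assert host_is_trusted("myhost.local", trusted) is True
assert host_is_trusted("192.168.5.42", trusted) is True
assert host_is_trusted("10.0.0.1", trusted) is False   # still SSRF-blocked
```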
- Share INTERNAL_API_SECRET via Redis across gunicorn workers: The auto-generated secret was unique per gunicorn worker, causing scheduler internal requests to fail with 401 when routed to a different worker. Now uses Redis `SETNX` to generate the secret once and share it across all workers and the celery container.
- Files affected: `app.py`, `version.py`
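The SETNX pattern can be sketched without a live server; the tiny `FakeRedis` stub below stands in for `redis.Redis` just so the example runs, and the key name is an assumption:

```python
import secrets

class FakeRedis:
    """Stand-in for redis.Redis exposing only setnx/get; the real code
    talks to the shared Redis instance."""
    def __init__(self):
        self._data = {}
    def setnx(self, key, value):
        if key in self._data:
            return False
        self._data[key] = value
        return True
    def get(self, key):
        return self._data.get(key)


def shared_internal_secret(r, key="internal_api_secret"):
    """Every worker generates a candidate, but SETNX guarantees only the
    first write sticks; all workers then read back the same value."""
    r.setnx(key, secrets.token_hex(32))
    return r.get(key)


r = FakeRedis()
# Four "workers" racing at startup all end up holding the same secret:
assert len({shared_internal_secret(r) for _ in range(4)}) == 1
```

SETNX (set-if-not-exists) is atomic on the Redis server, which is what makes the race between workers safe without any extra locking.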
- Fix authentication bypass via X-Internal-Request header: Replace trivially spoofable static string check (`X-Internal-Request: scheduler`) with cryptographic HMAC validation using a shared secret (`X-Internal-Secret`). Secret is auto-generated at startup if not set via `INTERNAL_API_SECRET` environment variable.
- Fix SSRF in healthcheck and notification services: Add `validate_safe_url()` function that resolves hostnames and blocks requests to private/reserved IP ranges (RFC 1918, link-local, loopback, cloud metadata 169.254.x.x). Applied to healthcheck pings, webhook notifications, and ntfy notifications.
- Add redirect-safe HTTP session: New `create_safe_session()` creates a `requests.Session` with a response hook that validates redirect `Location` headers against the same private IP blocklist, preventing SSRF via DNS rebinding or redirect chains.
- Fix ntfy config field name mismatch: `_validate_provider_config()` now accepts both `server_url` (canonical, matching notification_service.py) and `server` (legacy) field names for backward compatibility.
- Files affected: `auth.py`, `config.py`, `app.py`, `scheduler.py`, `pixelprobe/utils/security.py`, `pixelprobe/services/healthcheck_service.py`, `pixelprobe/services/notification_service.py`, `pixelprobe/api/notification_routes.py`, `version.py`
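The HMAC check replacing the static-string header can be sketched with the stdlib `hmac` module (the signed payload and function names are assumptions; only the sign/verify pattern is the point):

```python
import hashlib
import hmac

def sign_internal_request(secret: bytes, payload: bytes) -> str:
    """Value a trusted caller would place in X-Internal-Secret."""
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify_internal_request(secret: bytes, payload: bytes, presented: str) -> bool:
    # compare_digest runs in constant time, avoiding the timing
    # side-channel of ==; a static header value like "scheduler"
    # offers no such protection and is trivially spoofable.
    expected = sign_internal_request(secret, payload)
    return hmac.compare_digest(expected, presented)


secret = b"generated-at-startup"
token = sign_internal_request(secret, b"scheduler")
assert verify_internal_request(secret, b"scheduler", token) is True
assert verify_internal_request(secret, b"scheduler", "scheduler") is False  # old-style spoof fails
```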
- Deduplicate `convert_to_tz()` in models.py: Extract 8 identical inline copies into a single module-level function
- Consolidate `rate_limit()`/`exempt_from_rate_limit()`: Update canonical `pixelprobe/utils/rate_limiting.py` to match inline implementation, replace 5 inline copies in route files with imports
- Deduplicate Bearer token extraction in auth.py: Extract `_extract_bearer_token()` helper, replace 4 inline token-parsing blocks, remove dead `token = token` no-ops
- Unify file extension lists: Remove divergent extension sets from `pixelprobe/utils/helpers.py`, import canonical lists from `pixelprobe/constants.py`
- Rename ambiguous `validate_file_path`: Rename `validators.py` version to `validate_file_path_format()` to distinguish from security.py's path traversal prevention
- Deduplicate `check_celery_available()`: Move to new `pixelprobe/utils/celery_utils.py`, replace copies in `scan_routes.py` and `scan_routes_parallel.py`
- Deduplicate `ContextTask` in celery_config.py: Extract `_make_context_task()` factory, replace 3 identical inner class definitions
- Replace deprecated `datetime.utcnow()`: Use `datetime.now(timezone.utc)` in `models.py` (4 locations) and `security.py` (2 locations)
- Extract `get_configured_scan_paths()`: Create shared helper in `pixelprobe/utils/helpers.py`, replace 6+ inline DB-fallback-to-env patterns
- Use `TERMINAL_SCAN_PHASES` constant: Replace 5 hardcoded `['idle', 'completed', 'error', 'crashed', 'cancelled']` lists with import from `pixelprobe/constants.py`
- Consolidate scheduler scan methods: Extract `_filter_excluded_paths()` and `_execute_scan_request()` helpers in `scheduler.py`
- Remove dead `truncate_scan_output()`: Remove no-op function from `media_checker.py`, replace 7 call sites with direct variable reference
- Merge `load_exclusions_with_patterns()`: Read `exclusions.json` once instead of twice
- Remove unnecessary `getattr()` calls: Use direct attribute access in `ScanReport.to_dict()`
- Break up `app.py` (1137 -> 541 lines):
  - Extract migration functions to `pixelprobe/migrations/startup.py`
  - Extract scheduler lock management to `pixelprobe/scheduler_lock.py`
  - Extract startup cleanup routines to `pixelprobe/startup.py`
  - Simplify v2.2.68 column migration with loop
- Files affected: `models.py`, `auth.py`, `app.py`, `scheduler.py`, `celery_config.py`, `media_checker.py`, `version.py`, `pixelprobe/utils/rate_limiting.py`, `pixelprobe/utils/helpers.py`, `pixelprobe/utils/validators.py`, `pixelprobe/utils/celery_utils.py` (new), `pixelprobe/utils/__init__.py`, `pixelprobe/constants.py`, `pixelprobe/migrations/startup.py` (new), `pixelprobe/scheduler_lock.py` (new), `pixelprobe/startup.py` (new), `pixelprobe/api/admin_routes.py`, `pixelprobe/api/scan_routes.py`, `pixelprobe/api/scan_routes_parallel.py`, `pixelprobe/api/maintenance_routes.py`, `pixelprobe/api/stats_routes.py`, `pixelprobe/services/stats_service.py`
- Deduplicate
convert_to_tz()in models.py: Extracted 8 identical inline copies into a single module-level function - Consolidate
- `rate_limit()`/`exempt_from_rate_limit()`: Updated canonical `pixelprobe/utils/rate_limiting.py` to match inline implementations and replaced all 4 inline copies in route files
- Deduplicate Bearer token extraction in auth.py: Extracted `_extract_bearer_token()` helper, replaced 4 inline copies, removed `token = token` no-ops
- Unify file extension lists: Removed divergent inline sets from `helpers.py`, now imports from `pixelprobe/constants.py`
- Break up `app.py`: Extracted 600+ lines into focused modules:
  - `pixelprobe/migrations/startup.py` -- all DB migration functions
  - `pixelprobe/scheduler_lock.py` -- Redis distributed lock management
  - `pixelprobe/startup.py` -- startup cleanup routines
  - Reduced `app.py` from 1137 to 541 lines
- Deduplicate `check_celery_available()`: Moved to `pixelprobe/utils/celery_utils.py`, replaced copies in `scan_routes.py` and `scan_routes_parallel.py`
- Deduplicate `ContextTask` in `celery_config.py`: Extracted `_make_context_task()` factory, replaced 3 identical class definitions
- Replace deprecated `datetime.utcnow()`: All occurrences in `models.py` and `security.py` now use `datetime.now(timezone.utc)`
- Extract `get_configured_scan_paths()`: Created shared helper in `pixelprobe/utils/helpers.py`, replaced 6+ inline "read from DB, fallback to env var" patterns
- Use `TERMINAL_SCAN_PHASES` constant: Replaced 5 hardcoded `['idle', 'completed', 'error', 'crashed', 'cancelled']` lists with the constant from `pixelprobe/constants.py`
- Consolidate scheduler scan methods: Extracted `_filter_excluded_paths()` and `_execute_scan_request()` helpers, reducing duplication between `_run_periodic_scan` and `_run_scheduled_scan`
- Rename ambiguous `validate_file_path`: Renamed the validators.py version to `validate_file_path_format()` to avoid confusion with the security.py version
- Remove dead `truncate_scan_output()`: Was a no-op (returned input unchanged); replaced 7 call sites with direct variable usage
- Merge `load_exclusions_with_patterns()`: Now reads `exclusions.json` once instead of twice
- Remove unnecessary `getattr()` calls: `ScanReport.to_dict()` now accesses `num_workers`, `files_added`, `files_updated` directly
- Update stale `deep_scan` comment: Replaced outdated version-specific comment with a generic backward-compat note
- Files affected: `models.py`, `auth.py`, `app.py`, `scheduler.py`, `celery_config.py`, `media_checker.py`, `pixelprobe/utils/rate_limiting.py`, `pixelprobe/utils/helpers.py`, `pixelprobe/utils/validators.py`, `pixelprobe/utils/celery_utils.py` (new), `pixelprobe/utils/security.py`, `pixelprobe/utils/__init__.py`, `pixelprobe/api/admin_routes.py`, `pixelprobe/api/scan_routes.py`, `pixelprobe/api/scan_routes_parallel.py`, `pixelprobe/api/maintenance_routes.py`, `pixelprobe/api/stats_routes.py`, `pixelprobe/services/stats_service.py`, `pixelprobe/migrations/startup.py` (new), `pixelprobe/scheduler_lock.py` (new), `pixelprobe/startup.py` (new)
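The Bearer-token deduplication above centralizes a small amount of header-parsing logic. A minimal sketch of such a helper (the name and exact behavior here are illustrative; the project's actual helper is `_extract_bearer_token()`):

```python
def extract_bearer_token(auth_header):
    """Return the token from an 'Authorization: Bearer <token>' header,
    or None if the header is missing or malformed."""
    if not auth_header:
        return None
    scheme, _, token = auth_header.partition(" ")
    if scheme != "Bearer" or not token.strip():
        return None
    return token.strip()
```

Centralizing this avoids the subtle drift that accumulates when four route files each parse the header inline.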
- Fix PostgreSQL CREATE INDEX race condition on container startup: Replace `fcntl.flock` file lock with a PostgreSQL advisory lock (`pg_advisory_lock`) for migration coordination
  - File locks only work within a single container's `/tmp` filesystem -- the app and celery-worker containers each have their own `/tmp`, so they raced against each other
  - Advisory locks work across ALL connections to the same PostgreSQL database, preventing the `duplicate key value violates unique constraint "pg_class_relname_nsp_index"` errors seen on every container restart (~8 errors per restart event)
  - The winner process acquires the lock and runs migrations; other processes block until it completes, then skip
  - Falls back to uncoordinated execution if the advisory lock fails (each DDL statement already has its own idempotency handling via `IF NOT EXISTS`/try/except)
  - Extracted `_run_all_migrations()` helper for cleaner code structure
  - Files affected: `app.py`
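The advisory-lock coordination described above can be sketched as a context manager. This is a hypothetical sketch, not the project's code: `MIGRATION_LOCK_KEY` is an arbitrary illustrative constant, and `execute` stands in for any callable that runs SQL on one database connection.

```python
from contextlib import contextmanager

MIGRATION_LOCK_KEY = 723411  # arbitrary app-wide constant (illustrative)

@contextmanager
def advisory_lock(execute, key=MIGRATION_LOCK_KEY):
    """Hold a session-level PostgreSQL advisory lock for the block's duration."""
    # Blocks until the lock is free; visible to ALL connections to the same
    # database, unlike fcntl.flock on a per-container /tmp filesystem.
    execute(f"SELECT pg_advisory_lock({key})")
    try:
        yield
    finally:
        execute(f"SELECT pg_advisory_unlock({key})")
```

The winner runs migrations inside the `with` block; losers block on `pg_advisory_lock` until it finishes, then find the DDL already applied (each statement being idempotent via `IF NOT EXISTS`).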
- Fix file changes scan SQLAlchemy detached instance error: Long-running integrity scans (20+ hours for 1M+ files) would crash at ~99.95% with "Instance has been deleted" errors
  - Root cause: ORM objects held in memory for the entire scan duration get expired by `db.session.commit()` calls
  - If concurrent jobs (e.g., "Check for stuck scans" every 5 minutes) delete database rows, subsequent attribute access on expired objects crashes
  - Solution: Load only the needed columns and convert them to plain Python dictionaries immediately after the query
  - Dictionaries are immune to SQLAlchemy session expiration and concurrent database changes
  - Files affected: `pixelprobe/services/maintenance_service.py`
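The columns-to-dicts pattern above can be sketched as follows. The `Row` namedtuple stands in for a SQLAlchemy column-tuple result (the real query would be something like `session.query(Model.id, Model.file_path, Model.file_hash).all()`); the field names are illustrative.

```python
from collections import namedtuple

# Stand-in for a SQLAlchemy column-tuple row (illustrative fields).
Row = namedtuple("Row", ["id", "file_path", "file_hash"])

def snapshot_rows(rows):
    """Copy the needed column values into plain dicts right after the query,
    so a 20-hour scan loop never dereferences an expired ORM instance."""
    return [{"id": r.id, "file_path": r.file_path, "file_hash": r.file_hash}
            for r in rows]
```

Because the dicts hold plain values, a concurrent job deleting the underlying row cannot trigger "Instance has been deleted" on later access.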
- Repository slop cleanup: Remove ~500+ lines of dead code and outdated documentation
  - Remove unused tool scripts: `fix_database_schema_v2.py`, `fix_hevc_warnings.py`, `fix_nal_unit_false_positives.py` (377 lines)
  - Remove unused utility functions: `handle_db_errors()` decorator and `update_state_progress()` helper from `utils.py`
  - Remove deprecated API endpoints: `/reset-stuck-scans` and `/recover-stuck-scan` (use `/scan/recovery` instead)
  - Remove orphaned Docker files: `docker-compose.simple.yml`, `docs/docker/Dockerfile.modern*`, `docs/docker/docker-compose.*.yml`
  - Remove all console.log/error/debug statements from JavaScript files (~78 statements)
  - Update documentation: Fix SQLite references to PostgreSQL (now required), remove dead documentation link
  - Delete `audit_reports/` directory - historical data preserved in CHANGELOG.md
  - Files affected: `utils.py`, `pixelprobe/api/scan_routes.py`, `static/js/app.js`, `static/js/state.js`, `static/js/auth.js`, various docs
- Fix UI not showing scheduled scan progress: Add background scan detection
- Root cause: UI only checked for running scans on page load
- If user opens page after a scheduled scan starts, no progress was shown
- Solution: Added 30-second background interval to detect running scans
- Automatically starts progress monitoring when a scan is detected
- Affects all scan types: regular scans, cleanup, and integrity scans
- Files affected: `static/js/app.js`
- Fix duplicate scheduler pings (4x pings for scheduled tasks): Scheduler lock was being acquired by all 4 Gunicorn workers
  - Root cause: The v2.5.11 fix assumed "same hostname = container restart" and force-acquired locks
  - Problem: With 4 Gunicorn workers in one container, ALL workers share the same hostname
    - Worker 1 acquires the lock: `pixelprobe-app:pid1:timestamp`
    - Workers 2-4 see "same hostname" and FORCE ACQUIRE (treating it as a stale self-lock)
    - Result: All 4 workers run schedulers, sending 4x pings for scheduled tasks
  - Solution: Check BOTH hostname AND PID when deciding whether to force-acquire
    - Same hostname + same PID = self-lock (refresh/re-acquire - OK)
    - Same hostname + different PID = sibling worker (only acquire if stale >65s)
    - Different hostname = remote container (only acquire if stale >65s)
  - Added helper functions for lock parsing and acquisition logic:
    - `parse_scheduler_lock()`: Parse lock value into (hostname, pid, timestamp)
    - `should_force_acquire_lock()`: Decision logic with clear reason codes
    - `start_scheduler_heartbeat()`: Consolidated heartbeat thread creation (DRY)
  - Added unit tests for all lock scenarios (7 test cases)
  - Files affected: `app.py`, `tests/test_scheduler.py`
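The hostname+PID decision table above can be sketched in a few lines. This is an illustrative reconstruction of the described logic, not the project's actual implementation; the reason-code strings are assumptions.

```python
import time

def parse_scheduler_lock(value):
    """Parse 'hostname:pid:timestamp' (new) or legacy 'pid:timestamp'."""
    parts = value.split(":")
    if len(parts) == 3:
        return parts[0], int(parts[1]), float(parts[2])
    return None, int(parts[0]), float(parts[1])

def should_force_acquire_lock(lock_value, my_hostname, my_pid,
                              now=None, stale_after=65):
    """Return (force, reason). Only a self-lock (same host AND pid) or a
    stale lock may be force-acquired; a live sibling's lock may not."""
    host, pid, ts = parse_scheduler_lock(lock_value)
    now = time.time() if now is None else now
    if host == my_hostname and pid == my_pid:
        return True, "self-lock"   # our own lock: refresh/re-acquire
    if now - ts > stale_after:
        return True, "stale"       # holder's heartbeat stopped
    return False, "held"           # live lock held by sibling or remote
```

Checking the PID is what distinguishes "my own stale lock after a restart" from "a sibling Gunicorn worker that acquired the lock two seconds ago".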
- Fix NameError crash on startup (regression from v2.5.56): App failed to start after the scheduler fix
  - Root cause: v2.5.56 removed the `is_celery_worker` definition but left a reference at line 1091
  - Error: `NameError: name 'is_celery_worker' is not defined`
  - Solution: Remove `is_celery_worker` from the file-lock fallback condition
  - The file-lock fallback (for local dev without Redis) now works for any process
  - Files affected: `app.py`
- Remove obsolete migration scripts: Migrations now run automatically on startup
  - Deleted `tools/migrate_db.py` - v2.2.46/47 migrations now handled by app_startup_migration.py
  - Deleted `tools/migrate_db_safe.py` - lock handling integrated into automatic migrations
  - Deleted `tools/run_migration.py` - legacy wrapper no longer needed
  - Updated documentation to reflect the automatic migration system
  - Files affected: `tools/README.md`, `tools/MIGRATION_GUIDE.md`, `docs/maintenance/TOOLS_AND_SCRIPTS.md`
- Fix scheduler not executing jobs: Revert to v2.5.19 scheduler behavior
  - Root cause: v2.5.33 added an `is_celery_worker` check that forced the scheduler to run only in celery-worker
  - Problem: Celery's prefork model is incompatible with APScheduler's BackgroundScheduler threads
    - The Celery main process loads the app and starts the scheduler thread
    - Celery then forks worker processes
    - Threads don't survive fork - the scheduler thread only exists in the parent process
    - The parent process doesn't handle requests, and its Flask context is broken
    - Jobs are scheduled but never execute
  - Solution: Remove the `is_celery_worker` check, allow any process to acquire the scheduler lock
    - Gunicorn workers work correctly with APScheduler (each imports the app independently)
    - Redis SETNX ensures only one scheduler runs at a time
  - Files affected: `app.py`
- Fix integrity scan reports missing changed files: Changed files list was never populated
  - Root cause: The local `changed_files` variable was populated but never assigned to `self.changed_files_list`
  - The instance variable was initialized empty and never updated
  - Report generation uses `getattr(self, 'changed_files_list', [])`, which always returned an empty list
  - Solution: Add `self.changed_files_list = changed_files` after Phase 2a processing
  - Files affected: `pixelprobe/services/maintenance_service.py`
- Rename "File Changes" to "Integrity Scan": Consistent terminology throughout the UI
- The feature was renamed several versions back but some user-facing text still used the old "File Changes" terminology
- Updated scan reports filter dropdown from "File Changes" to "Integrity Scan"
- Updated schedule type dropdowns from "Integrity Check" to "Integrity Scan"
- Updated all notification messages:
- "File Changes Check Progress" -> "Integrity Scan Progress"
- "File changes check started..." -> "Integrity scan started..."
- "File changes check completed!" -> "Integrity scan completed!"
- "File changes check cancelled" -> "Integrity scan cancelled"
- "File changes check cancellation requested" -> "Integrity scan cancellation requested"
- "Checking for file changes..." -> "Checking file integrity..."
- "No file changes detected" -> "No integrity issues detected"
- Updated scan type display mapping from "Integrity Check" to "Integrity Scan"
- Files affected:
  - `templates/index.html` - Dropdown option text (3 locations)
  - `static/js/app.js` - Notification messages and type mapping (8 locations)
- Fix persistent Redis connection crashes in integrity scan: Enhanced Celery task result handling
  - Root cause: The v2.5.53 wrappers had insufficient retry logic (3 retries, 0.5s delay)
    - When the Redis connection pool gets corrupted, ALL connections are bad
    - Hitting the same broken pool 3 times doesn't help
  - Solution: Enhanced safe wrappers with exponential backoff and connection pool reset
    - `safe_task_ready()`: Now 5 retries with exponential backoff (1s, 2s, 3s, 4s)
    - `safe_task_get()`: Now 5 retries with exponential backoff
    - Both now reset the connection pool on first failure to get fresh connections
    - Added `reset_redis_pool()` function to force-disconnect and recreate the pool
  - Additional fixes:
    - Added `safe_check_task_state()` wrapper for scan_routes.py AsyncResult access
    - Protected all AsyncResult.state access patterns from Redis crashes
    - Added Celery transport options for connection stability:
      - `socket_keepalive: True` to detect dead connections
      - `health_check_interval: 60` for periodic connection validation
      - `socket_timeout: 30` and `socket_connect_timeout: 30`
      - Applied to both broker and result backend transport options
  - Regression origin: v2.4.60 introduced `redis.from_url()`, which returns a low-level Connection object instead of a full Redis client. v2.5.51 fixed app-level Redis, v2.5.53 added the Celery wrappers, but the wrappers needed enhanced retry logic with pool reset.
  - Files affected:
    - `pixelprobe/progress_utils.py` - Added reset_redis_pool()
    - `pixelprobe/services/maintenance_service.py` - Enhanced safe_task_ready/safe_task_get
    - `pixelprobe/api/scan_routes.py` - Added safe_check_task_state wrapper
    - `celery_config.py` - Added transport options for resilience
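The backoff-plus-pool-reset behavior described above can be sketched as follows. This is an illustrative reconstruction, not the project's code; the builtin `ConnectionError`/`ConnectionResetError`/`AttributeError` stand in for the redis-py exception types the real wrapper catches, and `sleep` is injectable purely so the sketch is testable.

```python
import time

def safe_task_ready(task, retries=5, reset_pool=None, sleep=time.sleep):
    """Check task.ready(), surviving a corrupted Redis connection pool.
    On the first failure the pool is reset so later retries get fresh
    connections; backoff between attempts grows 1s, 2s, 3s, 4s."""
    for attempt in range(retries):
        try:
            return task.ready()
        except (ConnectionError, ConnectionResetError, AttributeError):
            if attempt == 0 and reset_pool is not None:
                reset_pool()              # drop every (bad) pooled connection
            if attempt < retries - 1:
                sleep(attempt + 1)        # growing backoff between attempts
    return False                          # give up: report not-ready, don't crash
```

Resetting the pool on the *first* failure is the key difference from the v2.5.53 version: retrying the same corrupted pool repeatedly could never succeed.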
- Fix Celery task result Redis connection errors: Integrity scan crashed when checking task results
  - Root cause: Celery's `task.ready()` and `task.get()` use their own internal Redis connection, separate from our fixed Redis client
  - When Celery's Redis connection gets reset, task result checking fails with `ConnectionResetError`
  - Error message: `'Connection' object has no attribute 'register_connect_callback'`
  - Solution: Added wrapper functions with retry logic for Celery task result operations
    - `safe_task_ready()`: Safely check task status with 3 retries and 0.5s delay
    - `safe_task_get()`: Safely get task result with 3 retries and 0.5s delay
    - Both functions catch `redis.ConnectionError`, `redis.TimeoutError`, `ConnectionResetError`, and `AttributeError`
    - On transient failure, retries before giving up
  - Files affected: `pixelprobe/services/maintenance_service.py`
- Fix HEIC false positive corruption detection for iOS 18 files: HEIC files from iOS 18 devices were incorrectly flagged as corrupted
  - Root cause: Older libheif versions (Ubuntu 24.04 ships 1.17.x) don't support iOS 18's use of shared auxiliary images
  - Error message: `"Too many auxiliary image references (2.0)"` from libheif/ImageMagick
  - This is a valid HEIF structure per spec, not file corruption (see: github.com/strukturag/libheif/issues/1190)
  - Solution: Detect libheif limitation errors and treat them as warnings instead of corruption
    - Added detection for "auxiliary image" and "too many auxiliary" errors in ImageMagick stderr
    - Added detection for "cannot identify image file" errors on HEIC files in PIL
    - Files with this error now show the warning "HEIC validation skipped: libheif version limitation"
    - Files are no longer incorrectly marked as corrupted
  - Note: Full iOS 18 HEIC support requires libheif 1.19.8+ (not yet available in Ubuntu repos)
  - Files affected: `media_checker.py`
- Fix Redis connection crash in maintenance service: Application crashed with `ConnectionResetError`
  - Root cause: `redis.from_url()` returns a low-level Connection object, not a full Redis client
  - The error occurred when the Redis connection was reset during a file changes check
  - Error message: `'Connection' object has no attribute 'register_connect_callback'`
  - Solution: Created robust Redis connection handling with retry logic
    - Added connection pooling with health checks (`health_check_interval=30`)
    - Added automatic retry on connection failures (3 attempts with 1s delay)
    - Added `socket_keepalive=True` to detect stale connections
    - Added `retry_on_timeout=True` for transient failures
  - New utility functions in `progress_utils.py`:
    - `get_redis_client()`: Returns a properly configured Redis client with a connection pool
    - `get_redis_info()`: Safely gets Redis server info with retry logic
    - `with_redis_retry()`: Decorator for adding retry logic to Redis operations
  - Updated `maintenance_service.py` to use `get_redis_info()` instead of `redis.from_url()`
  - Updated the `app.py` version endpoint to use `get_redis_info()` instead of `redis.from_url()`
  - Files affected: `pixelprobe/progress_utils.py`, `pixelprobe/services/maintenance_service.py`, `app.py`
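A retry decorator like the `with_redis_retry()` mentioned above might look like this. This is a hedged sketch under stated assumptions: the builtin connection/timeout exceptions stand in for redis-py's `redis.ConnectionError`/`redis.TimeoutError`, and the injectable `sleep` exists only to make the sketch testable.

```python
import functools
import time

def with_redis_retry(attempts=3, delay=1.0, sleep=time.sleep):
    """Decorator adding retry-on-connection-failure to a Redis operation
    (3 attempts with a 1s delay, matching the entry above)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            last_exc = None
            for i in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except (ConnectionError, ConnectionResetError, TimeoutError) as exc:
                    last_exc = exc
                    if i < attempts - 1:
                        sleep(delay)      # wait before retrying the operation
            raise last_exc                # exhausted retries: surface the error
        return wrapper
    return decorator
```

Wrapping only the *operations* (rather than caching a connection) means each retry goes back through the pool, which is what lets `health_check_interval` and `socket_keepalive` weed out dead connections.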
- Fix scheduler not starting after container restart: Stale Redis lock prevented the scheduler from starting
  - When the container is killed with SIGKILL, the Redis scheduler lock is not released
  - If the container restarts within the 60s TTL, it sees its own stale lock and skips starting the scheduler
  - Added hostname to the lock value format (`hostname:pid:timestamp`) to detect self-locks
  - Container restart now force-acquires its own stale lock immediately
  - Maintains backward compatibility with the old lock format (`pid:timestamp`)
  - Files affected: `app.py`
- Fix CSP blocking Chart.js source maps: Added cdn.jsdelivr.net to connect-src directive
- Browser was blocking requests to fetch Chart.js source maps from CDN
- CSP connect-src only allowed 'self', now includes https://cdn.jsdelivr.net
- Files affected: `app.py`
- Fix Python 3.9 compatibility: Type annotation `str | None` is not supported in Python 3.9
  - GitHub Actions runs tests on Python 3.9, which doesn't support PEP 604 union types
  - Changed `-> str | None` to `-> Optional[str]` in notification_routes.py
  - Added `from typing import Optional` import
  - Files affected: `pixelprobe/api/notification_routes.py`
- Widen scan_id columns to VARCHAR(64): Old queued tasks with 47-char scan_ids were stuck in retry
- v2.5.46 fixed new scans but old tasks in Celery queue still had long scan_ids
- Widened scan_id columns in scan_state, scan_chunks, cleanup_state from 36 to 64 chars
- Added automatic database migration on startup to widen existing columns
- This allows old stuck tasks to complete and unblock the scheduler
- Files affected: `models.py`, `tools/app_startup_migration.py`
- Fix scheduled scan failing with DataError: scan_id exceeded the VARCHAR(36) column limit
  - Scheduler adds a timestamp: `scheduled_{id}_{YYYYMMDD_HHMMSS}` (28 chars)
  - scan_routes.py was adding ANOTHER timestamp, making it 47 chars
  - Database column `scan_state.scan_id` is VARCHAR(36)
  - Solution: Removed the duplicate timestamp from scan_routes.py
  - The scheduler's timestamp is already unique per scheduled run
  - Files affected: `pixelprobe/api/scan_routes.py`
- Fix v2.5.44 syntax error: Fixed a `nonlocal` statement that caused an app crash
  - `nonlocal` only works for enclosing function scopes, not block-level scopes
  - Changed `scheduler_initialized` to use the mutable list `[False]` pattern
  - Files affected: `app.py`
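The mutable-list pattern above sidesteps `nonlocal` entirely: mutating `flag[0]` is not a rebinding, so no scope declaration is needed. A minimal illustration (names are illustrative, not the app's code):

```python
def make_initializer():
    # `nonlocal scheduler_initialized` would be a SyntaxError if the flag
    # lived at module level: nonlocal only binds names defined in an
    # enclosing *function*. A one-element list needs no declaration at all.
    scheduler_initialized = [False]

    def init_once():
        if scheduler_initialized[0]:
            return False              # already initialized; do nothing
        scheduler_initialized[0] = True  # mutation, not rebinding
        return True

    return init_once
```

The same effect could also be had with a module-level `global`, but the list pattern keeps the flag scoped to the closure.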
- Fix scheduler not starting after container restart: Scheduler lock has no retry mechanism
- When container restarts quickly (~23 seconds), the old Redis lock hasn't expired yet
- Lock check only happened ONCE at startup - no retry if acquisition failed
- Old lock would expire at 60s, but new celery-worker never rechecked
- Result: Scheduler never started, scheduled scans never ran
- Solution: Added background retry thread that checks for lock every 30 seconds
- Retries up to 10 times (~5 minutes) to acquire expired or stale locks
- Once acquired, starts normal heartbeat thread to maintain the lock
- Files affected: `app.py`
- Fix startup race condition in `sync_scan_paths_to_db()`: v2.5.41 crashed on startup
  - Multiple gunicorn workers starting simultaneously caused duplicate key violations
  - Changed from check-then-insert to PostgreSQL `INSERT ... ON CONFLICT DO NOTHING`
  - This is an atomic operation that safely handles concurrent inserts
  - Files affected: `app.py`
- Deploy scheduler API hostname fix: Scheduled scans now work from celery-worker
  - The v2.5.42 fix for using `pixelprobe:5000` instead of `localhost:5000` is now deployed
  - Celery-worker can now reach the main app to trigger scans
- Fix scheduler API calls from celery-worker: Scheduler couldn't reach the main app API
  - In Docker, celery-worker's `localhost:5000` doesn't reach the main pixelprobe container
  - Added a `_get_api_base_url()` method that detects the Docker environment
  - Uses `http://pixelprobe:5000` (the Docker service name) instead of `localhost`
  - Supports an `API_BASE_URL` env var override for custom configurations
  - Files affected: `scheduler.py`
- Sync SCAN_PATHS to database: Fixed the scheduler not being able to read scan paths in celery-worker
  - The main app now syncs SCAN_PATHS from the environment to the `scan_configurations` database table on startup
  - The scheduler reads paths from the database instead of the environment variable
  - This allows celery-worker to access paths via the shared database without needing the env var
  - Falls back to the env var if the database is empty (backwards compatibility)
  - Files affected: `app.py`, `scheduler.py`
- Missing SCAN_PATHS in celery-worker: Fixed docker-compose.yml missing scan-related environment variables for celery-worker container
- When scheduler was moved to celery-worker in v2.5.28, the SCAN_PATHS env var was never added
- Added SCAN_PATHS, EXCLUDED_PATHS, EXCLUDED_EXTENSIONS, PERIODIC_SCAN_SCHEDULE, CLEANUP_SCHEDULE, and TZ to celery-worker
- IMPORTANT: Users must also add SCAN_PATHS to their production celery-worker container (Portainer/Docker Compose)
- Files affected: `docker-compose.yml`
- Stale Scheduler Lock Detection: Added automatic takeover of stale scheduler locks
- When celery-worker starts and lock acquisition fails, checks if lock timestamp is >65s old
- If lock is stale (heartbeat stopped), forces acquisition and initializes scheduler
- Prevents scheduler from being stuck when old worker dies without releasing lock
- Logs lock age for debugging: "lock held by: X, age=Ys"
- Files affected: `app.py`
- Restore v2.5.19 Scheduled Scan Logic: Fixed all regressions introduced in v2.5.32
  - Added empty SCAN_PATHS validation with proper error logging
  - Added empty filtered_paths validation after exclusion filtering
  - Restored the timestamp in scan_id (`scheduled_{id}_{timestamp}`) to prevent false "already completed" detection
  - Updated schedule_id parsing for healthcheck to handle the timestamp format
  - Root cause: v2.5.32 accidentally reverted fixes from v2.5.19 while trying to fix SQLAlchemy session issues
  - Files affected: `scheduler.py`
- SCAN_PATHS Fallback Debug Logging: Added logging to diagnose why scheduled scans with empty paths aren't falling back to the SCAN_PATHS env var
  - Logs the raw `scan_paths` value from the database
  - Logs the SCAN_PATHS env var value when the fallback is triggered
  - Files affected: `scheduler.py`
- Schedule Debug Logging: Added detailed logging to show all schedules in database during scheduler startup
- Logs total count of schedules in database
- Lists each schedule with name, id, is_active status, and cron expression
- Helps diagnose why specific schedules may not be loading
- Files affected: `scheduler.py`
- Schedule Updates Not Reaching Celery Worker: Fixed schedule changes in the UI not being applied
  - Root cause: v2.5.33/34 moved the scheduler to the celery worker, but `scheduler.update_schedules()` was called from gunicorn (Flask), where the scheduler doesn't run
  - The scheduler silently skipped updates since `scheduler.running = False` in gunicorn
  - Fix: Created a `reload_schedules_task` Celery task to trigger schedule reload in the worker
  - Now when schedules are created/updated/deleted via the UI, a Celery task notifies the worker to reload
  - Files affected: `pixelprobe/tasks.py`, `pixelprobe/api/admin_routes.py`
- HOTFIX: v2.5.33 Missing Import: Fixed app crash on startup due to a missing `import sys`
  - v2.5.33 added code using `sys.argv` but forgot to import the `sys` module
  - Files affected: `app.py`
- Scheduler Not Running After Container Restart: Fixed scheduled scans not executing after container restart
  - Root cause: v2.5.28 removed the unconditional Redis lock deletion, which unintentionally allowed gunicorn workers to acquire the scheduler lock before the Celery worker
  - Gunicorn's pre-fork model is incompatible with APScheduler's background threads
  - Fix: Only allow the Celery worker to attempt scheduler lock acquisition
  - Added a check for `celery` in `sys.argv[0]` to detect the Celery worker context
  - Non-Celery processes (gunicorn workers) now skip scheduler initialization entirely
  - This restores all scheduled tasks: periodic scans, cleanup scans, the stuck scan checker, and user-defined schedules
  - Files affected: `app.py`
- Scheduled Scans Running with Empty Paths: Fixed scheduled scans not scanning any files after container restart
  - Root cause: SQLAlchemy session expiry after `db.session.commit()` in the scheduler
  - After commits (healthcheck ping update, last_run update), the schedule object expires
  - When `schedule.scan_paths` was accessed after a commit, lazy-loading failed in the celery worker context
  - Fix: Cache `scan_paths`, `scan_type`, and `schedule.name` immediately after loading, before any commits
  - This issue manifested when the scheduler ran in the celery worker instead of the Flask app
  - Files affected: `scheduler.py`
- Pending Files Fix Incomplete in v2.5.30: The retry mechanism was only added to 2 of 6 scan completion paths
  - Root cause: v2.5.30 added `_retry_pending_files()` to the chunk-based scan paths but missed the direct file scan paths
  - Fixed scan paths now include:
    - `_sequential_scan` - used for small file lists in sequential mode
    - `_parallel_scan` - used for small file lists in parallel mode
    - `_sequential_scan_selected_chunks` - used for selected file scans in sequential mode
    - `_parallel_scan_selected_chunks` - used for selected file scans in parallel mode
  - Added enhanced logging with scan method identifiers (e.g., "PARALLEL DIRECT", "SEQUENTIAL SELECTED CHUNKS")
  - A warning is logged if any files remain pending after retries
  - Files affected: `pixelprobe/services/scan_service.py`
- Pending Files Not Processed During Scan: Fixed scans completing with files left in 'pending' status
  - Root cause: The scan iterated through files and marked itself "complete" based on iteration count, not actual database saves
  - Added a retry mechanism at scan completion to re-scan any files still in 'pending' status (up to 2 retries)
  - Ensures all files are processed before marking the scan as complete
  - Added a verification warning in the logs if any files remain pending after retries
  - Affects both sequential and parallel scan completion paths
  - Files affected: `pixelprobe/services/scan_service.py`
- Silent Database Save Failures: Added tracking for failed database saves during scanning
  - Added `failed_saves` and `successful_saves` counters to the PixelProbe class
  - Logs the cumulative failed save count for debugging
  - Added `get_save_stats()` and `reset_save_stats()` methods for monitoring
  - Files affected: `media_checker.py`
- Pending file count now included in scan report statistics query
- Warning logged when files remain in 'pending' status after scan completion
- Scan Chunk Pagination Race Condition: Fixed files being skipped during scan due to a pagination bug
  - Chunks were using OFFSET/LIMIT on the pending-files query, but as files were scanned their status changed
  - This caused the result set to shrink, making subsequent chunk offsets skip files
  - New approach: Capture file path boundaries at chunk creation time (JSON format: FCP)
  - Query by a stable path range (`file_path >= first AND file_path <= last`) instead of offset/limit
  - Maintains backwards compatibility with the legacy FILE_CHUNK format
  - Files affected: `pixelprobe/services/scan_service.py`
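The boundary-capture approach above can be demonstrated with an in-memory SQLite table. This is an illustrative sketch (table name, column names, and chunk size are assumptions, not the app's schema): boundaries are captured once up front, so updating row status mid-scan cannot shift them the way it shifts an OFFSET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE media (file_path TEXT PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO media VALUES (?, 'pending')",
                 [(f"/m/{i:04d}.jpg",) for i in range(10)])

# Capture stable path boundaries per chunk ONCE, at chunk-creation time.
paths = [r[0] for r in conn.execute(
    "SELECT file_path FROM media WHERE status='pending' ORDER BY file_path")]
chunk_size = 4
chunks = [(c[0], c[-1]) for c in
          (paths[i:i + chunk_size] for i in range(0, len(paths), chunk_size))]

scanned = []
for first, last in chunks:
    rows = conn.execute(
        "SELECT file_path FROM media "
        "WHERE file_path >= ? AND file_path <= ? ORDER BY file_path",
        (first, last)).fetchall()
    for (fp,) in rows:
        scanned.append(fp)
        # Status changes no longer affect which rows later chunks see.
        conn.execute("UPDATE media SET status='scanned' WHERE file_path=?", (fp,))
```

With OFFSET/LIMIT over `WHERE status='pending'`, each completed chunk shrinks the result set, so chunk 2's offset would silently jump past unscanned files; the path ranges are immune to that.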
- Updated `docs/screenshots/features/scan-schedules.png` with working Font Awesome icons
- Font Awesome Icons Missing: Fixed CSP blocking Font Awesome CSS from cdnjs.cloudflare.com
  - Changed the Font Awesome CDN from `cdnjs.cloudflare.com` to `cdn.jsdelivr.net` (already in the CSP whitelist)
  - Affects both `templates/index.html` and `templates/login.html`
  - Icons now display correctly on all pages (mobile and desktop)
- Scheduler Race Condition: Fixed duplicate scheduler initialization during container startup
  - Removed the unconditional Redis lock deletion in `app.py` that caused a race condition
  - When both the pixelprobe and celery-worker containers started simultaneously, they were deleting each other's locks
  - The 60-second TTL already handles stale locks from crashed containers
  - This fixes premature scan completion and pending-files accumulation issues
This release addresses gaps identified in the November 2025 security audit verification.
- BREAKING: Removed API token query parameter support (P0 security fix)
  - Tokens in URLs are logged in server access logs and exposed in browser history
  - API tokens MUST now be sent via the `Authorization: Bearer <token>` header only
  - Affects the `auth_required()`, `check_auth()`, and `get_authenticated_user()` functions
- Added comprehensive security headers middleware (P1):
  - `X-Frame-Options: SAMEORIGIN` - Prevents clickjacking
  - `X-Content-Type-Options: nosniff` - Prevents MIME sniffing
  - `X-XSS-Protection: 1; mode=block` - Legacy XSS protection
  - `Referrer-Policy: strict-origin-when-cross-origin`
  - `Content-Security-Policy` - CSP with 'unsafe-inline' for inline handlers
  - `Strict-Transport-Security` - HSTS for HTTPS connections
- Added session inactivity timeout (30 minutes) (P1)
- Automatically logs out users after 30 minutes of inactivity
- Returns 401 for API requests, redirects for UI requests
- Added composite database indexes for common query patterns (P1):
  - `idx_status_corrupted` on `(scan_status, is_corrupted)`
  - `idx_scan_date_corrupted` on `(scan_date DESC, is_corrupted)`
  - `idx_exists_status` on `(file_exists, scan_status)`
  - Indexes are created via startup migration with `IF NOT EXISTS`
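The `IF NOT EXISTS` guard is what makes the migration safe to re-run on every container start. SQLite accepts the same index DDL, so it can stand in for the PostgreSQL migration here; the `scan_results` table and its columns are illustrative stand-ins for the app's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE scan_results (
    id INTEGER PRIMARY KEY, scan_status TEXT, is_corrupted INTEGER,
    scan_date TEXT, file_exists INTEGER)""")

INDEX_DDL = [
    "CREATE INDEX IF NOT EXISTS idx_status_corrupted "
    "ON scan_results (scan_status, is_corrupted)",
    "CREATE INDEX IF NOT EXISTS idx_scan_date_corrupted "
    "ON scan_results (scan_date DESC, is_corrupted)",
    "CREATE INDEX IF NOT EXISTS idx_exists_status "
    "ON scan_results (file_exists, scan_status)",
]

for _ in range(2):            # second pass simulates a container restart
    for stmt in INDEX_DDL:
        conn.execute(stmt)    # IF NOT EXISTS makes the repeat a no-op

names = {row[1] for row in conn.execute("PRAGMA index_list('scan_results')")}
```

Without the guard, the second pass would raise "index already exists" on every restart.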
- Notification system API routes (P3 feature completion):
  - `GET/POST /api/notifications/providers` - List/create notification providers
  - `GET/PUT/DELETE /api/notifications/providers/<id>` - CRUD for providers
  - `POST /api/notifications/providers/<id>/test` - Test notification delivery
  - `GET/POST /api/notifications/rules` - List/create notification rules
  - `GET/PUT/DELETE /api/notifications/rules/<id>` - CRUD for rules
  - Supports Pushover, ntfy.sh, and webhook providers
- ARIA accessibility attributes (P3 compliance):
  - Navigation landmarks: `role="navigation"`, `role="main"`, `role="banner"`
  - Decorative icons: `aria-hidden="true"` on Font Awesome icons
  - Interactive elements: `aria-label` on buttons with icons only
  - Progress tracking: `role="progressbar"` with aria-value attributes
  - Live regions: `aria-live="polite"` for dynamic content updates
- Verified existing PIL/Pillow image validation follows best practices
- Verified existing FFmpeg video validation follows best practices
- Applied SQLAlchemy composite index patterns from Context7 docs
- Applied Flask-Session inactivity timeout patterns from Context7 docs
- Fixed flaky schedule reactivation test that failed due to APScheduler timing edge cases
- Updated pytest configuration to properly scope test discovery to the `tests/` and `scripts/` directories
- UI screenshot updates with correct viewport sizing and element-specific captures
Scheduled cleanup and file_changes scans were creating multiple reports:
- Gunicorn runs 4 workers, each with its own process-local `current_cleanup_thread` variable
- When the scheduler triggered 4 simultaneous requests, 3 workers passed the "already running" check
- Result: 3 cleanup operations ran concurrently, creating 3 reports

Cleanup and file_changes scans sent a start ping but no completion ping:
- Reports were created without the `scan_id` field (needed `scheduled_X` format)
- `send_healthcheck_completion()` was never called after report creation
- Healthcheck monitoring showed the scan started but never finished

Duplicate Prevention:
- Added a database-level check using `CleanupState.is_active` and `FileChangesState.is_active`
- The database check provides cross-worker visibility (vs the process-local thread check)

Healthcheck Integration:
- The scheduler now passes `schedule_id` in the API request body
- API routes accept and pass `schedule_id` to the async functions
- Report creation now sets `scan_id=scheduled_{id}` for schedule association
- Calls `MediaScheduler.send_healthcheck_completion()` after report creation

Files affected:
- `pixelprobe/api/maintenance_routes.py`: Database-level duplicate checks, schedule_id handling
- `pixelprobe/services/maintenance_service.py`: Schedule-aware report creation with healthcheck ping
- `scheduler.py`: Pass schedule_id to cleanup/file_changes API requests
Scheduled scans were being skipped with the message "Scan already completed (from previous attempt)":
- All scheduled scans used a static scan_id like `scheduled_1` or `scheduled_periodic`
- The Celery task checked whether that scan_id was already completed
- It found the PREVIOUS day's completed scan and skipped the new scan
- Healthcheck received a "success" ping despite no actual scanning
- Flow:
  - `scheduler.py` sent `source: scheduled_1` to `/api/scan`
  - `scan_routes.py` used this directly as the `scan_id`
  - `tasks.py` queried ScanState for `scan_id=scheduled_1` and found a completed state
  - The task returned early, thinking it was a retry of a completed scan
- Fix: Added a timestamp to the scheduled scan_id: `scheduled_1_20241204_180000`
  - Each scheduled scan now has a unique identifier
  - Celery retry detection only matches same-session scans
  - Applies to both `scheduled_{id}` and `scheduled_periodic` sources
- Result:
  - Scheduled scans now execute correctly
  - New files are discovered on schedule
  - Healthcheck pings reflect actual scan execution
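The timestamped scan_id format can be sketched as a small helper. The function name is illustrative; the format string matches the `scheduled_1_20241204_180000` example above, and the `now` parameter is injectable only so the sketch is testable.

```python
from datetime import datetime

def build_scheduled_scan_id(schedule_id, now=None):
    """scheduled_{id}_{YYYYMMDD_HHMMSS}: unique per run, so the Celery
    retry check can no longer match yesterday's completed scan."""
    now = now or datetime.now()
    return f"scheduled_{schedule_id}_{now.strftime('%Y%m%d_%H%M%S')}"
```

A run of this helper for schedule 1 on 2024-12-04 18:00:00 yields exactly the `scheduled_1_20241204_180000` form shown above, and the result stays well under the widened VARCHAR(64) scan_id column.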
v2.5.23's exit-code-only approach failed due to FFmpeg version differences:
- FFmpeg 8.0 (macOS): Returns exit code 0 for PPS seek errors
- FFmpeg 6.1.1 (Docker/Ubuntu 24.04): Returns non-zero exit code for same errors
Stage 3 (multi-point sampling) now never marks files as corrupted:
- All Stage 3 output is logged as debug info only
- Stage 1 (full decode from beginning, no seeking) remains the authoritative check
- 3 remaining false positives now correctly detected as healthy
- Works consistently across all FFmpeg versions
- No pattern matching or exit code dependencies
- Removed `-err_detect crccheck+bitstream` from Stage 3 multi-point sampling
  - Aggressive error detection is incompatible with seeking mid-stream
  - The HEVC decoder reports errors (PPS missing, first slice missing) until it finds a keyframe
  - These are normal seek artifacts, not corruption
- Simplified Stage 3 logic
  - Now just checks the FFmpeg exit code (0 = success)
  - Removed complex benign pattern matching (was a maintenance burden)
  - Stage 4 still uses aggressive detection (it starts from the beginning, so there are no seek issues)
- All false positives from HEVC seek artifacts eliminated
- Cleaner, more maintainable code
- Actual corruption is still detected via a non-zero exit code
- HEVC seek artifacts no longer trigger false corruption
  - `First slice in a frame missing` - Normal HEVC seek behavior
  - `PPS id out of range` - HEVC initialization after seek
- EOF/seek-related messages no longer trigger false corruption
  - `Cannot determine format` - Video stream shorter than container
  - `Nothing was written into output` - No frames at seek position
  - `Error marking filters` / `Error while filtering` - Consequence of EOF
  - `Received no packets` - No data at seek position
- Audio codec warnings no longer trigger false corruption
  - TrueHD and DTS-HD warnings (audio issues, not video corruption)
  - `quant_step_size` - TrueHD-specific warning
- 19+ files previously marked as "Corrupted" now correctly show as "Healthy"
- Files affected: Station Eleven (all episodes), Law Abiding Citizen, Avengers Endgame, Mufasa, Lord of the Rings, etc.
- `media_checker.py` - Added benign patterns in `_check_multipoint_sampling()`
- Files with only DTS/PTS timestamp warnings now marked as HEALTHY (not Warning)
- Removed `warning_details` assignment in `_check_video_corruption()` for benign warnings
  - DTS/PTS warnings are common muxer artifacts in H.264/HEVC files
- Files with NAL unit warnings or reference frame warnings also stay HEALTHY
- Root cause: `_check_video_corruption()` at line 1343 was adding warnings to `warning_details`
  - This caused `has_warnings = True` even though the warnings are benign
  - Fix: Log the warnings for debugging but don't add them to `warning_details`
- `media_checker.py` - Removed `warning_details` assignment for benign DTS/PTS warnings
- HEVC "PPS changed between slices" no longer triggers false positive corruption
  - Added to benign patterns list in multi-point sampling
  - Common in Blu-ray remux content where dynamic parameter sets are used
  - Files with only PPS warnings now marked as HEALTHY (not corrupted)
- Added additional HEVC benign patterns
  - `'skipping nal unit'` - Filler/padding NAL units (type 63)
  - `'last message repeated'` - FFmpeg continuation line format
- Improved warning logging format
  - Now shows counts: "14 PPS parameter changes, 9 NAL unit skips in middle"
  - Files with only benign warnings stay HEALTHY (no `warning_details` added)
- Root cause: Multi-point sampling in `_check_multipoint_sampling()` was flagging legitimate HEVC stream features
  - PPS (Picture Parameter Set) changes are part of the HEVC spec for adaptive encoding
  - NAL unit 63 is reserved/unspecified, often used for Blu-ray markers
- `media_checker.py` - Updated `benign_patterns` list and warning logging
- `/api/version` endpoint now returns infrastructure component versions
  - Celery version (task queue)
  - Redis version (message broker)
  - PostgreSQL version (database)
  - Example response includes `infrastructure: { celery: "5.3.4", redis: "7.2.3", postgresql: "15.2" }`
- `/api/version` endpoint added to OpenAPI documentation
  - New "System" tag category for system information endpoints
  - Full schema documentation for version response
- Removed `tools/update_version.py` - Version should only be updated manually in `version.py`
- `openapi.yaml` version is now dynamic
  - Served via `/api/openapi.yaml` with version injected from `version.py` at runtime
  - Static `openapi.yaml` uses placeholder version `0.0.0` - Indicates it's dynamically set
- `version.py` is the single source of truth for the application version
  - All version references read from `version.py` dynamically
  - Infrastructure versions queried at runtime from actual services
- `app.py` - Enhanced `/api/version` endpoint with infrastructure versions
- `openapi.yaml` - Added `/api/version` endpoint documentation, added "System" tag
- Deleted: `tools/update_version.py`
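Querying component versions at runtime might look roughly like this; the helper name is hypothetical, but `importlib.metadata.version()` and the `redis_version` field of redis-py's `INFO` reply are the standard ways to read them.

```python
import importlib.metadata

def build_infrastructure_info(redis_client=None):
    """Assemble the `infrastructure` block of the /api/version response,
    degrading gracefully when a service is unavailable (sketch only)."""
    info = {}
    try:
        # Version of the installed Celery distribution.
        info["celery"] = importlib.metadata.version("celery")
    except importlib.metadata.PackageNotFoundError:
        info["celery"] = "unknown"
    if redis_client is not None:
        # redis-py INFO returns a dict that includes "redis_version".
        info["redis"] = redis_client.info().get("redis_version", "unknown")
    return info
```

The PostgreSQL version would come from a `SELECT version()` query against the database; it is omitted here to keep the sketch dependency-free.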
- v2.5.16 fix was incomplete - existing stale lock was not cleared
- v2.5.16 only changed TTL for NEW locks from 24h to 60s with heartbeat
- But existing stale lock (created with 24h TTL by v2.5.15) still blocked scheduler
- Fix: Now explicitly deletes any stale lock on startup BEFORE attempting to acquire
- Safe because on fresh container startup, any existing lock is from a dead process
- Workers within same container race to acquire after delete, first one wins
- Added `redis_client.delete(lock_key)` before `set(..., nx=True)` in `app.py`
- Added log message: "Cleared stale scheduler lock (if any) on startup"
- `app.py` - Added delete of stale lock before acquiring new lock
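The clear-then-acquire startup sequence can be sketched as follows; `InMemoryRedis` is a stand-in for a real `redis.Redis` client, and the function name is illustrative rather than the actual `app.py` code.

```python
class InMemoryRedis:
    """Minimal stand-in for a redis.Redis client, just for this sketch."""
    def __init__(self):
        self.store = {}

    def delete(self, key):
        self.store.pop(key, None)

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.store:
            return None  # redis-py returns None when NX blocks the write
        self.store[key] = value
        return True

def acquire_scheduler_lock_on_startup(redis_client, lock_key, lock_value):
    # On a fresh container start, any pre-existing lock belongs to a dead
    # process (e.g. a 24 h TTL lock left by v2.5.15), so clear it first.
    redis_client.delete(lock_key)
    # Workers in the same container now race; nx=True lets exactly one win.
    return bool(redis_client.set(lock_key, lock_value, nx=True, ex=60))
```

After the delete, the first worker's `nx=True` set succeeds and every later attempt returns `None` until the 60-second TTL lapses or the heartbeat refreshes it.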
- Scheduler not starting after container restart due to stale Redis lock
- Root cause: Redis scheduler lock used 24-hour TTL (86400 seconds)
- When container is replaced during deployment, old container's lock persists
- New container finds existing lock and skips scheduler initialization
- Result: No scheduler running - all scheduled scans fail to execute
- Fix: Changed lock TTL from 24 hours to 60 seconds with heartbeat refresh
- Added background thread that refreshes the lock every 30 seconds
- Stale locks from crashed containers now expire within 60 seconds
- Active schedulers maintain their lock indefinitely through heartbeat
- Modified scheduler lock logic in `app.py` lines 721-755
- Lock TTL reduced from `ex=86400` to `ex=60`
- Added daemon thread `refresh_scheduler_lock()` that runs every 30 seconds
- Thread refreshes the lock with a new timestamp and 60-second TTL
- Worst-case recovery time: 60 seconds after a container crash
- `app.py` - Changed lock TTL and added heartbeat thread
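The heartbeat can be sketched as a daemon-thread loop that rewrites the lock with a fresh TTL on every tick; the shape is assumed from the entry above, not copied from `app.py`.

```python
import threading
import time

def refresh_scheduler_lock(redis_client, lock_key, stop_event,
                           interval=30, ttl=60):
    """Heartbeat loop: rewrite the lock every `interval` seconds with a
    fresh `ttl`-second expiry. While the process lives, the lock never
    lapses; if the container dies, it expires within `ttl` seconds."""
    while not stop_event.wait(interval):
        redis_client.set(lock_key, f"heartbeat:{time.time()}", ex=ttl)

def start_lock_heartbeat(redis_client, lock_key, interval=30):
    """Run the heartbeat in a daemon thread so it never blocks shutdown."""
    stop_event = threading.Event()
    thread = threading.Thread(
        target=refresh_scheduler_lock,
        args=(redis_client, lock_key, stop_event, interval),
        daemon=True,
    )
    thread.start()
    return thread, stop_event
```

Using `stop_event.wait(interval)` instead of `time.sleep()` lets the loop exit promptly when the event is set, which also makes the refresh logic testable without a real 30-second wait.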
- No completion ping sent when scheduled scan detects existing completed scan
- Root cause: When the Celery task detects that a scan with the same `scan_id` (e.g., `scheduled_11`) already exists in the `completed` phase, it returns early to prevent duplicate work
  - This early return bypassed `_create_scan_report()`, which calls `send_healthcheck_completion()`
  - Result: Start ping sent by scheduler, but no completion ping ever sent
- Fix: When returning early for "already completed" scans, still send the completion ping, since the scheduler already sent a start ping and healthcheck services expect a completion signal
  - Now finds the existing ScanReport for the scan_id and sends the completion ping before returning early
- Modified early-return path in `scan_media_task()` at `pixelprobe/tasks.py:100-126`
  - When `scan_id.startswith('scheduled_')` and the scan is already completed, queries for the existing `ScanReport` and calls `MediaScheduler.send_healthcheck_completion()`
  - Added `ScanReport` to imports in tasks.py
- `pixelprobe/tasks.py` - Added completion ping to early-return path for already-completed scans
- Re-enabling a disabled schedule shows stale next_run time
- Root cause: In multi-worker Gunicorn deployment, scheduler runs in only ONE worker
- When API request to re-enable schedule arrives, it may be handled by a worker without the scheduler
- `scheduler.update_schedules()` returns early if `not self.scheduler.running`
- The `next_run` is never recalculated for re-enabled schedules
- Fix: Calculate `next_run` directly in the API endpoint using APScheduler's `CronTrigger.get_next_fire_time()`
- This works without requiring a running APScheduler instance
- Supports both cron expressions (e.g., "*/5 * * * *") and interval expressions (e.g., "interval:hours:6")
- Also recalculates `next_run` when the cron expression is changed on an active schedule
- Added `calculate_next_run()` helper function to `admin_routes.py`
- Uses APScheduler's `CronTrigger` class, which can calculate the next fire time without a running scheduler
- Detects when a schedule transitions from `is_active=False` to `is_active=True`
- Also triggers recalculation when `cron_expression` changes on an active schedule
- `pixelprobe/api/admin_routes.py` - Added `calculate_next_run()` helper and updated `update_schedule()` endpoint
- `tests/integration/test_admin_endpoints.py` - Added tests for schedule reactivation scenarios
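A minimal stdlib-only sketch of the interval half of such a helper; the real `calculate_next_run()` additionally handles cron strings via APScheduler's `CronTrigger.get_next_fire_time()`, which is omitted here to keep the example dependency-free. The `interval:<unit>:<count>` format comes from the entry above.

```python
from datetime import datetime, timedelta

def calculate_next_run_interval(expression, now=None):
    """Compute next_run for expressions like "interval:hours:6".

    Cron expressions (e.g. "*/5 * * * *") are out of scope in this sketch;
    APScheduler's CronTrigger handles those without a running scheduler.
    """
    now = now or datetime.utcnow()
    prefix, unit, count = expression.split(":")
    if prefix != "interval":
        raise ValueError("only interval:<unit>:<count> is handled here")
    # timedelta accepts keyword units such as hours, minutes, or days.
    return now + timedelta(**{unit: int(count)})
```

Computing the value directly in the API endpoint like this avoids depending on which Gunicorn worker happens to host the running scheduler.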
- Scheduled scans fail with duplicate key constraint violation
- Root cause: `scan_id` has a unique constraint in the ScanState model
- When a scheduled scan runs again, it tries to INSERT with the same scan_id (e.g., "scheduled_1")
- Previous scan_state record remains in the database after completion
- Fix: `create_new_scan()` now deletes any existing scan_state with the same scan_id before creating a new one
- Scheduled scans can now run repeatedly without constraint violations
- DetachedInstanceError after scan completion
- Root cause: `db.session.close()` in the retry loop at line 729 detaches all SQLAlchemy objects
- `final_scan_state` was fetched before the retry loop but accessed after potential session closure
- Fix: Re-fetch `final_scan_state` after the retry loop completes
- Added graceful fallback if scan_state cannot be re-fetched
- Celery retry creates duplicate scan state on already-completed scans
- Root cause: When Celery retries a task after DetachedInstanceError, the scan may have already completed
- Retry would try to create a new scan_state with the same scan_id
- Fix: Check if scan_id already has the completed phase before starting
- Returns early with success if the scan already finished
- `models.py`: `create_new_scan()` deletes existing scan_state with the same scan_id before INSERT
- `scan_service.py`: Re-fetch `final_scan_state` after `db.session.close()` to avoid DetachedInstanceError
- `tasks.py`: Early-exit check if scan_id already has `phase='completed'`
- `models.py` - Delete existing scan_state before creating a new one for the same scan_id
- `pixelprobe/services/scan_service.py` - Re-fetch scan_state after retry loop
- `pixelprobe/tasks.py` - Skip retry if scan already completed
- Orphaned pending files from failed scan chunks now get processed
- Root cause: SQLAlchemy concurrency error "This session is provisioning a new connection" caused scan chunks to fail silently
- Scan would report success, but files in failed chunks stayed in `pending` status forever
- Fix 1: Remove directory filter from pending files query so ALL pending files are included in every scan
- Fix 2: Add retry logic for failed chunks - failed chunks are retried sequentially after parallel processing completes
- This ensures orphaned files from previous interrupted scans eventually get processed
- FFmpeg warning details now show actual DTS/PTS counts
- Before: Generic "Benign FFmpeg warnings in middle section" message
- After: Specific counts like "3 DTS, 2 PTS warnings in middle"
- Helps identify actual warning severity without looking at raw logs
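The count-based message can be produced by a small stderr summarizer along these lines; the line-matching heuristics are illustrative, not the actual `media_checker.py` parsing.

```python
def summarize_timestamp_warnings(stderr):
    """Condense raw FFmpeg stderr into e.g. "3 DTS, 2 PTS warnings in middle"."""
    lines = stderr.splitlines()
    # Illustrative matching: count lines mentioning each timestamp type.
    dts = sum(1 for line in lines if "dts" in line.lower())
    pts = sum(1 for line in lines
              if "pts" in line.lower() and "dts" not in line.lower())
    parts = [f"{n} {label}" for n, label in ((dts, "DTS"), (pts, "PTS")) if n]
    if not parts:
        return "No timestamp warnings"
    return ", ".join(parts) + " warnings in middle"
```

Surfacing counts instead of a generic message lets an operator gauge warning severity from the UI without pulling the raw FFmpeg logs.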
- Failed scan chunks are now retried
- Before: If a chunk failed during parallel processing, it was logged and ignored
- After: Failed chunks are tracked and retried sequentially after the main parallel loop
- Sequential retry avoids the concurrency issues that caused the original failure
- Pending files from failed chunks (e.g., SQLAlchemy concurrency errors) are now recovered
- `scan_service.py` line 573: Removed directory filter from pending files count
- `scan_service.py` lines 1562-1590: Added failed chunk tracking and sequential retry
- `media_checker.py` lines 1915-1927: Parse stderr for DTS/PTS counts instead of generic message
- `pixelprobe/services/scan_service.py` - Include all pending files in scan, add chunk retry logic
- `media_checker.py` - Show actual DTS/PTS warning counts
- Duplicate start pings due to scheduler running in multiple containers
- Bug: File-based lock (`fcntl.flock`) doesn't work across containers (each has a separate `/tmp` filesystem)
- Both the gunicorn and celery-worker containers would acquire their own lock and initialize the scheduler
- Fix: Replaced file lock with a Redis-based distributed lock using `SETNX`
- Uses `redis.set(lock_key, lock_value, nx=True, ex=86400)` for atomic acquisition
- 24-hour expiry ensures auto-recovery if a container crashes
- Falls back to file lock if Redis is unavailable (for local development)
- Lock key: `pixelprobe:scheduler:lock`
- Lock value: `{pid}:{timestamp}` for debugging
- Only the first container to acquire the Redis lock will initialize the scheduler
- Other containers log which process holds the lock
- `app.py` - Replace file-based lock with Redis distributed lock
- Scheduled scan ID not preserved through scan pipeline
- Bug: Even though `/api/scan` now passes `scheduled_XX` as scan_id, `ScanService.scan_directories()` was creating a new UUID
- The scan_id was not passed through: Celery task -> scan_directories() -> ScanState.create_new_scan()
- Fix: Added `scan_id` parameter through the entire chain
- `ScanState.create_new_scan(scan_id=None)` now accepts an optional scan_id
- `scan_directories(scan_id=None)` now passes it to `create_new_scan`
- Celery task now passes scan_id to scan_directories
- `models.py` - `create_new_scan()` accepts optional scan_id parameter
- `pixelprobe/services/scan_service.py` - `scan_directories()` accepts and passes scan_id
- `pixelprobe/tasks.py` - Pass scan_id to scan_directories()
- Scheduled scans not identified for healthcheck completion pings
- Bug: `/api/scan` endpoint ignored the `source` field from the scheduler
- The scheduler sent `source: 'scheduled_XX'` but the endpoint generated a new UUID
- `send_healthcheck_completion()` checks for `scan_id.startswith('scheduled_')` to identify scheduled scans
- Since scan_id was always a UUID, completion pings were never sent
- Fix: Use `source` as `scan_id` when it starts with `scheduled_`
- `pixelprobe/api/scan_routes.py` - Use scheduler source as scan_id for scheduled scans
- Healthcheck completion ping failed with type error
- Bug: Passing `report.report_id` (UUID string) instead of `report.id` (integer primary key)
- Error: `invalid input syntax for type integer: "UUID-string"`
- Fix: Changed to pass `report.id` to `send_healthcheck_completion()`
- Completion pings now work correctly for all scan types, including empty scans (0 files)
- `pixelprobe/services/scan_service.py` - Fixed parameter type (`report.id` instead of `report.report_id`)
- Healthcheck Completion Pings Now Working
- CRITICAL BUG: `send_healthcheck_completion()` was defined but never called
- Start pings worked, but success/failure pings on scan completion were never sent
- Added call to `MediaScheduler.send_healthcheck_completion()` in `scan_service._create_scan_report()`
- Now properly sends success ping when a scan completes successfully
- Now properly sends failure ping when a scan fails/errors
- Includes scan report data in success pings when configured
- Improved Healthcheck URL Logging
- All ping methods now log the full URL being pinged
- Logs include HTTP status code on success
- Error messages now include the URL that failed
- Helps debug connectivity issues with self-hosted instances
- Self-Hosted Healthchecks.io Support
- Removed confusing warning message for non-standard URL formats
- Supports any URL format (public hc-ping.com or self-hosted instances)
- Self-hosted URLs can use custom paths, not just `/ping/UUID`
- Schedule Action Buttons - Now All Icons
- Disable button: Changed from text "Disable" to pause icon (`fa-pause`)
- Enable button: Changed from text "Enable" to play icon (`fa-play`)
- All four buttons now use icons only: edit, healthcheck, pause/play, trash
- Added title tooltips for accessibility
- Uniform Button Sizing on Mobile
- All schedule action buttons now have fixed 2.25rem width/height
- Buttons use flexbox centering for consistent icon alignment
- Removed variable sizing that caused layout issues
- `pixelprobe/services/scan_service.py` - Added healthcheck completion ping call
- `scheduler.py` - Updated method signature for easier calling
- `pixelprobe/services/healthcheck_service.py` - Improved logging, removed restrictive URL warning
- `static/js/app.js` - Changed Enable/Disable to icons