Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b9d88ec4c9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review"
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
```ts
    status: "queued",
  },
  MAX_CONCURRENT,
```
Count queued exports when enforcing concurrency limit
This insert uses `status: "queued"` even though `insertExportIfUnderLimit` only counts rows where `status = 'processing'` (`src/lib/db.ts`), so multiple requests arriving together can all pass the limit check and then all transition to `processing` after `spawnExport`. That breaks the advertised `MAX_CONCURRENT` cap and can start far more than 3 heavy exports at once instead of returning 429s.
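A minimal sketch of the fix this implies: count both `'queued'` and `'processing'` rows inside the same transaction as the insert. The table name, column names, and helper shape below are assumptions based on the comment, not the repo's actual code.

```typescript
// Sketch only: table/column names and the helper shape are assumptions.
const MAX_CONCURRENT = 3;

// Counting 'processing' alone races: several requests can insert 'queued'
// rows, all pass the check, and later all flip to 'processing'. Counting
// both statuses in the insert transaction closes that window.
const COUNT_ACTIVE_SQL =
  "SELECT COUNT(*) AS n FROM exports WHERE status IN ('queued', 'processing')";

// Pure gate applied to the result of COUNT_ACTIVE_SQL before inserting.
function underLimit(activeCount: number, max: number = MAX_CONCURRENT): boolean {
  return activeCount < max;
}
```

With this, a request that sees 3 active rows (queued or processing) is rejected with a 429 instead of inserting a fourth.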
```tsx
  </a>
)}
<a
  href={`/api/logs/${exp.id}`}
```
Add logs API route or remove dead dashboard link
The dashboard renders a Logs action for every export, but this URL points to `/api/logs/{id}` and there is no corresponding handler under `src/app/api/logs`, so operators always get a 404 when they try to inspect run output. This creates a broken primary action in the main monitoring view.
```ts
  url.searchParams.get("offset") ?? "0",
  10,
);
const limit = Math.min(Number.isNaN(rawLimit) ? 20 : rawLimit, 100);
```
Clamp exports page size to a non-negative range
The limit parser only caps the upper bound (`Math.min(..., 100)`) and never enforces a lower bound, so requests like `?limit=-1` are accepted and passed to SQLite. In SQLite, negative `LIMIT` values disable pagination behavior, which can return the entire table and make this endpoint much more expensive than intended.
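A sketch of the clamp this asks for: the default of 20 and the cap of 100 come from the snippet above, while the lower bound of 1 is the suggested fix. The function name is illustrative, not the repo's actual identifier.

```typescript
// Parse ?limit= into the range [1, max]; NaN falls back to the default.
function parseLimit(raw: string | null, fallback = 20, max = 100): number {
  const n = Number.parseInt(raw ?? "", 10);
  const base = Number.isNaN(n) ? fallback : n;
  return Math.min(Math.max(base, 1), max);   // lower bound blocks LIMIT -1
}
```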
```ts
      timeBatches = 1;
      break;
    case "day":
      timeBatches = Math.ceil(rangeSec / 86_400);
```
🟡 estimateFiles ignores config.batchDays for day-mode batching
In `estimateFiles`, the `"minute"` case correctly uses `config.batchMinutes * 60` to compute time batches, but the `"day"` case hardcodes `86_400` (one day) instead of using `config.batchDays * 86_400`. This means that if `batchDays` is ever set to a value other than 1, the file estimate will be wrong (overestimated by a factor of `batchDays`). Currently this is masked because `autoSplit` always returns `batchDays: 1`, but it's inconsistent with the minute-mode pattern and will silently break if `batchDays` is ever changed.
```diff
-        timeBatches = Math.ceil(rangeSec / 86_400);
+        timeBatches = Math.ceil(rangeSec / (config.batchDays * 86_400));
```
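The pattern the suggestion restores can be sketched as follows; the config field names come from the comment, while the surrounding function shape is assumed for illustration.

```typescript
// Illustrative version of the batching math: both modes scale the divisor
// by the configured batch size, so day mode honors batchDays the same way
// minute mode honors batchMinutes.
type BatchConfig = { batchMinutes: number; batchDays: number };

function timeBatches(mode: "minute" | "day", rangeSec: number, config: BatchConfig): number {
  switch (mode) {
    case "minute":
      return Math.ceil(rangeSec / (config.batchMinutes * 60));
    case "day":
      return Math.ceil(rangeSec / (config.batchDays * 86_400));
    default:
      throw new Error(`unsupported mode: ${mode}`);
  }
}
```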
```ts
  `BATCH_OUTPUT_MODE="split"`,
  `FEED_GROUP_SIZE=${config.split.feedGroupSize}`,
  `OUTPUT_DEFAULT="export.csv"`,
  `GENERATE_INDEX_HTML=1`,
```
🔴 GENERATE_INDEX_HTML=1 contradicts documented workaround, causing exports to fail
The `buildEnvConfig` hardcodes `GENERATE_INDEX_HTML=1` (line 71), but `docs/S3_INDEX_HTML_ISSUE.md:43-48` explicitly states the current workaround is `GENERATE_INDEX_HTML=0` because the RawBLOB upload "consistently fails with Access Denied". The docs also state (lines 81-84) that `GENERATE_INDEX_HTML=1` should only be re-enabled after the IAM issue is resolved.
With `GENERATE_INDEX_HTML=1`, the bash script's `write_index_html` function is invoked after a successful CSV export. The first RawBLOB upload attempt fails (caught by `if !` at `scripts/run_lazer_export_s3.sh:490`), but the retry at `scripts/run_lazer_export_s3.sh:504` has no error handling and runs under `set -euo pipefail` (line 2), causing the script to exit with non-zero status. The Node.js `child.on("exit")` handler at `src/lib/export-runner.ts:256` then marks the export as "failed", even though all CSV data was successfully exported to S3.
```diff
-      `GENERATE_INDEX_HTML=1`,
+      `GENERATE_INDEX_HTML=0`,
```
Internal tool to replace Retool for historical Pyth Lazer price data exports.

Architecture: Next.js on Tailscale server, shelling out to existing CH→S3 export scripts, with SQLite for request tracking and a self-serve dashboard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove PLAN.md from the repo
- Fix min_channel type mismatch (string, not number)
- Consolidate channel constants into shared channels.ts
- Single Feed type exported from validate.ts
- Atomic concurrent export limit via SQLite transaction
- Sanitize .env values for bash safety
- Zod validation on external feed API response
- Minimal env passed to child process
- Redact credentials from error messages
- Fix O(n*m) feed lookups with Map
- Replace magic number 13 with VALID_COLUMNS.length
- Remove dead code (initDb, markStuckAsFailed exports)
- Add SQLite indexes on status and created_at
- NaN-safe pagination, 400 for malformed JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix false "no data found" warning for non-batched exports (add file count output to BATCH_MODE=none path in bash script + update regex)
- Deduplicate columns before checking "all" to prevent bypass
- Count both 'queued' and 'processing' in concurrent limit check
- Clamp pagination limit to minimum 1 (prevent negative LIMIT)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /api/logs/[id] route existed on disk but was never committed because .gitignore had 'logs/', which matched any logs directory. Changed to '/logs/' to only ignore the root-level export logs dir.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
If spawnExport fails (disk full, log dir unwritable, etc.), the catch block now marks the already-inserted DB row as 'failed' so it doesn't permanently consume a concurrency slot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
ClickHouse IAM role can write CSVWithNames but gets Access Denied on RawBLOB uploads. Disabled GENERATE_INDEX_HTML for now. Dashboard links to S3 prefix URL instead of manifest. Added docs/S3_INDEX_HTML_ISSUE.md explaining the problem and 4 fix options.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-enabled GENERATE_INDEX_HTML=1 with export metadata (name, channel, feed labels). Even if the upload fails with 403, the script continues and the CSVs are still exported. When it works, the index.html will be downloadable from S3.

Reverted child process to full env; the minimal env may have been stripping variables needed by the ClickHouse CLI or Python runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Summary
Rationale
How has this been tested?