
Feat/pyth data puller #3570

Open
aditya520 wants to merge 12 commits into main from feat/pyth-data-puller

Conversation

@aditya520 (Member) commented Mar 22, 2026

Summary

Rationale

How has this been tested?

  • Current tests cover my changes
  • Added new tests
  • Manually tested the code


@aditya520 aditya520 requested a review from a team as a code owner March 22, 2026 18:25
@vercel vercel bot commented Mar 22, 2026

The latest updates on your projects.

Project            Deployment  Actions            Updated (UTC)
api-reference      Ready       Preview, Comment   Mar 27, 2026 8:51pm
component-library  Ready       Preview, Comment   Mar 27, 2026 8:51pm
developer-hub      Error                          Mar 27, 2026 8:51pm
entropy-explorer   Error                          Mar 27, 2026 8:51pm
insights           Error                          Mar 27, 2026 8:51pm
proposals          Error                          Mar 27, 2026 8:51pm
staking            Ready       Preview, Comment   Mar 27, 2026 8:51pm


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b9d88ec4c9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +109 to +111
status: "queued",
},
MAX_CONCURRENT,

P1: Count queued exports when enforcing concurrency limit

This insert uses status: "queued" even though insertExportIfUnderLimit only counts rows where status = 'processing' (src/lib/db.ts), so multiple requests arriving together can all pass the limit check and then all transition to processing after spawnExport. That breaks the advertised MAX_CONCURRENT cap and can start far more than 3 heavy exports at once instead of returning 429s.

Useful? React with 👍 / 👎.
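The fix this finding asks for can be sketched roughly as follows. The `exports` table name, `status` column, and `underLimit` helper are illustrative assumptions reconstructed from the comment, not the PR's actual `src/lib/db.ts` code (a later commit in this PR does adopt the queued+processing count):

```typescript
// Sketch only: count every row that holds or will hold a concurrency slot.
// Table and column names are assumed for illustration.
const MAX_CONCURRENT = 3;

const COUNT_ACTIVE_SQL = `
  SELECT COUNT(*) AS n FROM exports
  WHERE status IN ('queued', 'processing')
`;

// Pure helper: given the active count, may another export start?
function underLimit(activeCount: number, max: number = MAX_CONCURRENT): boolean {
  return activeCount < max;
}
```

For the cap to hold under concurrent requests, this count and the subsequent insert also need to run inside a single transaction, which matches the "atomic concurrent export limit via SQLite transaction" item in the commit log below.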

</a>
)}
<a
href={`/api/logs/${exp.id}`}

P2: Add logs API route or remove dead dashboard link

The dashboard renders a Logs action for every export, but this URL points to /api/logs/{id} and there is no corresponding handler under src/app/api/logs, so operators always get a 404 when they try to inspect run output. This creates a broken primary action in the main monitoring view.

Useful? React with 👍 / 👎.

url.searchParams.get("offset") ?? "0",
10,
);
const limit = Math.min(Number.isNaN(rawLimit) ? 20 : rawLimit, 100);

P2: Clamp exports page size to a non-negative range

The limit parser only caps the upper bound (Math.min(..., 100)) and never enforces a lower bound, so requests like ?limit=-1 are accepted and passed to SQLite. In SQLite, negative LIMIT values disable pagination behavior, which can return the entire table and make this endpoint much more expensive than intended.

Useful? React with 👍 / 👎.
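A minimal sketch of the clamp this finding suggests, assuming the defaults visible in the snippet (fallback of 20, cap of 100); `clampLimit` is a hypothetical helper name, not code from the PR:

```typescript
// Hypothetical helper: parse and clamp a user-supplied page size to [1, 100].
function clampLimit(raw: string | null): number {
  const parsed = Number.parseInt(raw ?? "20", 10);
  const fallback = Number.isNaN(parsed) ? 20 : parsed;
  // Math.max enforces the lower bound, so values like "-1" never reach
  // SQLite, where a negative LIMIT disables pagination.
  return Math.min(Math.max(fallback, 1), 100);
}
```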

(vercel bot temporarily deployed all seven apps to Preview, March 22, 2026 18:31)
devin-ai-integration[bot]

This comment was marked as resolved.

(vercel bot temporarily deployed all seven apps to Preview, March 22, 2026 18:52)
@devin-ai-integration devin-ai-integration bot (Contributor) left a comment

Devin Review found 2 new potential issues.

View 9 additional findings in Devin Review.

Open in Devin Review

Comment thread apps/pyth-data-puller/src/app/page.tsx
timeBatches = 1;
break;
case "day":
timeBatches = Math.ceil(rangeSec / 86_400);

🟡 estimateFiles ignores config.batchDays for day-mode batching

In estimateFiles, the "minute" case correctly uses config.batchMinutes * 60 to compute time batches, but the "day" case hardcodes 86_400 (one day) instead of using config.batchDays * 86_400. This means if batchDays is ever set to a value other than 1, the file estimate will be wrong (overestimated by a factor of batchDays). Currently this is masked because autoSplit always returns batchDays: 1, but it's inconsistent with the minute-mode pattern and will silently break if batchDays is ever changed.

Suggested change
timeBatches = Math.ceil(rangeSec / 86_400);
timeBatches = Math.ceil(rangeSec / (config.batchDays * 86_400));
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.
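The suggested change can be illustrated with a self-contained sketch. The `BatchConfig` shape and `timeBatches` function are assumptions reconstructed from the review comment, not the app's actual `estimateFiles` code:

```typescript
// Illustrative sketch: compute time batches consistently for both modes.
interface BatchConfig {
  batchMinutes: number;
  batchDays: number;
}

function timeBatches(
  mode: "minute" | "day",
  rangeSec: number,
  config: BatchConfig,
): number {
  switch (mode) {
    case "minute":
      return Math.ceil(rangeSec / (config.batchMinutes * 60));
    case "day":
      // Mirror the minute-mode pattern by using config.batchDays,
      // so the estimate stays correct if batchDays is ever !== 1.
      return Math.ceil(rangeSec / (config.batchDays * 86_400));
  }
}
```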

(vercel bot deployed further Preview builds on March 22 19:07, March 23 18:42, and March 26 15:17)
@devin-ai-integration devin-ai-integration bot (Contributor) left a comment

Devin Review found 2 new potential issues.

View 15 additional findings in Devin Review.

Open in Devin Review

`BATCH_OUTPUT_MODE="split"`,
`FEED_GROUP_SIZE=${config.split.feedGroupSize}`,
`OUTPUT_DEFAULT="export.csv"`,
`GENERATE_INDEX_HTML=1`,

🔴 GENERATE_INDEX_HTML=1 contradicts documented workaround, causing exports to fail

The buildEnvConfig hardcodes GENERATE_INDEX_HTML=1 (line 71), but docs/S3_INDEX_HTML_ISSUE.md:43-48 explicitly states the current workaround is GENERATE_INDEX_HTML=0 because the RawBLOB upload "consistently fails with Access Denied". The docs also state (lines 81-84) that GENERATE_INDEX_HTML=1 should only be re-enabled after the IAM issue is resolved.

With GENERATE_INDEX_HTML=1, the bash script's write_index_html function is invoked after successful CSV export. The first RawBLOB upload attempt fails (caught by if ! at scripts/run_lazer_export_s3.sh:490), but the retry at scripts/run_lazer_export_s3.sh:504 has no error handling and runs under set -euo pipefail (line 2), causing the script to exit with non-zero status. The Node.js child.on("exit") handler at src/lib/export-runner.ts:256 then marks the export as "failed" — even though all CSV data was successfully exported to S3.

Suggested change
`GENERATE_INDEX_HTML=1`,
`GENERATE_INDEX_HTML=0`,
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment thread apps/pyth-data-puller/scripts/run_lazer_export_s3.sh Outdated
(vercel bot temporarily deployed six apps to Preview, March 26, 2026 16:17)
aditya520 and others added 12 commits March 27, 2026 16:18
Internal tool to replace Retool for historical Pyth Lazer price data exports.
Architecture: Next.js on Tailscale server, shelling out to existing CH→S3
export scripts, with SQLite for request tracking and a self-serve dashboard.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove PLAN.md from the repo
- Fix min_channel type mismatch (string, not number)
- Consolidate channel constants into shared channels.ts
- Single Feed type exported from validate.ts
- Atomic concurrent export limit via SQLite transaction
- Sanitize .env values for bash safety
- Zod validation on external feed API response
- Minimal env passed to child process
- Redact credentials from error messages
- Fix O(n*m) feed lookups with Map
- Replace magic number 13 with VALID_COLUMNS.length
- Remove dead code (initDb, markStuckAsFailed exports)
- Add SQLite indexes on status and created_at
- NaN-safe pagination, 400 for malformed JSON

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix false "no data found" warning for non-batched exports (add file
  count output to BATCH_MODE=none path in bash script + update regex)
- Deduplicate columns before checking "all" to prevent bypass
- Count both 'queued' and 'processing' in concurrent limit check
- Clamp pagination limit to minimum 1 (prevent negative LIMIT)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The /api/logs/[id] route existed on disk but was never committed
because .gitignore had 'logs/' which matched any logs directory.
Changed to '/logs/' to only ignore the root-level export logs dir.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
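The distinction this commit relies on can be illustrated with a short .gitignore sketch (the `/logs/` and `logs/` patterns are from the commit message; the API route path is the one discussed earlier in this review):

```gitignore
# Unanchored pattern: matches any directory named "logs" at any depth,
# which is why it also hid the route directory under src/app/api/logs:
# logs/

# A leading slash anchors the pattern to the directory containing this
# .gitignore, so only the top-level export log directory is ignored:
/logs/
```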
If spawnExport fails (disk full, log dir unwritable, etc.), the catch
block now marks the already-inserted DB row as 'failed' so it doesn't
permanently consume a concurrency slot.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
ClickHouse IAM role can write CSVWithNames but gets Access Denied on
RawBLOB uploads. Disabled GENERATE_INDEX_HTML for now. Dashboard
links to S3 prefix URL instead of manifest.

Added docs/S3_INDEX_HTML_ISSUE.md explaining the problem and 4 fix options.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Re-enabled GENERATE_INDEX_HTML=1 with export metadata (name, channel,
feed labels). Even if the upload fails with 403, the script continues
and the CSVs are still exported. When it works, the index.html will
be downloadable from S3.

Reverted child process to full env — the minimal env may have been
stripping variables needed by the ClickHouse CLI or Python runtime.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Labels: none yet · Projects: none yet · 2 participants