Skip to content

ci: remove dataset downloads from Python test scripts#1458

Open
ramakrishnap-nv wants to merge 6 commits into
mainfrom
ci/remove-python-test-dataset-downloads
Open

ci: remove dataset downloads from Python test scripts#1458
ramakrishnap-nv wants to merge 6 commits into
mainfrom
ci/remove-python-test-dataset-downloads

Conversation

@ramakrishnap-nv

@ramakrishnap-nv ramakrishnap-nv commented Jun 24, 2026

Copy link
Copy Markdown
Collaborator

Datasets needed by Python and server tests are now committed to the repo or replaced with already-committed alternatives, eliminating all network downloads from CI test scripts.

What changes:

  • Commit datasets/solomon/In/r107.txt (7.5 KB Solomon VRPTW benchmark) for test_solomon()
  • Replace a2864.mps (53 MB compressed) and square41.mps (19 MB compressed) with the already-committed afiro_original.mps in all three warm-start tests (test_lp_solver.py, test_python_API.py, test_pdlp_warmstart.py). The PDLP iteration invariant assertion is dropped — it is already covered by cpp/tests/linear_programming/pdlp_test.cu:1844 across 7 instances including a2864; the Python tests now only verify warm-start API behavior.
  • Remove download_pdlp_test_dataset.sh, download_miplib_test_dataset.sh, and get_test_data.sh calls from ci/test_python.sh, ci/test_wheel_cuopt.sh, and ci/test_wheel_cuopt_server.sh. MIPLIB tests already use swath1.mps and neos5-free-bound.mps, both of which are committed.
  • Remove dead --tsp download and run from ci/test_wheel_cuopt.sh.
  • Add .gitignore exceptions so datasets/solomon/In/r107.txt is tracked.

Savings: 2-3 minutes of download time and no external network dependency per CI run.

All datasets needed by Python and server tests are now committed to the
repo or have been replaced with already-committed files.

- Commit datasets/solomon/In/r107.txt (7.5 KB) for test_solomon()
- Update warm-start tests to use committed afiro_original.mps instead
  of a2864.mps (53 MB) and square41.mps (19 MB); drop the PDLP
  iteration invariant assertion (covered by C++ pdlp_test.cu:1844)
- Remove download_pdlp_test_dataset.sh, download_miplib_test_dataset.sh,
  and get_test_data.sh calls from ci/test_python.sh,
  ci/test_wheel_cuopt.sh, and ci/test_wheel_cuopt_server.sh
- Remove dead --tsp code from ci/test_wheel_cuopt.sh
- Add .gitignore exceptions for datasets/solomon/In/r107.txt

Saves 5–15 min of download time and reduces network dependency per CI run.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ramakrishnap-nv ramakrishnap-nv requested review from a team as code owners June 24, 2026 19:17
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2b49ea9b-14f3-4906-997f-e03ef9c1714f

📥 Commits

Reviewing files that changed from the base of the PR and between 41df23b and 28c175a.

⛔ Files ignored due to path filters (1)
  • datasets/mip/swath1.mps is excluded by !**/*.mps
📒 Files selected for processing (1)
  • .gitignore
✅ Files skipped from review due to trivial changes (1)
  • .gitignore

📝 Walkthrough

Walkthrough

CI scripts stop downloading datasets during runs and now export RAPIDS_DATASET_ROOT_DIR before tests proceed. Warm-start tests switch to afiro_original.mps for the main path, use good-mps-1.mps for the mismatch case, and drop iteration-count assertions. A new CI utility posts sticky PR comments summarizing failing test jobs.

Changes

Test dataset handling

Layer / File(s) Summary
CI dataset bootstrap
.gitignore, ci/test_python.sh, ci/test_wheel_cuopt.sh, ci/test_wheel_cuopt_server.sh
datasets/solomon/ is ignored, and the CI test scripts remove dataset download/install steps while setting RAPIDS_DATASET_ROOT_DIR before later test execution.
Warm-start tests
python/cuopt/cuopt/tests/linear_programming/test_lp_solver.py, python/cuopt/cuopt/tests/linear_programming/test_python_API.py, python/cuopt_server/cuopt_server/tests/test_pdlp_warmstart.py
The warm-start tests switch to afiro_original.mps, use good-mps-1.mps for the mismatched-input case, and drop iteration-count comparisons while retaining warm-start status and error checks.

PR test summary

Layer / File(s) Summary
PR comment summary utility
ci/utils/pr_test_summary.py
The module filters test jobs, paginates GitHub API results, parses gtest and pytest failures from logs, builds a markdown summary, and posts or updates the matching PR comment.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: removing dataset downloads from CI Python test scripts.
Description check ✅ Passed The description is directly related to the changeset and accurately summarizes the dataset and CI script updates.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ci/remove-python-test-dataset-downloads

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/cuopt/cuopt/tests/linear_programming/test_lp_solver.py`:
- Around line 622-627: The warm-start test in test_lp_solver currently runs the
second Solver.Solve call without checking its result, so it only validates that
the path executes. Update the test to capture the second solve output and assert
a concrete success/correctness signal from it, using the existing solver.Solve
and settings.set_pdlp_warm_start_data flow to verify the warm-started solution
is numerically correct rather than just error-free.

In `@python/cuopt/cuopt/tests/linear_programming/test_python_API.py`:
- Around line 475-480: The two problem.solve(settings) calls in
test_python_API.py are only checking that execution succeeds, so add explicit
assertions on the returned solve results to validate correctness. Use the
existing test flow around get_pdlp_warm_start_data() and
set_pdlp_warm_start_data() to verify the first and second solve outcomes,
including solution status and any numerical values the test already inspects.
Make the assertions specific to the solve return object or problem state so
regressions in solution quality are caught.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5ded3d8f-6633-4520-aee8-b8265e5ef6dc

📥 Commits

Reviewing files that changed from the base of the PR and between 08cdd82 and 89e9f8c.

⛔ Files ignored due to path filters (1)
  • datasets/solomon/In/r107.txt is excluded by !datasets/**/*.txt
📒 Files selected for processing (7)
  • .gitignore
  • ci/test_python.sh
  • ci/test_wheel_cuopt.sh
  • ci/test_wheel_cuopt_server.sh
  • python/cuopt/cuopt/tests/linear_programming/test_lp_solver.py
  • python/cuopt/cuopt/tests/linear_programming/test_python_API.py
  • python/cuopt_server/cuopt_server/tests/test_pdlp_warmstart.py
💤 Files with no reviewable changes (3)
  • ci/test_wheel_cuopt.sh
  • ci/test_python.sh
  • ci/test_wheel_cuopt_server.sh

Comment thread python/cuopt/cuopt/tests/linear_programming/test_lp_solver.py
Comment thread python/cuopt/cuopt/tests/linear_programming/test_python_API.py
@ramakrishnap-nv ramakrishnap-nv self-assigned this Jun 24, 2026
@ramakrishnap-nv ramakrishnap-nv added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Jun 24, 2026
ramakrishnap-nv and others added 4 commits June 24, 2026 14:53
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Drop _CRASH_PATTERNS and all crash-detection logic
- Simplify _is_test_job to plain startswith (exact-match fallback unneeded)
- _analyze_job_log returns a list directly instead of (list, crash) tuple
- Remove verbose module/function/class docstrings

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Address CodeRabbit review: both solve calls in test_warm_start were
unchecked, so regressions in solution quality would go undetected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci/utils/pr_test_summary.py`:
- Around line 70-78: The _api helper currently makes a blocking
urllib.request.urlopen call to GitHub without any timeout, so update _api in
pr_test_summary.py to pass a finite timeout through the request path. Use the
existing _api function and its urllib.request.urlopen call as the fix point, and
keep the timeout consistent with the other network calls in this file so a
stalled connection cannot hang the CI step indefinitely.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4017d091-0082-455b-a405-a01a1c2ec84d

📥 Commits

Reviewing files that changed from the base of the PR and between 5bca0f3 and 41df23b.

📒 Files selected for processing (3)
  • ci/utils/pr_test_summary.py
  • python/cuopt/cuopt/tests/linear_programming/test_lp_solver.py
  • python/cuopt/cuopt/tests/linear_programming/test_python_API.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • python/cuopt/cuopt/tests/linear_programming/test_lp_solver.py
  • python/cuopt/cuopt/tests/linear_programming/test_python_API.py

Comment on lines +70 to +78
def _api(path, token, method, payload):
req = urllib.request.Request(
f"https://api.github.com{path}",
data=json.dumps(payload).encode(),
method=method,
headers={**_headers(token), "Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
return json.loads(resp.read())

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Add an HTTP timeout in _api calls.

Line 77 performs a blocking GitHub API request without a timeout, unlike the other network paths in this file. A stalled connection can hang the CI step indefinitely.

Suggested fix
 def _api(path, token, method, payload):
@@
-    with urllib.request.urlopen(req) as resp:
+    with urllib.request.urlopen(req, timeout=_HTTP_TIMEOUT_SEC) as resp:
         return json.loads(resp.read())
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _api(path, token, method, payload):
req = urllib.request.Request(
f"https://api.github.com{path}",
data=json.dumps(payload).encode(),
method=method,
headers={**_headers(token), "Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
return json.loads(resp.read())
def _api(path, token, method, payload):
req = urllib.request.Request(
f"https://api.github.com{path}",
data=json.dumps(payload).encode(),
method=method,
headers={**_headers(token), "Content-Type": "application/json"},
)
with urllib.request.urlopen(req, timeout=_HTTP_TIMEOUT_SEC) as resp:
return json.loads(resp.read())
🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 76-76: Request-controlled URL passed to urlopen; validate against an allowlist to prevent SSRF.
Context: urllib.request.urlopen(req)
Note: [CWE-918] Server-Side Request Forgery (SSRF).

(urlopen-unsanitized-data)


[info] 72-72: use jsonify instead of json.dumps for JSON output
Context: json.dumps(payload)
Note: [CWE-116] Improper Encoding or Escaping of Output.

(use-jsonify)

🪛 Ruff (0.15.18)

[error] 77-77: Audit URL open for permitted schemes. Allowing use of file: or custom schemes is often unexpected.

(S310)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci/utils/pr_test_summary.py` around lines 70 - 78, The _api helper currently
makes a blocking urllib.request.urlopen call to GitHub without any timeout, so
update _api in pr_test_summary.py to pass a finite timeout through the request
path. Use the existing _api function and its urllib.request.urlopen call as the
fix point, and keep the timeout consistent with the other network calls in this
file so a stalled connection cannot hang the CI step indefinitely.

swath1.mps (1.7 MB) is used by test_heuristics_only and
test_incumbent_callbacks; it was not committed and was previously
provided by download_miplib_test_dataset.sh which this PR removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant