chore/SOF 7908#329
📝 Walkthrough
This pull request refactors notebook examples across the repository to use a unified OIDC-authenticated
Changes
Notebook Examples APIClient Migration
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
examples/material/upload_materials_from_file_poscar.ipynb (1)
80-91: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Keep the `POSCAR_PATH` instructions consistent with the example.
Line 80 says `POSCAR_PATH` should be absolute, but Line 91 uses a relative `../assets/...` path. That mismatch makes the setup instructions misleading for users following the notebook.
Suggested change
- - **POSCAR_PATH**: absolute path to the POSCAR file
+ - **POSCAR_PATH**: path to the POSCAR file (relative to the notebook working directory in this example)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/upload_materials_from_file_poscar.ipynb` around lines 80 - 91, The notebook's parameter instructions are inconsistent: the markdown says POSCAR_PATH must be absolute but the code cell sets a relative path; update either the description or the example so they match. For example, change the markdown line referencing POSCAR_PATH to indicate a relative path is acceptable, or replace the POSCAR_PATH value in the code cell (the variable POSCAR_PATH next to NAME) with an absolute path string; ensure the variable name POSCAR_PATH and the explanatory markdown are consistent.
examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb (1)
869-877: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Do not shift the band energies in place.
`band_data` aliases `result["band_structure"]["data"]`, so rerunning this plotting cell subtracts the Fermi level again and produces progressively wrong plots.
Suggested fix
 for result in results:
     if result["band_structure"]:
         band_data = result["band_structure"]["data"]
         # adjust for Fermi level
         fermi_level = result["fermi_level"]["data"]["value"]
-        for i in range(len(band_data["yDataSeries"])):
-            band_data["yDataSeries"][i] = [e - fermi_level for e in band_data["yDataSeries"][i]]
-
-        plot_band_structure_with_labels(band_data, ylim=[MIN_E, MAX_E])
+        shifted_band_data = {
+            **band_data,
+            "yDataSeries": [[e - fermi_level for e in energies] for energies in band_data["yDataSeries"]],
+        }
+
+        plot_band_structure_with_labels(shifted_band_data, ylim=[MIN_E, MAX_E])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb` around lines 869 - 877, The code currently mutates result["band_structure"]["data"] in place by assigning band_data = result["band_structure"]["data"] and then subtracting fermi_level from each entry in band_data["yDataSeries"]; instead, create a non-mutating copy of the band data (or build a new shifted_yDataSeries list) and apply the Fermi-level shift to that copy so result remains unchanged, then pass the copied/shifted structure to plot_band_structure_with_labels; refer to result, band_data, fermi_level, and the "yDataSeries" key to locate and update the logic.
examples/workflow/qe_scf_calculation.ipynb (1)
191-199: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Fix `_material` in job payload: use JSON null/None, not the string "null"
In `examples/workflow/qe_scf_calculation.ipynb` the `JOB_BODY` sends `"_material": "null"`, which serializes to a literal string rather than JSON `null`. For material-less jobs, `_material` should be omitted or set to `None` so it serializes to JSON `null`.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/workflow/qe_scf_calculation.ipynb` around lines 191 - 199, The JOB_BODY payload sets "_material" to the string "null", which will serialize as a JSON string; update the JOB_BODY in examples/workflow/qe_scf_calculation.ipynb so that the "_material" entry is either removed entirely or set to Python None (not the string) so it serializes to JSON null; locate the JOB_BODY definition and replace "\"_material\": \"null\"" with either no "_material" key or "\"_material\": None" to fix the serialization.
examples/job/create_and_submit_job.ipynb (1)
123-128: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Add `_project` to the job creation payload (to match the “default account’s project” claim).
`examples/job/create_and_submit_job.ipynb` builds `config` with `owner`, `_material`, `workflow`, and `name`, but omits `_project`, while the markdown states the job is created inside the default account’s project. The shared helper `src/py/mat3ra/notebooks_utils/core/entity/job/api.py` includes `_project` in the payload passed to `api_client.jobs.create(...)`, and other job examples explicitly fetch a default `project_id` via `client.projects.list({ "isDefault": True, "owner._id": OWNER_ID })[0]["_id"]` and pass it into job creation helpers.
Suggested fix
+default_project = client.projects.list({"isDefault": True, "owner._id": OWNER_ID})[0]
+
 config = {
     "owner": {"_id": OWNER_ID},
+    "_project": {"_id": default_project["_id"]},
     "_material": {"_id": material_id},
     "workflow": {"_id": workflow_id},
     "name": JOB_NAME,
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/job/create_and_submit_job.ipynb` around lines 123 - 128, The job creation payload `config` in the notebook is missing the `_project` field required to create the job in the default account project; update the notebook to fetch the default project id (e.g., via `client.projects.list({ "isDefault": True, "owner._id": OWNER_ID })[0]["_id"]`) and add `"_project": {"_id": project_id}` to `config` before calling the job creation helper so the payload aligns with `notebooks_utils/core/entity/job/api.py` and the `api_client.jobs.create(...)` usage.
examples/job/run-simulations-and-extract-properties.ipynb (2)
68-80: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Re-add `json`; the compute cell still parses `CLUSTERS`.
The later `json.loads(os.getenv("CLUSTERS"))` call now crashes with `NameError` because this import block no longer brings `json` in.
🛠️ Suggested fix
+import json
 import time
 from IPython.display import IFrame
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/job/run-simulations-and-extract-properties.ipynb` around lines 68 - 80, The import block is missing the json module which causes json.loads(os.getenv("CLUSTERS")) to raise NameError later; add an import for the json module (e.g., import json) alongside the existing imports in the top cell that contains wait_for_jobs_to_finish_async, get_property_by_subworkflow_and_unit_indicies, dataframe_to_html and flatten_material so json.loads and os.getenv("CLUSTERS") work correctly.
417-422: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Fix the row-building loop; it currently drops rows and duplicates the initial structure.
Only Line 419 is inside the loop right now, so the cell appends a single row for the last result. It also fills the `FIN-*` columns from `initial_structure` again instead of `final_structure`.
🛠️ Suggested fix
 table = []
 for result in results:
     data = flatten_material(result["initial_structure"])
-data.extend(flatten_material(result["initial_structure"]))
-data.extend([result["pressure"], result["band_gap_direct"], result["band_gap_indirect"]])
-table.append(data)
+    data.extend(flatten_material(result["final_structure"]))
+    data.extend([result["pressure"], result["band_gap_direct"], result["band_gap_indirect"]])
+    table.append(data)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/job/run-simulations-and-extract-properties.ipynb` around lines 417 - 422, The loop body only contains the first line, causing only the last result to be appended and duplicating initial_structure for FIN-* columns; fix by moving all row-building statements into the for loop so each iteration does: create data = flatten_material(result["initial_structure"]), then extend it with flatten_material(result["final_structure"]) (not initial_structure again), then extend with result["pressure"], result["band_gap_direct"], result["band_gap_indirect"], and finally append data to table so every result produces one row.
examples/reproducing_publications/band_gaps_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb (1)
725-729: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Fix the notebook cell to call the correct job-wait helper and match its signature
In `examples/reproducing_publications/band_gaps_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb` (lines 725-729), the cell imports `wait_for_jobs_to_finish` but only `wait_for_jobs_to_finish_async(endpoint, job_ids)` exists/is exported in `src/py/mat3ra/notebooks_utils/api/job.py`. The cell then calls `wait_for_jobs_to_finish_async(...)` without importing it (would fail with `NameError` once the import is corrected), and the call also passes `poll_interval=60` even though the helper signature does not accept `poll_interval` (would fail with `TypeError`).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/reproducing_publications/band_gaps_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb` around lines 725 - 729, The notebook imports wait_for_jobs_to_finish but the available helper is wait_for_jobs_to_finish_async; update the import to "from mat3ra.notebooks_utils.api.job import wait_for_jobs_to_finish_async" and call it using the correct signature (await wait_for_jobs_to_finish_async(client.jobs, job_ids)); remove the unsupported poll_interval=60 argument and ensure you pass the client.jobs and job_ids variables as shown.
🧹 Nitpick comments (2)
examples/material/create_material.ipynb (1)
136-137: ⚡ Quick win
Use the named `owner_id` argument in the create call.
Line 137 is the only migrated material-create example here that still passes the owner as a bare second positional argument. The in-repo helper in `src/py/mat3ra/notebooks_utils/core/entity/material/api.py:4-23` uses `owner_id=...`, so keeping this positional form makes the notebook depend on the external client's parameter order.
Suggested change
- material = client.materials.create(CONFIG, OWNER_ID)
+ material = client.materials.create(CONFIG, owner_id=OWNER_ID)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/material/create_material.ipynb` around lines 136 - 137, The call to client.materials.create passes the owner as a second positional argument (CONFIG, OWNER_ID); change it to use the named parameter owner_id so the call becomes client.materials.create(CONFIG, owner_id=OWNER_ID) to avoid depending on parameter order. Update the invocation of client.materials.create and ensure OWNER_ID remains the same variable used for owner_id.
examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb (1)
724-747: ⚡ Quick win
Avoid hard-coding workflow indexes for property lookup.
These `subworkflows[1]["units"][0/1]` lookups tie the notebook to the current bank-workflow layout. `src/py/mat3ra/notebooks_utils/core/entity/property/api.py` already has helpers that resolve the Fermi-energy flowchart from the job, which would make this example survive workflow-template changes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb` around lines 724 - 747, The loop is hard-coded to job["workflow"]["subworkflows"][1]["units"][0/1] via unit_flowchart_id_0 and unit_flowchart_id_1 which will break if the workflow template changes; replace those index lookups by calling the existing helper that resolves the correct unit/flowchart for a given property (use it to fetch the flowchart id for "fermi_energy" and "band_structure") and pass that id to client.properties.get_property instead of unit_flowchart_id_0/1; update the code that computes unit_flowchart_id_0 and unit_flowchart_id_1 (and any references) to use the helper so the loop using job and client.properties.get_property becomes resilient to workflow layout changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/job/get-file-from-job.ipynb`:
- Line 9: Update the notebook intro to reflect the migrated auth flow: replace
the instruction to update ../../utils/settings.json with a note that
authentication now uses authenticate() / APIClient.authenticate() and show where
to call it, and fix the link text/target to point to create_and_submit_job.ipynb
(singular) instead of create_and_submit_jobs.ipynb so readers are directed to
the correct sibling example.
- Around line 105-115: The notebook computes material_id from a copied bank
material but then calls client.jobs.create_by_ids using materials =
client.materials.list({"owner._id": owner_id}) which ignores material_id; update
the call so it only passes the copied material (either filter the results of
client.materials.list to the copied material_id or construct a single-item
list/dict for the copied material using material_id) and then pass that
filtered/constructed materials list into client.jobs.create_by_ids to ensure the
job uses the intended material_id (references: material_id, bank_materials,
bank_material_id, client.materials.list, client.jobs.create_by_ids).
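The material-selection finding for get-file-from-job.ipynb comes down to narrowing a list of material records to the one whose `_id` matches the copied material. A minimal sketch, using plain dictionaries as stand-ins for API responses; the sample records and the helper name `select_material` are hypothetical and not part of the Mat3ra client:

```python
def select_material(materials, material_id):
    """Return a one-item list holding the material whose _id matches."""
    selected = [m for m in materials if m["_id"] == material_id]
    if not selected:
        # Fail loudly instead of silently creating jobs for other materials.
        raise ValueError(f"material {material_id!r} not found")
    return selected


# Hypothetical records standing in for the output of a materials listing.
materials = [
    {"_id": "a1", "formula": "Ge"},
    {"_id": "b2", "formula": "Si"},
]
copied = select_material(materials, "b2")
```

Passing `copied` rather than the full listing keeps job creation tied to the one material the notebook just copied.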
In `@examples/job/ml-train-model-predict-properties.ipynb`:
- Around line 46-52: The notebook is missing an import for json which causes a
NameError when evaluating cluster_config =
next(iter(json.loads(os.getenv("CLUSTERS"))), {}); add "import json" alongside
the other imports at the top of the file so json.loads can be used, ensuring
CLUSTERS parsing works and cluster_config is properly created.
In `@examples/job/run-simulations-and-extract-properties.ipynb`:
- Around line 385-389: The band-gap extraction is using the flowchart ID from
subworkflow index 0 (relaxation) instead of the band-gap subworkflow (index 1);
update how unit_flowchart_id is derived so client.properties.get_direct_band_gap
and get_indirect_band_gap receive the flowchartId from
job["workflow"]["subworkflows"][1]["units"][1] (i.e., use subworkflows[1] rather
than subworkflows[0]) so the calls to
client.properties.get_direct_band_gap(job["_id"], unit_flowchart_id) and
client.properties.get_indirect_band_gap(...) target the correct subworkflow.
In `@examples/material/upload_materials_from_file_poscar.ipynb`:
- Around line 18-27: The notebook text claims POSCAR_PATH is absolute but the
code sets POSCAR_PATH = "../assets/mp-978534.poscar"; change the documentation
to state POSCAR_PATH is a relative path (or update POSCAR_PATH to an absolute
path) so the docstring and variable agree, and ensure any readers know which
behavior you choose; also avoid top-level await in the notebook to prevent
lint/export issues by wrapping calls like await install_packages(...) and await
authenticate() inside an async function (e.g., main) and calling it via
asyncio.run or similar, or document the lint risk (F704/PLE1142) if you
intentionally keep top-level await. Include references to POSCAR_PATH,
install_packages, and authenticate when making these changes.
In
`@examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb`:
- Around line 702-706: The notebook imports wait_for_jobs_to_finish but then
awaits an undefined wait_for_jobs_to_finish_async; fix by using the imported
helper or importing the async variant: either replace await
wait_for_jobs_to_finish_async(client.jobs, job_ids, poll_interval=60) with await
wait_for_jobs_to_finish(client.jobs, job_ids, poll_interval=60), or update the
import to bring in wait_for_jobs_to_finish_async and ensure it accepts
(client.jobs, job_ids, poll_interval=60) so the awaited call matches a defined
symbol.
In `@examples/workflow/get_workflows.ipynb`:
- Around line 18-27: Top-level await calls (install_packages and authenticate)
must be converted so the notebook can be linted as plain Python: replace lines
using "await install_packages('api')" and "await authenticate()" with a
synchronous entry (import asyncio) and either wrap them in an async def main()
and call asyncio.run(main()) or call asyncio.run(install_packages("api")) /
asyncio.run(authenticate()); ensure you add "import asyncio" and keep the
original function names (install_packages, authenticate) unchanged.
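One way to read the get_workflows.ipynb suggestion is to gather the awaited setup calls in a coroutine and run it once from synchronous code. In the sketch below, `install_packages` and `authenticate` are hypothetical stand-ins with made-up bodies, not the real helpers. Note that `asyncio.run` raises `RuntimeError` when an event loop is already running, as it is inside a Jupyter kernel, so this form suits an exported or linted script rather than a live notebook cell.

```python
import asyncio


async def install_packages(name):
    # Stand-in: the real helper installs the notebook's dependencies.
    await asyncio.sleep(0)
    return f"installed:{name}"


async def authenticate():
    # Stand-in: the real helper performs the authentication flow.
    await asyncio.sleep(0)
    return "authenticated"


async def main():
    # Run the setup steps in order inside a single coroutine.
    installed = await install_packages("api")
    auth = await authenticate()
    return [installed, auth]


steps = asyncio.run(main())
```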
In `@examples/workflow/qe_scf_calculation.ipynb`:
- Around line 43-49: Add an import for the json module at the top of the
notebook so json is available when the compute setup runs; specifically, add
"import json" alongside the existing "import os" before the code that calls
json.loads(os.getenv("CLUSTERS")) (near where APIClient.authenticate(), client,
selected_account and OWNER_ID are defined) to prevent the NameError.
- Around line 245-246: The notebook awaits wait_for_jobs_to_finish_async but
never imports it, causing a NameError; add an import for
wait_for_jobs_to_finish_async at the top of the notebook from the module that
provides the job helper utilities (the same place other job helpers/clients are
imported from) so the symbol is defined before calling await
wait_for_jobs_to_finish_async(client.jobs, [JOB_RESP["_id"]]).
---
Outside diff comments:
In `@examples/job/create_and_submit_job.ipynb`:
- Around line 123-128: The job creation payload `config` in the notebook is
missing the `_project` field required to create the job in the default account
project; update the notebook to fetch the default project id (e.g., via
`client.projects.list({ "isDefault": True, "owner._id": OWNER_ID })[0]["_id"]`)
and add `"_project": {"_id": project_id}` to `config` before calling the job
creation helper so the payload aligns with
`notebooks_utils/core/entity/job/api.py` and the `api_client.jobs.create(...)`
usage.
In `@examples/job/run-simulations-and-extract-properties.ipynb`:
- Around line 68-80: The import block is missing the json module which causes
json.loads(os.getenv("CLUSTERS")) to raise NameError later; add an import for
the json module (e.g., import json) alongside the existing imports in the top
cell that contains wait_for_jobs_to_finish_async,
get_property_by_subworkflow_and_unit_indicies, dataframe_to_html and
flatten_material so json.loads and os.getenv("CLUSTERS") work correctly.
- Around line 417-422: The loop body only contains the first line, causing only
the last result to be appended and duplicating initial_structure for FIN-*
columns; fix by moving all row-building statements into the for loop so each
iteration does: create data = flatten_material(result["initial_structure"]),
then extend it with flatten_material(result["final_structure"]) (not
initial_structure again), then extend with result["pressure"],
result["band_gap_direct"], result["band_gap_indirect"], and finally append data
to table so every result produces one row.
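The row-building fix described for run-simulations-and-extract-properties.ipynb can be checked on toy data. This is a sketch under stated assumptions: `flatten_material` here is a hypothetical stand-in that returns two fields, and the result records are invented for illustration.

```python
def flatten_material(material):
    # Hypothetical stand-in for the notebook helper; returns a fresh list.
    return [material["name"], material["volume"]]


def build_table(results):
    table = []
    for result in results:
        # Every statement below runs once per result, yielding one row each.
        data = flatten_material(result["initial_structure"])
        data.extend(flatten_material(result["final_structure"]))
        data.extend([result["pressure"], result["band_gap_direct"], result["band_gap_indirect"]])
        table.append(data)
    return table


results = [
    {
        "initial_structure": {"name": "Si-init", "volume": 40.0},
        "final_structure": {"name": "Si-final", "volume": 39.5},
        "pressure": 0.1,
        "band_gap_direct": 2.5,
        "band_gap_indirect": 0.6,
    },
    {
        "initial_structure": {"name": "Ge-init", "volume": 45.0},
        "final_structure": {"name": "Ge-final", "volume": 44.2},
        "pressure": 0.2,
        "band_gap_direct": 0.9,
        "band_gap_indirect": 0.7,
    },
]
table = build_table(results)
```

With the statements indented into the loop, the table has one row per result and the final-structure columns differ from the initial ones.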
In `@examples/material/upload_materials_from_file_poscar.ipynb`:
- Around line 80-91: The notebook's parameter instructions are inconsistent: the
markdown says POSCAR_PATH must be absolute but the code cell sets a relative
path; update either the description or the example so they match — for example,
change the markdown line referencing POSCAR_PATH to indicate a relative path is
acceptable, or replace the POSCAR_PATH value in the code cell (the variable
POSCAR_PATH next to NAME) with an absolute path string; ensure the variable name
POSCAR_PATH and the explanatory markdown are consistent.
In
`@examples/reproducing_publications/band_gaps_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb`:
- Around line 725-729: The notebook imports wait_for_jobs_to_finish but the
available helper is wait_for_jobs_to_finish_async; update the import to "from
mat3ra.notebooks_utils.api.job import wait_for_jobs_to_finish_async" and call it
using the correct signature (await wait_for_jobs_to_finish_async(client.jobs,
job_ids)) — remove the unsupported poll_interval=60 argument and ensure you pass
the client.jobs and job_ids variables as shown.
In
`@examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb`:
- Around line 869-877: The code currently mutates
result["band_structure"]["data"] in place by assigning band_data =
result["band_structure"]["data"] and then subtracting fermi_level from each
entry in band_data["yDataSeries"]; instead, create a non-mutating copy of the
band data (or build a new shifted_yDataSeries list) and apply the Fermi-level
shift to that copy so result remains unchanged, then pass the copied/shifted
structure to plot_band_structure_with_labels; refer to result, band_data,
fermi_level, and the "yDataSeries" key to locate and update the logic.
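The in-place mutation finding for the band-structure notebook can be illustrated without the API. The sketch below builds a shifted copy so that repeated calls give the same answer; the function name `shift_by_fermi_level` and the `xDataArray` key in the sample data are assumptions made for the example.

```python
def shift_by_fermi_level(band_data, fermi_level):
    """Return a copy of band_data with energies measured from the Fermi level."""
    return {
        **band_data,
        "yDataSeries": [
            [energy - fermi_level for energy in energies]
            for energies in band_data["yDataSeries"]
        ],
    }


# Invented sample data; only the "yDataSeries" key comes from the review.
band_data = {"xDataArray": [0, 1], "yDataSeries": [[1.0, 2.0], [3.0, 4.0]]}
first = shift_by_fermi_level(band_data, 1.0)
second = shift_by_fermi_level(band_data, 1.0)
```

Because `band_data` is never modified, rerunning the plotting step produces the same shifted energies each time.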
In `@examples/workflow/qe_scf_calculation.ipynb`:
- Around line 191-199: The JOB_BODY payload sets "_material" to the string
"null", which will serialize as a JSON string; update the JOB_BODY in
examples/workflow/qe_scf_calculation.ipynb so that the "_material" entry is
either removed entirely or set to Python None (not the string) so it serializes
to JSON null; locate the JOB_BODY definition and replace "\"_material\":
\"null\"" with either no "_material" key or "\"_material\": None" to fix the
serialization.
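The `_material` finding rests on how Python's standard `json` module serializes the string `"null"` versus `None`. A short sketch; the `name` field and its value are placeholders, not taken from the notebook's `JOB_BODY`:

```python
import json

# The string "null" stays a JSON string; Python None becomes JSON null.
as_string = json.dumps({"_material": "null"})
as_null = json.dumps({"_material": None})

# Omitting the key is the other option mentioned in the finding.
body = {"name": "demo job", "_material": "null"}
without_material = {key: value for key, value in body.items() if key != "_material"}
as_omitted = json.dumps(without_material)
```

Here `as_string` is `{"_material": "null"}` while `as_null` is `{"_material": null}`, which is the distinction a server-side null check would see.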
---
Nitpick comments:
In `@examples/material/create_material.ipynb`:
- Around line 136-137: The call to client.materials.create passes the owner as a
second positional argument (CONFIG, OWNER_ID); change it to use the named
parameter owner_id so the call becomes client.materials.create(CONFIG,
owner_id=OWNER_ID) to avoid depending on parameter order — update the invocation
of client.materials.create and ensure OWNER_ID remains the same variable used
for owner_id.
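The reason the `owner_id` nitpick prefers a keyword argument can be shown with two hypothetical signatures. Neither function below is the real `client.materials.create`; they only illustrate how a positional argument can bind to a different parameter if the parameter order ever changes.

```python
def create_v1(config, owner_id=None):
    # Hypothetical original signature: owner is the second parameter.
    return {"config": config, "owner_id": owner_id, "project_id": None}


def create_v2(config, project_id=None, owner_id=None):
    # Hypothetical later signature: a new parameter was inserted before owner_id.
    return {"config": config, "owner_id": owner_id, "project_id": project_id}


CONFIG = {"name": "Si"}
OWNER_ID = "owner-123"

positional = create_v2(CONFIG, OWNER_ID)        # binds OWNER_ID to project_id
keyword = create_v2(CONFIG, owner_id=OWNER_ID)  # stays correct under both signatures
```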
In
`@examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb`:
- Around line 724-747: The loop is hard-coded to
job["workflow"]["subworkflows"][1]["units"][0/1] via unit_flowchart_id_0 and
unit_flowchart_id_1 which will break if the workflow template changes; replace
those index lookups by calling the existing helper that resolves the correct
unit/flowchart for a given property (use it to fetch the flowchart id for
"fermi_energy" and "band_structure") and pass that id to
client.properties.get_property instead of unit_flowchart_id_0/1; update the code
that computes unit_flowchart_id_0 and unit_flowchart_id_1 (and any references)
to use the helper so the loop using job and client.properties.get_property
becomes resilient to workflow layout changes.
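To illustrate the index-free lookup idea from the nitpick above, here is a sketch that searches a workflow for a unit by name instead of by position. It is not the helper in `property/api.py`; the function `find_flowchart_id`, the unit `name` key, and the sample workflow are assumptions, while the `subworkflows`, `units`, and `flowchartId` keys follow the lookups quoted in the review.

```python
def find_flowchart_id(workflow, unit_name):
    """Return the flowchartId of the first unit with the given name."""
    for subworkflow in workflow["subworkflows"]:
        for unit in subworkflow["units"]:
            if unit.get("name") == unit_name:
                return unit["flowchartId"]
    raise KeyError(f"no unit named {unit_name!r} in workflow")


# Invented workflow layout for the example.
workflow = {
    "subworkflows": [
        {"units": [{"name": "relax", "flowchartId": "f-0"}]},
        {
            "units": [
                {"name": "pw_scf", "flowchartId": "f-1"},
                {"name": "pw_bands", "flowchartId": "f-2"},
            ]
        },
    ]
}
scf_id = find_flowchart_id(workflow, "pw_scf")
bands_id = find_flowchart_id(workflow, "pw_bands")
```

A lookup keyed on a stable attribute keeps working if subworkflows are reordered, which positional indexes do not.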
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9b0e3d94-b00d-4cbd-9638-2e726e23d83e
📒 Files selected for processing (16)
- README.md
- examples/job/create_and_submit_job.ipynb
- examples/job/get-file-from-job.ipynb
- examples/job/ml-train-model-predict-properties.ipynb
- examples/job/run-simulations-and-extract-properties.ipynb
- examples/material/create_material.ipynb
- examples/material/get_materials_by_formula.ipynb
- examples/material/upload_materials_from_file_poscar.ipynb
- examples/reproducing_publications/band_gaps_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb
- examples/reproducing_publications/band_structure_for_interface_bilayer_twisted_molybdenum_disulfide.ipynb
- examples/system/get_authentication_params.ipynb
- examples/workflow/get_workflows.ipynb
- examples/workflow/qe_scf_calculation.ipynb
- pyproject.toml
- src/py/mat3ra/notebooks_utils/core/api/settings.py
- src/py/mat3ra/notebooks_utils/ipython/_collab.py
💤 Files with no reviewable changes (2)
- src/py/mat3ra/notebooks_utils/ipython/_collab.py
- pyproject.toml
| "<img alt=\"Open in Google Colab\" src=\"https://user-images.githubusercontent.com/20477508/128780728-491fea90-9b23-495f-a091-11681150db37.jpeg\" width=\"150\" border=\"0\">\n", | ||
| "</a>" | ||
| ] | ||
| "source": "# Get-File-From-Job\n\nThis example demonstrates how to use Mat3ra RESTful API to check for and acquire files from jobs which have been run. This example assumes that the user is already familiar with the [creation and submission of jobs](create_and_submit_jobs.ipynb) using our API.\n\n> <span style=\"color: orange\">**IMPORTANT NOTE**</span>: In order to run this example in full, an active Mat3ra.com account is required. Alternatively, Readers may substitute the workflow ID below with another one (an equivalent one for VASP, for example) and adjust extraction of the results (\"Viewing job files\" section). RESTful API credentials shall be updated in [settings](../../utils/settings.json).\n\n\n## Steps\n\nAfter working through this notebook, you will be able to:\n\n1. Import [the structure of Si](https://materialsproject.org/materials/mp-149/) from Materials Bank\n2. Set up and run a single-point calculation using Quantum Espresso.\n3. List files currently in the job's directory\n4. Check metadata for every file (modification date, size, etc)\n5. Access file contents directly and print them to console\n6. Download files to your local machine\n\n## Pre-requisites\n\nThe explanation below assumes that the reader is familiar with the concepts used in Mat3ra platform and RESTful API. We outline these below and direct the reader to the original sources of information:\n\n- [Generating RESTful API authentication parameters](../system/get_authentication_params.ipynb)\n- [Creating and submitting jobs](../job/create_and_submit_job.ipynb)" |
Update the intro to match the migrated auth flow.
This block still tells readers to update ../../utils/settings.json, even though the notebook now authenticates via authenticate() / APIClient.authenticate(). It also links to create_and_submit_jobs.ipynb, while the sibling notebook in this PR is create_and_submit_job.ipynb.
🧰 Tools
🪛 Ruff (0.15.15)
[error] 9-9: await statement outside of a function
(F704)
[error] 9-9: await should be used within an async function
(PLE1142)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/job/get-file-from-job.ipynb` at line 9, Update the notebook intro to
reflect the migrated auth flow: replace the instruction to update
../../utils/settings.json with a note that authentication now uses
authenticate() / APIClient.authenticate() and show where to call it, and fix the
link text/target to point to create_and_submit_job.ipynb (singular) instead of
create_and_submit_jobs.ipynb so readers are directed to the correct sibling
example.
| "# Get materials from bank and copy one to our account\n", | ||
| "material_bank_endpoints = BankMaterialEndpoints(*ENDPOINT_ARGS)\n", | ||
| "MATERIAL_QUERY = {\"formula\": \"Si\"}\n", | ||
| "bank_materials = material_bank_endpoints.list(MATERIAL_QUERY)\n", | ||
| "bank_materials = client.bank_materials.list(MATERIAL_QUERY)\n", | ||
| "bank_material_id = bank_materials[0][\"_id\"]\n", | ||
| "material_endpoints = MaterialEndpoints(*ENDPOINT_ARGS)\n", | ||
| "material_id = material_bank_endpoints.copy(bank_material_id, owner_id)[\"_id\"]\n", | ||
| "materials = material_endpoints.list({\"owner._id\": owner_id})\n", | ||
| "material_id = client.bank_materials.copy(bank_material_id, owner_id)[\"_id\"]\n", | ||
| "materials = client.materials.list({\"owner._id\": owner_id})\n", | ||
| "\n", | ||
| "# Create the job\n", | ||
| "job_endpoints = JobEndpoints(*ENDPOINT_ARGS)\n", | ||
| "job = job_endpoints.create_by_ids(\n", | ||
| "job = client.jobs.create_by_ids(\n", | ||
| " materials=materials, workflow_id=workflow_id, project_id=project_id, owner_id=owner_id, prefix=\"Test_Job_Output\"\n", | ||
| ")[0]\n", |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate the notebook and the job API helper
echo "== Files containing create_by_ids =="
rg -n "create_by_ids" -S .
echo "== Job API helper (create_job) =="
rg -n "def create_job" -S src examples .
# Print the relevant helper implementation if found
python3 - <<'PY'
import subprocess, re, os, sys, textwrap, json, pathlib
# Find candidate files for create_job
out = subprocess.check_output(["bash","-lc","rg -n \"def create_job\" -S src examples || true"], text=True)
files=set()
for line in out.splitlines():
    m=re.match(r"([^:]+):\d+:def create_job", line)
    if m: files.add(m.group(1))
print("Candidate create_job files:", sorted(files))
for f in sorted(files):
    # Print a window around the function definition
    # Use sed to show ~120 lines around matches
    # Determine line number(s)
    out2 = subprocess.check_output(["bash","-lc", f"rg -n \"def create_job\" -S {shlex.quote(f)} || true"], text=True)
PY
Repository: Exabyte-io/api-examples
Length of output: 1349
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search create_by_ids definition =="
rg -n "def create_by_ids|create_by_ids\(" src/py -S
echo "== Show src/py/mat3ra/notebooks_utils/core/entity/job/api.py around create_job =="
python3 - <<'PY'
import pathlib, itertools
path = pathlib.Path("src/py/mat3ra/notebooks_utils/core/entity/job/api.py")
text = path.read_text(encoding="utf-8").splitlines()
for i,line in enumerate(text, start=1):
if "def create_job" in line:
start=max(1, i-15)
end=min(len(text), i+120)
for j in range(start,end+1):
print(f"{j:5d}: {text[j-1]}")
break
PY
echo "== Show examples/job/get-file-from-job.ipynb around materials construction and create_by_ids call =="
python3 - <<'PY'
import json, pathlib
p = pathlib.Path("examples/job/get-file-from-job.ipynb")
nb = json.loads(p.read_text(encoding="utf-8"))
# Find the cell(s) containing create_by_ids
hits=[]
for idx,cell in enumerate(nb.get("cells",[])):
src="".join(cell.get("source",[]))
if "create_by_ids" in src:
hits.append((idx, src))
print("Cells containing create_by_ids:", [h[0] for h in hits])
# Print the first hit region with some context from the previous/next cell
if hits:
i = hits[0][0]
for k in range(max(0,i-2), min(len(nb["cells"]), i+3)):
cell=nb["cells"][k]
src="".join(cell.get("source",[]))
if src.strip():
print("\n--- cell", k, "---")
print(src)
PY
Repository: Exabyte-io/api-examples
Length of output: 105
🏁 Script executed:
#!/bin/bash
set -u
echo "== Search create_by_ids occurrences in src/py =="
rg -n "create_by_ids" src/py -S || true
echo "== Show create_job implementation (with line numbers) =="
python3 - <<'PY'
import pathlib
path = pathlib.Path("src/py/mat3ra/notebooks_utils/core/entity/job/api.py")
text = path.read_text(encoding="utf-8").splitlines()
for i,line in enumerate(text, start=1):
if "def create_job" in line:
start=max(1,i-20)
end=min(len(text), i+220)
for j in range(start,end+1):
print(f"{j:5d}: {text[j-1]}")
break
PY
echo "== Dump the exact notebook cell(s) that call client.jobs.create_by_ids =="
python3 - <<'PY'
import json, pathlib
p = pathlib.Path("examples/job/get-file-from-job.ipynb")
nb = json.loads(p.read_text(encoding="utf-8"))
for idx,cell in enumerate(nb.get("cells",[])):
src = "".join(cell.get("source",[]))
if "client.jobs.create_by_ids" in src:
print(f"\n--- cell index {idx} ---\n{src}")
PY
Repository: Exabyte-io/api-examples
Length of output: 4200
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Search whole repo for create_by_ids implementation =="
rg -n "def create_by_ids|create_by_ids\s*\(" -S . || true
echo "== Search for jobs client class/module =="
rg -n "class .*Jobs|jobs\.create_by_ids|create_by_ids" -S src || true
echo "== List likely job client files =="
fd -t f -S "job" src/py || true
Repository: Exabyte-io/api-examples
Length of output: 1107
Pass only the copied Si material into client.jobs.create_by_ids(...)
In examples/job/get-file-from-job.ipynb (lines 105-115), the notebook computes material_id from the copied silicon material, but the create_by_ids(...) call uses materials = client.materials.list({"owner._id": owner_id}) (all materials for the account), leaving the copied material_id unused and risking the wrong _id being used (the create_job(...) helper sets _material from materials[0]["_id"]). Filter materials to the copied _id or pass a single-item list containing the copied material dict.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/job/get-file-from-job.ipynb` around lines 105 - 115, The notebook
computes material_id from a copied bank material but then calls
client.jobs.create_by_ids using materials = client.materials.list({"owner._id":
owner_id}) which ignores material_id; update the call so it only passes the
copied material (either filter the results of client.materials.list to the
copied material_id or construct a single-item list/dict for the copied material
using material_id) and then pass that filtered/constructed materials list into
client.jobs.create_by_ids to ensure the job uses the intended material_id
(references: material_id, bank_materials, bank_material_id,
client.materials.list, client.jobs.create_by_ids).
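A minimal sketch of the suggested filtering, assuming the materials returned by `client.materials.list(...)` are plain dicts carrying an `"_id"` key (as the `materials[0]["_id"]` access in `create_job(...)` implies). The helper name `select_material` and the sample data are hypothetical and not part of the repository:

```python
from typing import Dict, List


def select_material(materials: List[Dict], material_id: str) -> List[Dict]:
    """Return a single-item list holding only the material with the given _id.

    Hypothetical helper: it assumes each material is a dict with an "_id" key.
    """
    selected = [m for m in materials if m.get("_id") == material_id]
    if not selected:
        # Fail loudly instead of silently falling back to materials[0].
        raise ValueError(f"No material with _id={material_id!r} in the list")
    return selected[:1]


# Stand-in data; in the notebook this would come from client.materials.list(...).
all_materials = [
    {"_id": "aaa", "name": "Some other material"},
    {"_id": "bbb", "name": "Copied Si"},
]
materials_for_job = select_material(all_materials, "bbb")
print(materials_for_job)
```

In the notebook, the result would then be passed as `materials=` to `client.jobs.create_by_ids(...)`, so that `materials[0]["_id"]` is the copied `material_id`.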
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Overview\n", | ||
| "from mat3ra.notebooks_utils.packages import install_packages\n", | ||
| "\n", | ||
| "This example demonstrates how to import a material from a POSCAR file via [Material](https://docs.mat3ra.com/api/Material/post_materials_import) endpoints." | ||
| "await install_packages(\"api\")" | ||
| ] |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# 1) Inspect the notebook JSON for top-level await and referenced POSCAR paths.
NOTEBOOK="examples/material/upload_materials_from_file_poscar.ipynb"
if [ ! -f "$NOTEBOOK" ]; then
echo "Missing $NOTEBOOK"
exit 1
fi
python3 - <<'PY'
import json
import re
from pathlib import Path
nb_path = Path("examples/material/upload_materials_from_file_poscar.ipynb")
nb = json.loads(nb_path.read_text(encoding="utf-8"))
await_lines = []
poscar_lines = []
source_cell_idx = 0
for cell in nb.get("cells", []):
if cell.get("cell_type") != "code":
continue
src = cell.get("source", [])
if isinstance(src, str):
src = src.splitlines(True)
for i, line in enumerate(src, start=1):
# rough checks for "await" usage in source strings
if re.search(r'(^|\s)await\s', line):
await_lines.append((source_cell_idx, i, line.rstrip("\n")))
if "POSCAR" in line or "poscar" in line:
poscar_lines.append((source_cell_idx, i, line.rstrip("\n")))
source_cell_idx += 1
print("=== Top-level-ish await occurrences (cell_index, line_in_cell, line) ===")
for cidx, lno, text in await_lines[:200]:
print(f"cell={cidx} line={lno}: {text}")
print(f"... total await lines: {len(await_lines)}")
print("\n=== POSCAR/poscar occurrences (cell_index, line_in_cell, line) ===")
for cidx, lno, text in poscar_lines[:200]:
print(f"cell={cidx} line={lno}: {text}")
print(f"... total poscar lines: {len(poscar_lines)}")
PY
# 2) Locate CI linting/export steps for notebooks (nbqa, ruff on .ipynb, etc.)
echo "=== Searching for notebook lint/export in CI ==="
rg -n --hidden --glob '.github/workflows/*' -S "nbqa|ruff.*ipynb|ipynb.*ruff|ruff.*nb|lint.*ipynb|export.*notebook|jupytext|convert.*ipynb|nbconvert" . || true
echo "=== Searching for ruff config targeting notebooks ==="
ls -la .ruff* 2>/dev/null || true
rg -n --hidden -S "ipynb" pyproject.toml ruff.toml setup.cfg . || true
# 3) Also search for the specific Ruff codes mentioned (F704/PLE1142) across the repo.
echo "=== Searching for ruff F704/PLE1142 references ==="
rg -n --hidden -S "F704|PLE1142" . || true
Repository: Exabyte-io/api-examples
Length of output: 50447
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Focused CI searches
echo "=== Search workflows for ruff/nbqa/jupytext/nbconvert/ipynb ==="
rg -n --hidden -S "\.ipynb|nbqa|jupytext|nbconvert|ruff" .github/workflows || true
echo "=== Search for nbconvert render/lint scripts in repo root ==="
rg -n --hidden -S "nbqa|jupytext|nbconvert|ruff.*ipynb|ipynb.*ruff|inspect.*ipynb" . || true
# Focused Ruff config extraction from pyproject.toml
echo "=== Ruff config sections from pyproject.toml ==="
python3 - <<'PY'
import re, pathlib
p = pathlib.Path("pyproject.toml")
if not p.exists():
print("pyproject.toml not found")
raise SystemExit(0)
txt = p.read_text(encoding="utf-8")
# Print only likely ruff config block lines
for m in re.finditer(r'(?ms)^\[tool\.ruff[^\]]*\].*?(?=^\[tool\.|\Z)', txt):
block = m.group(0)
if any(k in block for k in ["ipynb","extend-include","include","exclude","lint","select","ignore","F704","PLE1142"]):
print(block)
PY
# Search for explicit ruff F704/PLE1142 mentions
echo "=== Search for F704/PLE1142 in repo ==="
rg -n --hidden -S "F704|PLE1142" . || true
Repository: Exabyte-io/api-examples
Length of output: 9847
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== List .github workflows ==="
ls -la .github/workflows || true
echo
echo "=== Search CI for render-notebooks / nbconvert / ruff / nbqa / jupytext ==="
rg -n --hidden -S "render-notebooks\.sh|nbconvert|jupytext|nbqa|ruff|jupyter-nbconvert" .github/workflows . || true
echo
echo "=== Show render-notebooks.sh (key parts) ==="
sed -n '1,120p' scripts/render-notebooks.sh
echo
sed -n '120,240p' scripts/render-notebooks.sh || true
echo
echo "=== Inspect examples/config.py (key parts) ==="
sed -n '1,120p' examples/config.py
echo
sed -n '120,240p' examples/config.py || true
echo
echo "=== Check pyproject for any notebook lint/export integration ==="
rg -n --hidden -S "ipynb|nbconvert|nbqa|jupytext|jupyter-nbconvert|jupytext|ruff.*ipynb" pyproject.toml || true
echo
sed -n '1,220p' pyproject.toml
echo
echo "=== Search the target notebook for POSCAR_PATH / absolute wording ==="
NOTEBOOK="examples/material/upload_materials_from_file_poscar.ipynb"
python3 - <<'PY'
import json, re
from pathlib import Path
nb_path = Path("examples/material/upload_materials_from_file_poscar.ipynb")
nb = json.loads(nb_path.read_text(encoding="utf-8"))
patterns = [r"POSCAR_PATH", r"absolute", r"absolute path", r"relative", r"../assets/.*poscar"]
for cell in nb.get("cells", []):
if cell.get("cell_type") not in ("markdown","code"):
continue
src = cell.get("source", [])
if isinstance(src, str):
src = src.splitlines(True)
text = "".join(src)
if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
print("\n--- cell_type:", cell.get("cell_type"), "---")
# print first ~12 lines for context
lines = text.splitlines()
for l in lines[:12]:
print(l)
PY
Repository: Exabyte-io/api-examples
Length of output: 17460
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Check .github contents ==="
ls -la .github || true
find .github -maxdepth 3 -type f -print || true
echo "=== Inspect pre-commit config for ruff ==="
ls -la .pre-commit-config.yaml || true
sed -n '1,200p' .pre-commit-config.yaml || true
echo "=== Search for ruff invocation in scripts/ci ==="
rg -n --hidden -S "pre-commit|ruff check|ruff\s|nbqa|jupytext|jupyter-nbconvert|nbconvert.*python" . || true
echo "=== Search for any CI configs besides GitHub workflows ==="
ls -la .gitlab-ci.yml .circleci/config.yml .travis.yml 2>/dev/null || true
find . -maxdepth 4 -type f -iname '*ci*' -o -iname '*pipeline*' -o -iname '*workflow*' | head -n 50 || true
echo "=== Extract the relevant cells from the target notebook (code+markdown around POSCAR_PATH/await) ==="
python3 - <<'PY'
import json
from pathlib import Path
nb_path = Path("examples/material/upload_materials_from_file_poscar.ipynb")
nb = json.loads(nb_path.read_text(encoding="utf-8"))
def show(cell_idx):
cell = nb["cells"][cell_idx]
print(f"\n--- cell_index={cell_idx} cell_type={cell.get('cell_type')} ---")
src = cell.get("source", [])
if isinstance(src, str):
src = src.splitlines(True)
for i,line in enumerate(src, start=1):
print(f"{i:>3}: {line.rstrip()}")
# Show first few cells with await
for idx in range(min(6, len(nb["cells"]))):
cell = nb["cells"][idx]
if cell.get("cell_type") == "code" and any("await " in (l if isinstance(cell["source"], list) else "") for l in (cell.get("source") or [])):
show(idx)
# Show cells mentioning POSCAR_PATH
for idx, cell in enumerate(nb["cells"]):
if cell.get("cell_type") == "code":
src = cell.get("source", [])
if isinstance(src, str):
src = src.splitlines(True)
if any("POSCAR_PATH" in l for l in src):
show(idx)
if cell.get("cell_type") == "markdown":
src = cell.get("source", [])
if isinstance(src, str):
src = src.splitlines(True)
if any("POSCAR_PATH" in l for l in src):
show(idx)
PY
Repository: Exabyte-io/api-examples
Length of output: 12384
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== .github/workflows/cicd.yml ==="
sed -n '1,120p' .github/workflows/cicd.yml
echo
sed -n '120,240p' .github/workflows/cicd.yml || true
echo
echo "=== .pre-commit-config.yaml (full) ==="
cat .pre-commit-config.yaml
echo
echo "=== ruff.toml (full) ==="
cat ruff.toml
echo
echo "=== Search for ruff hook options (ipynb) ==="
rg -n --hidden -S "id: ruff|ruff\.toml|ipynb|F704|PLE1142|--extend|--select|--ignore" .pre-commit-config.yaml pyproject.toml ruff.toml .github/workflows || true
Repository: Exabyte-io/api-examples
Length of output: 7286
🌐 Web query:
Exabyte-io/pre-commit-hooks ruff hook id: ruff supports .ipynb
💡 Result:
The repository Exabyte-io/pre-commit-hooks appears to be a separate, inactive project [1] and is not the official repository for the Ruff pre-commit hook. The official Ruff pre-commit integration is maintained at https://github.com/astral-sh/ruff-pre-commit [2][3]. This official hook does support Jupyter Notebooks (.ipynb) [2][4]. As of August 2024, the official ruff-pre-commit hook checks Jupyter notebooks by default [4]. If you wish to configure the hook, you can control which file types are processed by modifying the types_or list in your .pre-commit-config.yaml file [2][3]. For example, to include Jupyter notebooks, you can use:
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: v0.15.14
  hooks:
    - id: ruff-check
      types_or: [python, pyi, jupyter]
      args: [--fix]
    - id: ruff-format
      types_or: [python, pyi, jupyter]
If you want to exclude notebooks, simply remove jupyter from that list [2][3].
Citations:
- 1: https://github.com/Exabyte-io/pre-commit-hooks
- 2: https://github.com/astral-sh/ruff-pre-commit/blob/main/README.md
- 3: https://docs.astral.sh/ruff/integrations/
- 4: Check Jupyter notebooks by default, in sync with 0.6.0 astral-sh/ruff-pre-commit#96
🌐 Web query:
Exabyte-io/pre-commit-hooks ruff hook configuration types_or ipynb
💡 Result:
The Ruff pre-commit hook, primarily maintained at astral-sh/ruff-pre-commit [1][2], uses types_or: [python, pyi, jupyter] by default, meaning it runs on Python files, type stubs, and Jupyter Notebooks [3]. To specifically configure the types_or setting for Jupyter notebooks in your .pre-commit-config.yaml, you can include or exclude jupyter in the list of file types [1][2][4]. For example, to run Ruff only on Python files and type stubs (thereby disabling it for Jupyter notebooks), you would configure the hook as follows [1][2]:
- repo: https://github.com/astral-sh/ruff-pre-commit
  rev: v0.15.14 # Use the latest version
  hooks:
    - id: ruff-check
      types_or: [python, pyi]
      args: [--fix]
    - id: ruff-format
      types_or: [python, pyi]
If you wish to keep Jupyter notebooks enabled, you can either omit the types_or line (to use the default) or explicitly include it [1][3]:
- id: ruff-check
  types_or: [python, pyi, jupyter]
Note that Exabyte-io/pre-commit-hooks [5] is a distinct repository and does not appear to be the official source for the standard Ruff pre-commit integration. For production setups, it is recommended to use the official astral-sh/ruff-pre-commit repository [1][4].
Citations:
- 1: https://github.com/astral-sh/ruff-pre-commit/blob/main/README.md
- 2: https://github.com/charliermarsh/ruff-pre-commit
- 3: https://deepwiki.com/astral-sh/ruff-pre-commit/3-pre-commit-hooks
- 4: https://github.com/astral-sh/ruff/blob/main/docs/integrations.md
- 5: https://github.com/Exabyte-io/pre-commit-hooks
🌐 Web query:
site:github.com/Exabyte-io/pre-commit-hooks ruff hook id ruff types_or jupyter ipynb
💡 Result:
In Exabyte-io/pre-commit-hooks, the hook with id: ruff is currently restricted to Python files only:
- id: ruff, types: [python] (note: not types_or, and no jupyter) (github.com)
So it will not run on Jupyter notebooks (.ipynb), since pre-commit matches notebooks via the jupyter file type. To include notebooks, change it to something like:
- id: ruff
  types_or: [python, pyi, jupyter]
This is the same approach documented for Ruff’s official pre-commit integration (types_or: [..., jupyter]). (github.com)
Ruff itself has built-in support for linting/formatting Jupyter notebooks. (github.com)
Citations:
- 1: https://github.com/Exabyte-io/pre-commit-hooks/blob/main/.pre-commit-hooks.yaml
- 2: https://github.com/astral-sh/ruff-pre-commit?utm_source=openai
- 3: https://github.com/astral-sh/ruff/blob/main/docs/faq.md?utm_source=openai
Fix POSCAR_PATH documentation mismatch and reassess lint-blocking risk for await.
- The notebook claims POSCAR_PATH is an absolute path, but the code sets POSCAR_PATH = "../assets/mp-978534.poscar" (relative path).
- The notebook uses top-level await (e.g., await install_packages(...), await authenticate()). CI runs pre-commit (.github/workflows/cicd.yml), but the configured ruff hook in Exabyte-io/pre-commit-hooks is limited to Python files (types: [python]), so .ipynb won’t be linted there; this will still matter if the notebook is linted/exported as Python elsewhere (F704/PLE1142).
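If the absolute-path wording is kept, one way to make the code match it is to resolve the relative value against a known base directory. The sketch below assumes the base is the notebook's folder; the helper name `to_absolute` and the `/repo/...` base directory are hypothetical and used only for illustration:

```python
from pathlib import Path

# Value currently set in the notebook (relative to the notebook's folder).
POSCAR_PATH = "../assets/mp-978534.poscar"


def to_absolute(path: str, base_dir: str) -> Path:
    """Resolve a possibly relative path against base_dir and normalize '..' parts."""
    return (Path(base_dir) / path).resolve()


# Hypothetical base directory; in a notebook Path.cwd() is the usual choice.
absolute_poscar_path = to_absolute(POSCAR_PATH, "/repo/examples/material")
print(absolute_poscar_path)
```

The alternative with the smaller diff is to leave the value as is and change the notebook text to say that `POSCAR_PATH` is relative.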
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/material/upload_materials_from_file_poscar.ipynb` around lines 18 -
27, The notebook text claims POSCAR_PATH is absolute but the code sets
POSCAR_PATH = "../assets/mp-978534.poscar"; change the documentation to state
POSCAR_PATH is a relative path (or update POSCAR_PATH to an absolute path) so
the docstring and variable agree, and ensure any readers know which behavior you
choose; also avoid top-level await in the notebook to prevent lint/export issues
by wrapping calls like await install_packages(...) and await authenticate()
inside an async function (e.g., main) and calling it via asyncio.run or similar,
or document the lint risk (F704/PLE1142) if you intentionally keep top-level
await. Include references to POSCAR_PATH, install_packages, and authenticate
when making these changes.
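A minimal sketch of the wrapping suggested above, for the case where a notebook is exported and run as a plain Python script. The two coroutines are hypothetical stand-ins for the real `install_packages` and `authenticate` helpers, whose actual signatures and return values are not shown in this review:

```python
import asyncio


async def install_packages(name: str) -> str:
    # Hypothetical stand-in for mat3ra.notebooks_utils.packages.install_packages.
    await asyncio.sleep(0)
    return f"installed:{name}"


async def authenticate() -> bool:
    # Hypothetical stand-in for the notebook's authenticate() helper.
    await asyncio.sleep(0)
    return True


async def main() -> dict:
    # The awaits now live inside a coroutine, so the file is valid plain Python.
    installed = await install_packages("api")
    authenticated = await authenticate()
    return {"installed": installed, "authenticated": authenticated}


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Note that `asyncio.run(...)` raises `RuntimeError` when an event loop is already running, which is the case inside a Jupyter kernel. For code that stays in a notebook, keeping top-level `await` and documenting the lint risk may therefore be the more practical option.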
| { | ||
| "cell_type": "code", | ||
| "execution_count": null, | ||
| "metadata": {}, | ||
| "outputs": [], | ||
| "source": [ | ||
| "# Overview\n", | ||
| "from mat3ra.notebooks_utils.packages import install_packages\n", | ||
| "\n", | ||
| "Inside this example we contact [Workflow](https://docs.mat3ra.com/api/Workflows/get_workflows) endpoint to obtain a list of workflows that an account has access to." | ||
| "await install_packages(\"api\")" | ||
| ] |
❓ Verification inconclusive
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="examples/workflow/get_workflows.ipynb"
echo "== File: $FILE =="
if [ ! -f "$FILE" ]; then
echo "Missing file!"
exit 1
fi
# Show line numbers around 18-50
nl -ba "$FILE" | sed -n '1,120p' | sed -n '15,60p'
# Extract all code cells' sources that contain "await"
python3 - <<'PY'
import json, re
path="examples/workflow/get_workflows.ipynb"
with open(path,'r',encoding='utf-8') as f:
nb=json.load(f)
hits=[]
for i,cell in enumerate(nb.get("cells",[])):
if cell.get("cell_type")!="code":
continue
src=cell.get("source",[])
text="".join(src) if isinstance(src,list) else str(src)
if re.search(r'(^|\n)\s*await\s+', text) or re.search(r'await\s+install_packages', text):
hits.append((i,text))
print("== Code cells containing 'await' ==")
for idx,text in hits:
print(f"\n--- cell_index={idx} ---")
print(text)
PY
# Find Ruff invocation / config
echo "== Search for Ruff config/CI =="
ls -la
fd -t f -i 'pyproject.toml' . -x printf '%p\n' 2>/dev/null | head -n 20 || true
fd -t f -i '.ruff.toml' . | head -n 20 || true
fd -t f -i 'ruff.*toml' . | head -n 50 || true
fd -t f -i '*ruff*' .gitlab-ci* .github/workflows 2>/dev/null | head -n 50 || true
# Show likely Ruff config files
for f in pyproject.toml ruff.toml .ruff.toml .ruff.toml; do
[ -f "$f" ] && echo "FOUND $f"
done
# If pyproject exists, extract ruff section
if [ -f "pyproject.toml" ]; then
echo "== pyproject.toml ruff section (approx) =="
python3 - <<'PY'
import tomllib, pathlib
p=pathlib.Path("pyproject.toml")
data=tomllib.loads(p.read_text("utf-8"))
ruff=data.get("tool",{}).get("ruff",{})
print("ruff keys:", list(ruff.keys()))
print("select:", ruff.get("lint",{}).get("select") if isinstance(ruff.get("lint"),dict) else None)
print("ignore:", ruff.get("lint",{}).get("ignore") if isinstance(ruff.get("lint"),dict) else None)
print("extend-select:", ruff.get("lint",{}).get("extend-select") if isinstance(ruff.get("lint"),dict) else None)
PY
fi
# Search workflow/CI for ruff
rg -n --hidden --glob '!**/node_modules/**' "ruff.*(check|format)|\\bRuff\\b" .github/workflows 2>/dev/null || true
rg -n --hidden "ruff\\s+check|ruff\\s+format" . 2>/dev/null | head -n 50 || true
# Search for F704/PLE1142 in repo
rg -n "F704|PLE1142" . | head -n 50 || true
Repository: Exabyte-io/api-examples
Length of output: 179
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="examples/workflow/get_workflows.ipynb"
echo "== File: $FILE =="
if [ ! -f "$FILE" ]; then
echo "Missing file!"
exit 1
fi
nl -ba "$FILE" | sed -n '1,200p' | sed -n '15,90p'
python3 - <<'PY'
import json, re
path="examples/workflow/get_workflows.ipynb"
with open(path,'r',encoding='utf-8') as f:
nb=json.load(f)
hits=[]
for i,cell in enumerate(nb.get("cells",[])):
if cell.get("cell_type")!="code":
continue
src=cell.get("source",[])
text="".join(src) if isinstance(src,list) else str(src)
if "await" in text:
hits.append((i,text))
print("== Code cells with any 'await' ==")
for idx,text in hits:
print(f"\n--- cell_index={idx} ---")
for line in text.splitlines():
if line.strip():
print(line)
PY
echo "== Search for Ruff config/CI =="
ls -la > /dev/null
if [ -f "pyproject.toml" ]; then
echo "FOUND pyproject.toml"
python3 - <<'PY'
import tomllib, pathlib
data=tomllib.loads(pathlib.Path("pyproject.toml").read_text("utf-8"))
ruff=data.get("tool",{}).get("ruff",{})
print("ruff keys:", list(ruff.keys()))
lint=ruff.get("lint",{})
print("lint keys:", list(lint.keys()) if isinstance(lint,dict) else None)
print("lint.select:", lint.get("select") if isinstance(lint,dict) else None)
print("lint.extend-select:", lint.get("extend-select") if isinstance(lint,dict) else None)
print("lint.ignore:", lint.get("ignore") if isinstance(lint,dict) else None)
PY
fi
# List possible ruff config files
fd -t f -i 'ruff*.toml' . 2>/dev/null | head -n 20 || true
fd -t f -i '*ruff*' .github/workflows 2>/dev/null | head -n 50 || true
rg -n --hidden "ruff\\s+(check|format)" .github/workflows 2>/dev/null || true
rg -n --hidden "ruff\\s+(check|format)" . 2>/dev/null | head -n 80 || true
rg -n "F704|PLE1142" . | head -n 80 || true
Repository: Exabyte-io/api-examples
Length of output: 179
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="examples/workflow/get_workflows.ipynb"
echo "== File: $FILE =="
if [ ! -f "$FILE" ]; then
echo "Missing file!"
exit 1
fi
echo "== Raw JSON lines 1-120 (subset 15-90) =="
nl -ba "$FILE" | sed -n '15,90p'
echo "== Extract code-cell sources containing 'await' =="
python3 - <<'PY'
import json, re
path="examples/workflow/get_workflows.ipynb"
with open(path,'r',encoding='utf-8') as f:
nb=json.load(f)
hits=[]
for i,cell in enumerate(nb.get("cells",[])):
if cell.get("cell_type")!="code":
continue
src=cell.get("source",[])
text="".join(src) if isinstance(src,list) else str(src)
if re.search(r'\bawait\b', text):
hits.append((i,text))
print("code_cell_count:", sum(1 for c in nb.get("cells",[]) if c.get("cell_type")=="code"))
print("== Code cells with any 'await' ==")
for idx,text in hits:
print(f"\n--- code_cell_index={idx} ---")
print(text)
PY
echo "== Ruff configuration & CI =="
if [ -f "pyproject.toml" ]; then
echo "FOUND pyproject.toml"
python3 - <<'PY'
import tomllib, pathlib, json
data=tomllib.loads(pathlib.Path("pyproject.toml").read_text("utf-8"))
ruff=data.get("tool",{}).get("ruff",{})
lint=ruff.get("lint",{})
print("ruff.lint keys:", list(lint.keys()) if isinstance(lint,dict) else None)
for k in ["select","extend-select","ignore","extend-ignore"]:
if isinstance(lint,dict) and k in lint:
print(f"ruff.lint.{k} =", lint[k])
PY
fi
echo "== Searching for ruff invocations =="
rg -n --hidden "ruff\\s+(check|format)" .github/workflows 2>/dev/null || true
rg -n --hidden "ruff\\s+(check|format)" . 2>/dev/null | head -n 100 || true
echo "== Searching for mentions of F704/PLE1142 =="
rg -n "F704|PLE1142" . 2>/dev/null | head -n 100 || true
Repository: Exabyte-io/api-examples
Length of output: 221
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="examples/workflow/get_workflows.ipynb"
echo "== File: $FILE =="
test -f "$FILE" || { echo "Missing file"; exit 1; }
echo "== Show notebook excerpt (lines 15-90) =="
# use sed directly for JSON excerpt
sed -n '15,90p' "$FILE" | head -n 200
echo "== Extract code-cell sources containing 'await' (and show code-cell index) =="
python3 - <<'PY'
import json, re
path="examples/workflow/get_workflows.ipynb"
with open(path,'r',encoding='utf-8') as f:
nb=json.load(f)
cells=nb.get("cells",[])
print("code_cell_count:", sum(1 for c in cells if c.get("cell_type")=="code"))
hits=[]
for i,cell in enumerate(cells):
if cell.get("cell_type")!="code":
continue
src=cell.get("source",[])
text="".join(src) if isinstance(src,list) else str(src)
if re.search(r'\bawait\b', text):
hits.append((i,text))
print("== Code cells with any 'await' ==")
for idx,text in hits:
print(f"\n--- code_cell_index={idx} ---")
print(text)
PY
echo "== Search repo for Ruff config/CI =="
# list common ruff config files
fd -t f -i 'pyproject.toml' . | head -n 50 || true
fd -t f -i '.ruff.toml' . | head -n 50 || true
fd -t f -i 'ruff*.toml' . | head -n 50 || true
# show ruff invocation in CI workflows
rg -n --hidden --glob '!**/node_modules/**' "ruff\\s+(check|format)" .github/workflows 2>/dev/null || true
rg -n --hidden --glob '!**/node_modules/**' "Ruff" .github/workflows 2>/dev/null || true
echo "== Dump Ruff relevant config (if pyproject.toml exists) =="
if [ -f "pyproject.toml" ]; then
python3 - <<'PY'
import tomllib, pathlib
p=pathlib.Path("pyproject.toml")
data=tomllib.loads(p.read_text("utf-8"))
ruff=data.get("tool",{}).get("ruff",{})
print("ruff_present:", bool(ruff))
lint=ruff.get("lint",{})
print("ruff.lint keys:", list(lint.keys()) if isinstance(lint,dict) else None)
for k in ["select","extend-select","ignore","extend-ignore","fixable","unfixable"]:
if isinstance(lint,dict) and k in lint:
print(f"ruff.lint.{k} =", lint[k])
# notebook-specific config if present
for k in ["extend-exclude","exclude","src","target-version","format"]:
if k in ruff:
pass
PY
fi
echo "== Search for F704 / PLE1142 mentions =="
rg -n "F704|PLE1142" . 2>/dev/null | head -n 80 || true
Repository: Exabyte-io/api-examples
Length of output: 2881
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== pyproject.toml (top-level excerpt) =="
# Print only relevant lines containing 'ruff' or 'lint' or 'ipynb'
rg -n "ruff|ipynb|notebook|jupyter" pyproject.toml ruff.toml .github/workflows 2>/dev/null || true
echo "== ruff.toml =="
if [ -f ruff.toml ]; then
sed -n '1,200p' ruff.toml
fi
echo "== pyproject.toml ruff-related sections =="
if [ -f pyproject.toml ]; then
# Extract sections containing [tool.ruff] and related
# Use python to parse toml keys safely
python3 - <<'PY'
import tomllib, pathlib
p=pathlib.Path("pyproject.toml")
data=tomllib.loads(p.read_text("utf-8"))
tool=data.get("tool",{})
ruff=tool.get("ruff")
print("tool.ruff present:", bool(ruff))
if ruff:
print("tool.ruff keys:", list(ruff.keys()))
# print nested lint/select/extend-select/ignore if exist
lint=ruff.get("lint")
if isinstance(lint,dict):
for k in ["select","extend-select","ignore","extend-ignore","per-file-ignores","extend-per-file-ignores","exclude","fixable"]:
if k in lint:
print("lint."+k+" =", lint[k])
target=ruff.get("target-version")
print("target-version:", target)
PY
# show any [tool.ruff.*] blocks more directly
# best-effort: show lines around tool.ruff occurrences
rg -n "^\[tool\.ruff" pyproject.toml || true
# Print limited ranges around those matches
for line in $(rg -n "^\[tool\.ruff" pyproject.toml | cut -d: -f1); do
start=$((line-5)); end=$((line+60));
echo "--- pyproject.toml around line $line ---"
sed -n "${start},${end}p" pyproject.toml
done
fi
echo "== Search CI/workflows for ruff & ipynb =="
rg -n --hidden --glob '!**/node_modules/**' "ruff\\s+(check|format)|ruff\\.check|ruff\\.format|Ruff\\b" .github/workflows . 2>/dev/null | head -n 200 || true
rg -n --hidden --glob '!**/node_modules/**' "ipynb" .github/workflows . 2>/dev/null | head -n 200 || true
echo "== Search for any local scripts invoking ruff on notebooks =="
rg -n --hidden --glob '!**/node_modules/**' "ipynb.*ruff|ruff.*ipynb|nbconvert|jupyter|jupytext" . 2>/dev/null | head -n 200 || true
Repository: Exabyte-io/api-examples
Length of output: 50379
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== pyproject.toml / ruff.toml: locate Ruff+ipynb related text =="
rg -n "ruff|ipynb|notebook|jupyter" pyproject.toml ruff.toml .github/workflows 2>/dev/null || true
echo "== ruff.toml (first 200 lines) =="
if [ -f ruff.toml ]; then
sed -n '1,220p' ruff.toml
else
echo "No ruff.toml"
fi
echo "== pyproject.toml: parse tool.ruff =="
if [ -f pyproject.toml ]; then
python3 - <<'PY'
import tomllib, pathlib, json
data=tomllib.loads(pathlib.Path("pyproject.toml").read_text("utf-8"))
ruff=data.get("tool",{}).get("ruff")
print("tool.ruff present:", bool(ruff))
if not ruff:
raise SystemExit
print("tool.ruff keys:", list(ruff.keys()))
lint=ruff.get("lint")
if isinstance(lint,dict):
for k in ["select","extend-select","ignore","extend-ignore","per-file-ignores","extend-per-file-ignores","exclude","fixable","unfixable"]:
if k in lint:
v=lint[k]
if isinstance(v,(dict,list,str,int,float,bool)):
print("lint."+k+" =", v if not isinstance(v,dict) else list(v.keys()))
else:
print("lint."+k+" type:", type(v))
if "per-file-ignores" in lint:
print("lint.per-file-ignores =", lint["per-file-ignores"])
if "extend-exclude" in ruff:
print("tool.ruff.extend-exclude =", ruff["extend-exclude"])
if "exclude" in ruff:
print("tool.ruff.exclude =", ruff["exclude"])
PY
echo "== pyproject.toml: show [tool.ruff] blocks with limited context =="
rg -n "^\[tool\.ruff" pyproject.toml || true
for line in $(rg -n "^\[tool\.ruff" pyproject.toml | cut -d: -f1); do
start=$((line-5)); end=$((line+80));
echo "--- around line $line ---"
sed -n "${start},${end}p" pyproject.toml
done
fi
echo "== CI/workflows: find ruff invocations =="
rg -n --hidden --glob '!**/node_modules/**' "ruff\\s+(check|format)|Ruff\\b" .github/workflows 2>/dev/null || true
echo "== Repo-wide: find ipynb mentions =="
rg -n --hidden --glob '!**/node_modules/**' "ipynb" . 2>/dev/null | head -n 200 || true
echo "== Repo-wide: find F704 / PLE1142 =="
rg -n "F704|PLE1142" . 2>/dev/null | head -n 200 || true
Repository: Exabyte-io/api-examples
Length of output: 48265
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Ruff config files exist? =="
ls -la pyproject.toml ruff.toml 2>/dev/null || true
echo "== Extract Ruff-related config from ruff.toml (if present) =="
if [ -f ruff.toml ]; then
sed -n '1,260p' ruff.toml
fi
echo "== Extract Ruff-related sections from pyproject.toml (if present) =="
if [ -f pyproject.toml ]; then
# show relevant lines around tool.ruff occurrences
rg -n "^\[tool\.ruff" pyproject.toml || true
rg -n "tool\.ruff|ipynb|notebook" pyproject.toml || true
fi
echo "== Parse tool.ruff from pyproject.toml (if present) =="
if [ -f pyproject.toml ]; then
python3 - <<'PY'
import tomllib, pathlib
p=pathlib.Path("pyproject.toml")
data=tomllib.loads(p.read_text("utf-8"))
ruff=data.get("tool",{}).get("ruff")
print("tool.ruff:", bool(ruff))
if ruff:
print("keys:", list(ruff.keys()))
print("lint:", ruff.get("lint",{}))
# print full lint section only if small enough
PY
fi
echo "== Search CI/workflows for ruff commands =="
rg -n --hidden --glob '!**/node_modules/**' "ruff\\s+(check|format)|Ruff\\b" .github/workflows 2>/dev/null || true
echo "== Search repo for ipynb linting patterns =="
rg -n --hidden --glob '!**/node_modules/**' "ipynb.*ruff|ruff.*ipynb|jupytext|nbqa" . 2>/dev/null || true
Repository: Exabyte-io/api-examples
Length of output: 2309
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Files =="
ls -la pyproject.toml ruff.toml .github/workflows 2>/dev/null || true
echo "== Show ruff.toml =="
if [ -f ruff.toml ]; then
sed -n '1,260p' ruff.toml
fi
echo "== Show relevant parts of pyproject.toml for tool.ruff =="
if [ -f pyproject.toml ]; then
rg -n "^\[tool\.ruff" pyproject.toml || true
# Print ruff blocks with limited context
for line in $(rg -n "^\[tool\.ruff" pyproject.toml | cut -d: -f1); do
start=$((line-5)); end=$((line+120));
echo "--- around line $line ---"
sed -n "${start},${end}p" pyproject.toml
done
fi
echo "== Search workflows for ruff =="
rg -n --hidden --glob '!**/node_modules/**' "ruff\\s+(check|format)|Ruff\\b" .github/workflows 2>/dev/null || true
echo "== Search config for ipynb handling (include/exclude/extend-include/etc) =="
if [ -f ruff.toml ]; then
rg -n "ipynb|notebook|jupyter" ruff.toml || true
fi
if [ -f pyproject.toml ]; then
rg -n "ipynb|notebook|jupyter|extend-include|extend-exclude|include|exclude" pyproject.toml || true
fi
echo "== Search repo for F704 / PLE1142 again (wider) =="
rg -n "F704|PLE1142" . 2>/dev/null || true
Repository: Exabyte-io/api-examples
Length of output: 3304
Avoid top-level await in examples/workflow/get_workflows.ipynb unless the lint/export pipeline understands notebook async
Lines 26 and 37 contain `await` in notebook code cells (`await install_packages("api")` and `await authenticate()`). This is valid in Jupyter, which supports top-level `await`, but it is a syntax error if the cells are linted as plain Python scripts. The actual CI impact depends on whether notebooks are on the Ruff lint path, either directly or via rendered/exported .py output.
Also applies to: 30-38
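The difference between the two modes can be reproduced with the standard library alone. The sketch below is illustrative and not part of the PR: `CELL_SOURCE` is a hypothetical stand-in for the notebook cell, and the helper names are made up for this example. It compiles the same source once as an ordinary script and once with the `ast.PyCF_ALLOW_TOP_LEVEL_AWAIT` flag, which is roughly how notebook-style front ends accept top-level `await`.

```python
import ast

# Hypothetical cell content; the name is only compiled here, never resolved or called.
CELL_SOURCE = "await authenticate()\n"


def compiles_as_script(source: str) -> bool:
    """Return True if the source compiles as a plain Python module."""
    try:
        compile(source, "<cell>", "exec")
    except SyntaxError:
        return False
    return True


def compiles_as_notebook_cell(source: str) -> bool:
    """Return True if the source compiles when top-level await is allowed."""
    try:
        compile(source, "<cell>", "exec", flags=ast.PyCF_ALLOW_TOP_LEVEL_AWAIT)
    except SyntaxError:
        return False
    return True


if __name__ == "__main__":
    print("plain script:", compiles_as_script(CELL_SOURCE))
    print("notebook cell:", compiles_as_notebook_cell(CELL_SOURCE))
```

Whether a given linter applies the plain-script or the notebook rule to `.ipynb` files depends on its version and configuration, so this only shows the language-level distinction.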
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/workflow/get_workflows.ipynb` around lines 18 - 27, Top-level await
calls (install_packages and authenticate) must be converted so the notebook can
be linted as plain Python: replace lines using "await install_packages('api')"
and "await authenticate()" with a synchronous entry (import asyncio) and either
wrap them in an async def main() and call asyncio.run(main()) or call
asyncio.run(install_packages("api")) / asyncio.run(authenticate()); ensure you
add "import asyncio" and keep the original function names (install_packages,
authenticate) unchanged.
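One point to verify before applying this suggestion: `asyncio.run()` raises `RuntimeError` when the calling thread already has a running event loop, which is normally the case inside an ipykernel-based notebook cell. The sketch below is a minimal illustration under that assumption. The `authenticate` coroutine here is a hypothetical stand-in, not the function from the PR, and `run_sync` and `simulate_notebook_cell` are names invented for this example.

```python
import asyncio


async def authenticate() -> str:
    """Hypothetical stand-in for the notebook's async helper."""
    await asyncio.sleep(0)
    return "token"


def run_sync(coro_factory):
    """Run a coroutine to completion, but only when no loop is already running."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread (plain script): asyncio.run() is safe.
        return asyncio.run(coro_factory())
    raise RuntimeError("event loop already running; use `await` in this context")


async def simulate_notebook_cell():
    """Mimic a cell executing inside a running loop and capture the failure."""
    coro = authenticate()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return None


if __name__ == "__main__":
    print(run_sync(authenticate))                 # plain-script path works
    print(asyncio.run(simulate_notebook_cell()))  # nested call is rejected
```

If the notebooks must run in a live kernel as well as pass linting, keeping the `await` calls and adjusting the lint configuration for notebooks may be the less invasive option.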
Summary by CodeRabbit
Release Notes
Documentation
New Features
Refactor