Move jsonargparse to main dependencies, lazy-load sklearn/scipy in PubChem #153

Merged
sfluegel05 merged 3 commits into dev from copilot/add-jsonargparse-dependency
May 5, 2026

Contributor

Copilot AI commented Feb 17, 2026

Inference-only installations failed for two reasons: jsonargparse (required by the Lightning CLI) was missing, and the PubChem dataset module imported scikit-learn eagerly at import time.

Changes

Dependencies (pyproject.toml)

  • Move jsonargparse[signatures]>=4.17 from [dev] to main dependencies
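
To confirm that an inference-only environment has what it needs, a quick check along these lines may help. This is a minimal sketch: the helper name `missing_packages` is hypothetical and not part of chebai, and it only inspects top-level import names (scikit-learn imports as `sklearn`).

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Import names of the packages discussed in this PR.
# The printed list depends on what is installed locally.
print(missing_packages(["jsonargparse", "sklearn", "scipy"]))
```

With this change, jsonargparse should no longer appear in that list after a plain install, while sklearn and scipy may still be absent without breaking the import of the PubChem module.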

Lazy imports (chebai/preprocessing/datasets/pubchem.py)

  • Move the sklearn.cluster.KMeans import into PubChemKMeans._build_clusters() and the cluster_centers_superclustered property
  • Move the sklearn.model_selection.train_test_split import into PubChem.setup_processed() and PubChemTokens.setup_processed()
  • Move the scipy.spatial import into PubChemKMeans._exclude_clusters()

This allows importing PubChem classes without requiring sklearn/scipy to be installed, deferring the dependency until dataset methods that actually use these libraries are called.

# Now works without sklearn installed
from chebai.preprocessing.datasets.pubchem import PubChem

# sklearn only required when calling setup_processed() or other split/clustering methods
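
The deferred-import pattern described above can be sketched in isolation as follows. This is an illustrative sketch, not the code from this PR: `require` and `SplittableDataset` are hypothetical names, the stdlib `random` module stands in for an optional dependency such as sklearn, and `hypothetical_missing_dependency` is a deliberately nonexistent module used to show the failure mode.

```python
import importlib


def require(module_name: str, purpose: str):
    """Import module_name on demand, raising a descriptive error if it is absent."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for {purpose} but is not installed."
        ) from exc


class SplittableDataset:
    """Hypothetical dataset class; not a chebai class."""

    def __init__(self, items):
        # No optional dependency is touched during construction.
        self.items = list(items)

    def shuffled(self, seed: int = 0):
        # `random` stands in for an optional dependency; it is only
        # imported when this method actually runs.
        random = require("random", "shuffling")
        rng = random.Random(seed)
        items = self.items[:]
        rng.shuffle(items)
        return items

    def clustered(self):
        # Deliberately missing module: the error surfaces here,
        # not when the class is defined or instantiated.
        require("hypothetical_missing_dependency", "clustering")
```

Defining and instantiating `SplittableDataset` succeeds regardless of which optional packages are installed. Only a call to `clustered()` raises an `ImportError`, and the message names the feature that needed the package.
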
Original prompt

This section details the original issue you should resolve

<issue_title>Add jsonargparse to main dependencies, only use sklearn if needed</issue_title>
<issue_description>We had two issues when installing chebai for inference only:

  • missing jsonargparse caused some error in lightning
  • missing scikit-learn causes issues with the pubchem dataset

Both dependencies are included in the [dev] extra.

Todo

  • move jsonargparse to the regular dependencies
  • only load sklearn when a PubChem dataset is actually instantiated</issue_description>

Comments on the Issue (you are @copilot in this section)

@aditya0by0 Let's try assigning this to Copilot to check whether it can take on minor issues.

… sklearn/scipy

Co-authored-by: aditya0by0 <65857172+aditya0by0@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add jsonargparse to main dependencies and optimize sklearn usage" to "Move jsonargparse to main dependencies, lazy-load sklearn/scipy in PubChem" Feb 17, 2026
Copilot AI requested a review from aditya0by0 February 17, 2026 20:53
sfluegel05 marked this pull request as ready for review May 5, 2026 13:07
@sfluegel05
Collaborator

Looks good to me

sfluegel05 merged commit 0801341 into dev May 5, 2026
6 checks passed
sfluegel05 deleted the copilot/add-jsonargparse-dependency branch May 5, 2026 13:07

Development

Successfully merging this pull request may close these issues.

Add jsonargparse to main dependencies, only use sklearn if needed

3 participants