Commit bb1dd58

feat(ci): Enhance workflow stability and dependency management
This commit introduces several improvements to the CI/CD pipelines and the dependency installation process to enhance stability and resolve compatibility issues. Key changes include:

Dependency fixes:
- Pin setuptools<70.0.0 in pyproject.toml and the relevant scripts to fix a pkg_resources import error with hyperopt on Python 3.12.
- Update the installation scripts so that setuptools and wheel are upgraded together with pip.

CI workflow improvements:
- The notebook-test.yml workflow now correctly installs pip for Python 3.12 and explicitly installs the required setuptools version before running tests.
- Both docs.yml and notebook-test.yml now use a CPU-only build of PyTorch for consistent results and lower resource usage.

H2O stability:
- H2OBaseClassifier now allocates memory for the H2O cluster dynamically, based on available system memory.
- Error handling automatically restarts the H2O cluster if it fails during data frame creation due to memory issues.

Project files:
- Add a LICENSE file (MIT).
- Update README.md.
1 parent 0e04c28 commit bb1dd58

10 files changed: 126 additions & 276 deletions
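The H2O changes described in the commit message are not part of the diffs shown below. As a rough illustration only, the two ideas — sizing the cluster from available system memory and retrying frame creation after a cluster restart — could be sketched as follows. The helper names, the 50% sizing heuristic, and the MemoryError trigger are all assumptions, not the repository's actual `H2OBaseClassifier` code; in real use the byte count would come from something like `psutil.virtual_memory().available` and the restart callback would shut down and re-initialize H2O.

```python
# Hypothetical sketch of the H2O stability changes; these helpers do not
# exist in the repository under these names.

def suggest_max_mem(available_bytes: int, fraction: float = 0.5, floor_gb: int = 1) -> str:
    """Build an H2O-style max_mem_size string (e.g. "8G") from available
    system memory, taking a fraction of it and enforcing a lower bound."""
    gb = max(floor_gb, int(available_bytes * fraction // 2**30))
    return f"{gb}G"


def create_frame_with_restart(make_frame, restart_cluster, retries: int = 1):
    """Call make_frame(); if it raises MemoryError, invoke the supplied
    cluster-restart callback and try again, up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return make_frame()
        except MemoryError:
            if attempt == retries:
                raise
            restart_cluster()
```

With 16 GiB reported available, `suggest_max_mem(16 * 2**30)` yields `"8G"`, which could then be passed as `max_mem_size` to `h2o.init`.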


.github/workflows/docs.yml

Lines changed: 4 additions & 1 deletion
@@ -65,7 +65,10 @@ jobs:
           cache: 'pip'
 
       - name: Install dependencies
-        run: pip install .[docs]
+        run: |
+          # Ensure setuptools and wheel are present before installing the project
+          pip install --upgrade pip setuptools wheel
+          pip install .[docs]
 
       - name: Verify PyTorch installation
         run: |

.github/workflows/notebook-test.yml

Lines changed: 11 additions & 9 deletions
@@ -12,6 +12,9 @@ jobs:
       NODE_EXTRA_CA_CERTS: /etc/ssl/certs/ca-certificates.crt
       SSL_CERT_FILE: /etc/ssl/certs/ca-certificates.crt
       REQUESTS_CA_BUNDLE: /etc/ssl/certs/ca-certificates.crt
+      # Always use CPU-only PyTorch to ensure consistency and save resources
+      PIP_EXTRA_INDEX_URL: https://download.pytorch.org/whl/cpu
+      TORCH_CPU_ONLY: "true"
 
     steps:
       - name: Check out repository
@@ -20,14 +23,7 @@
       - name: Set environment variables
         run: |
           echo "DEBIAN_FRONTEND=noninteractive" >> $GITHUB_ENV
-
-          # Detect if running on GitHub Actions (not Gitea or act)
-          if [[ "${{ github.server_url }}" == "https://github.com" ]] && [[ -z "${{ env.ACT }}" ]]; then
-            echo "Running on GitHub - using CPU-only PyTorch"
-            echo "PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu" >> $GITHUB_ENV
-            echo "TORCH_CPU_ONLY=true" >> $GITHUB_ENV
-          fi
-
+
           if [[ "${{ vars.IS_GITEA }}" == "true" ]]; then
             echo "Setting thread limits for Gitea runner"
             echo "OMP_NUM_THREADS=1" >> $GITHUB_ENV
@@ -95,6 +91,9 @@
           sudo apt-get update
           sudo apt-get install -y python3.12 python3.12-venv python3.12-dev
 
+          # Install pip for Python 3.12
+          curl -sS https://bootstrap.pypa.io/get-pip.py | sudo python3.12 - --break-system-packages
+
           sudo ln -sf /usr/bin/python3.12 /usr/local/bin/python
           sudo ln -sf /usr/bin/python3.12 /usr/local/bin/python3
 
@@ -104,6 +103,7 @@
         run: |
           cd $GITHUB_WORKSPACE
           chmod +x install.sh
+          sudo python -m pip install --upgrade pip setuptools wheel --break-system-packages
           # The install.sh will use PIP_EXTRA_INDEX_URL if set for CPU-only PyTorch
           ./install.sh
 
@@ -123,7 +123,9 @@
           set -e
           cd $GITHUB_WORKSPACE
           source "$VENV_PATH/bin/activate"
-          pytest --nbmake --nbmake-timeout=4500 notebooks/unit_test_synthetic.ipynb
+          echo "Ensuring setuptools is installed to provide pkg_resources for hyperopt..."
+          python -m pip install 'setuptools<70.0.0'
+          pytest --nbmake --nbmake-timeout=4500 --nbmake-kernel=ml_grid_env notebooks/unit_test_synthetic.ipynb
 
           echo "Running Python unit tests..."
           pytest tests/
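The `setuptools<70.0.0` pin above exists because hyperopt imports `pkg_resources` at runtime, and on Python 3.12 an unpinned setuptools can leave that import broken. As a hypothetical illustration (not part of the repository), a pre-flight check of the pin could look like this; the three-component comparison is a deliberate simplification of full PEP 440 version ordering:

```python
# Illustrative pre-flight check for the setuptools pin; not repository code.
import importlib.util

def satisfies_pin(installed: str, upper: str = "70.0.0") -> bool:
    """True when `installed` sorts below the exclusive upper bound,
    comparing up to three numeric dotted components (a simplification
    of real PEP 440 version comparison)."""
    parse = lambda v: tuple(int(p) for p in v.split(".")[:3] if p.isdigit())
    return parse(installed) < parse(upper)

# pkg_resources is what hyperopt actually imports; find_spec returns
# None rather than raising when the module is missing.
pkg_resources_present = importlib.util.find_spec("pkg_resources") is not None
```

For example, `satisfies_pin("69.5.1")` is true while `satisfies_pin("70.0.0")` is false, matching the exclusive `<70.0.0` bound.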

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2019
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 24 additions & 256 deletions
@@ -1,273 +1,41 @@
-# ml_binary_classification_gridsearch_hyperOpt
+![act-logo](https://raw.githubusercontent.com/wiki/nektos/act/img/logo-150.png)
 
-[![Documentation Status](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/docs.yml/badge.svg)](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/)
+# Overview [![push](https://github.com/nektos/act/workflows/push/badge.svg?branch=master&event=push)](https://github.com/nektos/act/actions) [![Join the chat at https://gitter.im/nektos/act](https://badges.gitter.im/nektos/act.svg)](https://gitter.im/nektos/act?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Go Report Card](https://goreportcard.com/badge/github.com/nektos/act)](https://goreportcard.com/report/github.com/nektos/act) [![awesome-runners](https://img.shields.io/badge/listed%20on-awesome--runners-blue.svg)](https://github.com/jonico/awesome-runners)
 
-[![CI/CD](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/notebook-test.yml/badge.svg)](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/notebook-test.yml)
-[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+> "Think globally, `act` locally"
 
-This repository contains Python code for binary classification using grid search and hyperparameter optimization techniques.
+Run your [GitHub Actions](https://developer.github.com/actions/) locally! Why would you want to do this? Two reasons:
 
-# Table of Contents
+- **Fast Feedback** - Rather than having to commit/push every time you want to test out the changes you are making to your `.github/workflows/` files (or for any changes to embedded GitHub actions), you can use `act` to run the actions locally. The [environment variables](https://help.github.com/en/actions/configuring-and-managing-workflows/using-environment-variables#default-environment-variables) and [filesystem](https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#filesystems-on-github-hosted-runners) are all configured to match what GitHub provides.
+- **Local Task Runner** - I love [make](<https://en.wikipedia.org/wiki/Make_(software)>). However, I also hate repeating myself. With `act`, you can use the GitHub Actions defined in your `.github/workflows/` to replace your `Makefile`!
 
-- [ml_binary_classification_gridsearch_hyperOpt](#ml_binary_classification_gridsearch_hyperopt)
-  - [Overview](#overview)
-  - [Diagrams](#diagrams)
-  - [Features](#features)
-  - [Getting Started](#getting-started)
-    - [Prerequisites](#prerequisites)
-  - [Installation](#installation)
-    - [Windows](#windows)
-    - [Unix/Linux](#unixlinux)
-  - [Usage](#usage)
-  - [Examples](#examples)
-  - [Project Structure](#project-structure)
-  - [Contributing](#contributing)
-  - [License](#license)
-  - [Appendix](#appendix)
-  - [Acknowledgments](#acknowledgments)
+> [!TIP]
+> **Now Manage and Run Act Directly From VS Code!**<br/>
+> Check out the [GitHub Local Actions](https://sanjulaganepola.github.io/github-local-actions-docs/) Visual Studio Code extension which allows you to leverage the power of `act` to run and test workflows locally without leaving your editor.
 
+# How Does It Work?
 
-## Overview
+When you run `act` it reads in your GitHub Actions from `.github/workflows/` and determines the set of actions that need to be run. It uses the Docker API to either pull or build the necessary images, as defined in your workflow files and finally determines the execution path based on the dependencies that were defined. Once it has the execution path, it then uses the Docker API to run containers for each action based on the images prepared earlier. The [environment variables](https://help.github.com/en/actions/configuring-and-managing-workflows/using-environment-variables#default-environment-variables) and [filesystem](https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#file-systems) are all configured to match what GitHub provides.
 
-Binary classification is a common machine learning task where the goal is to categorize data into one of two classes. This repository provides a framework for performing binary classification using various machine learning algorithms and optimizing their hyperparameters through grid search and hyperparameter optimization techniques.
+Let's see it in action with a [sample repo](https://github.com/cplee/github-actions-demo)!
 
-## Features
+![Demo](https://raw.githubusercontent.com/wiki/nektos/act/quickstart/act-quickstart-2.gif)
 
-This framework is designed to be a comprehensive toolkit for binary classification experiments, offering a wide range of configurable options:
+# Act User Guide
 
-- **Diverse Model Support:** Includes a collection of standard classifiers (e.g., Logistic Regression, SVM, RandomForest, XGBoost, LightGBM, CatBoost, H2O AutoML/GLM/GBM) and specialized time-series models from the `aeon` library (e.g., HIVE-COTE v2, MUSE, OrdinalTDE).
-- **Advanced Hyperparameter Tuning:** Supports multiple search strategies:
-  - **Grid Search:** Exhaustively search a defined parameter grid.
-  - **Random Search:** Randomly sample from the parameter space.
-  - **Bayesian Optimization:** Intelligently search the parameter space using `scikit-optimize`.
-- **Configurable Data Pipeline:** A highly modular pipeline allows for fine-grained control over data processing steps:
-  - **Feature Selection:** Toggle groups of features (e.g., demographics, blood tests, annotations).
-  - **Data Cleaning:** Handle missing values, constant columns, and correlated features.
-  - **Resampling:** Address class imbalance with oversampling (RandomOverSampler) or undersampling (RandomUnderSampler).
-  - **Scaling:** Apply standard scaling to numeric features.
-- **Automated Results Analysis:** Includes tools to automatically aggregate results from multiple runs and generate insightful plots, such as global parameter importance.
-- **Time-Series Capabilities:** Specialized pipeline mode for handling time-series data, including conversion to the required 3D format for `aeon` classifiers.
+Please look at the [act user guide](https://nektosact.com) for more documentation.
 
-## Diagrams
+# Support
 
-Below are visual diagrams representing various components of the project. All `.mmd` source files are Mermaid diagrams, and the rendered versions are available in `.svg` or `.png` formats.
+Need help? Ask on [Gitter](https://gitter.im/nektos/act)!
 
-### Feature Importance
-- [Mermaid source](assets/data_feature_importance_methods.mmd)
-<img src="assets/data_feature_importance_methods.svg" width="400" height="300"/>
+# Contributing
 
-### Data Pipeline
-- [Mermaid source](assets/data_pipeline.mmd)
-<img src="assets/data_pipeline.svg" width="400" height="300"/>
+Want to contribute to act? Awesome! Check out the [contributing guidelines](CONTRIBUTING.md) to get involved.
 
-### Grid Parameter Search Space
-- [Mermaid source](assets/grid_param_space.mmd)
-<img src="assets/grid_param_space.svg" width="400" height="300"/>
+## Manually building from source
 
-### Hyperparameter Search
-- [Mermaid source](assets/hyperparameter_search.mmd)
-<img src="assets/hyperparameter_search.svg" width="400" height="300"/>
-
-### Imputation Pipeline
-- [Mermaid source](assets/impute_data_for_pipe.mmd)
-<img src="assets/impute_data_for_pipe.svg" width="400" height="300"/>
-
-### ML Repository Architecture
-- [Mermaid source](assets/ml_repository_architecture.mmd)
-<img src="assets/ml_repository_architecture.png" width="400" height="300"/>
-
-### Model Class Listing (Time Series)
-- [Mermaid source](assets/model_class_list_model_class_list_ts.mmd)
-<img src="assets/model_class_list_model_class_list_ts.svg" width="400" height="300"/>
-
-### Project Scoring and Model Saving
-- [Mermaid source](assets/project_score_save.mmd)
-<img src="assets/project_score_save.svg" width="400" height="300"/>
-
-### Time Series Helper Functions
-- [Mermaid source](assets/time_series_helper.mmd)
-<img src="assets/time_series_helper.svg" width="400" height="300"/>
-
-### Unit Test - Synthetic Data
-- [Mermaid source](assets/unit_test_synthetic.mmd)
-<img src="assets/unit_test_synthetic.svg" width="400" height="300"/>
-
-### Results Processing Pipeline
-- [Mermaid source](assets/results_processing_pipeline.mmd)
-<img src="assets/results_processing_pipeline.svg" width="600" height="450"/>
-
-
-## Getting Started
-
-### Prerequisites
-
-Designed for usage with a numeric data matrix for binary classification. Single or multiple outcome variables (One v rest). An example is provided. Time series is also implemented.
-
-## Installation
-
-This project includes convenient installation scripts for Unix/Linux/macOS and Windows. These scripts will create a virtual environment, install all necessary dependencies, and register a Jupyter kernel for you.
-
-### Quick Install using Scripts
-
-1. **Clone the repository:**
-   ```shell
-   git clone https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt.git
-   cd ml_binary_classification_gridsearch_hyperOpt
-   ```
-
-2. **Run the installation script:**
-
-   * **For a standard installation:**
-     * On Unix/Linux/macOS:
-       ```bash
-       chmod +x install.sh
-       ./install.sh
-       ```
-     * On Windows:
-       ```bat
-       install.bat
-       ```
-     This will create a virtual environment named `ml_grid_env`.
-
-   * **For a time-series installation (includes all standard dependencies):**
-     * On Unix/Linux/macOS:
-       ```bash
-       chmod +x install.sh
-       ./install.sh ts
-       ```
-     * On Windows:
-       ```bat
-       install.bat ts
-       ```
-     This will create a virtual environment named `ml_grid_ts_env`.
-
-## Usage
-
-After installation, activate the virtual environment to run your code or notebooks.
-
-* **To activate the standard environment:**
-  * On Unix/Linux/macOS: `source ml_grid_env/bin/activate`
-  * On Windows: `.\ml_grid_env\Scripts\activate`
-
-* **To activate the time-series environment:**
-  * On Unix/Linux/macOS: `source ml_grid_ts_env/bin/activate`
-  * On Windows: `.\ml_grid_ts_env\Scripts\activate`
-
-### Running the Notebooks
-
-The `notebooks/` directory contains examples for different use cases:
-
-- **`unit_test_synthetic.ipynb`**: The main entry point for running experiments. It demonstrates how to generate synthetic data, test the data pipeline, and run a full Hyperopt search. Start here to understand the end-to-end workflow.
-- **`01_hyperopt_grid.ipynb`**: A focused example of running a full hyperparameter search using `Hyperopt` based on the `config_hyperopt.yml` file.
-- **`02_single_run.ipynb`**: A script for executing a single run with a specific set of parameters defined in `config_single_run.yml`. Useful for debugging or testing one configuration.
-
-### Basic Example
-
-The main entry point for running experiments is a script or notebook that loads the configuration and iterates through the parameter space defined in `config.yml`.
-
-1. **Configure your experiment in `config.yml`:**
-   - Set the data path, models, and parameter space.
-
-2. **Run the experiment:**
-   - The following script demonstrates how to execute a full grid search based on your `config.yml`.
-
-```python
-from pathlib import Path
-from ml_grid.pipeline.data import pipe
-from ml_grid.util.param_space import parameter_space
-from ml_grid.util.create_experiment_directory import create_experiment_directory
-from ml_grid.util.config_parser import load_config
-
-# Load configuration from config.yml
-config = load_config()
-
-# Set project root
-project_root = Path().resolve().parent
-
-# Create a unique directory for this experiment run
-experiments_base_dir = project_root / config['experiment']['experiments_base_dir']
-experiment_dir = create_experiment_directory(
-    base_dir=experiments_base_dir,
-    additional_naming=config['experiment']['additional_naming']
-)
-
-# Generate the parameter space from the config file
-param_space_df = parameter_space(config['param_space']).get_parameter_space()
-
-# Iterate through each parameter combination and run the pipeline
-for i, row in param_space_df.iterrows():
-    local_param_dict = row.to_dict()
-    print(f"Running experiment {i+1}/{len(param_space_df)} with params: {local_param_dict}")
-    pipe(
-        config=config,
-        local_param_dict=local_param_dict,
-        base_project_dir=project_root,
-        experiment_dir=experiment_dir,
-        param_space_index=i
-    )
-```
-If you are using Jupyter, you can also select the kernel created during installation (e.g., `Python (ml_grid_env)`) directly from the Jupyter interface.
-
-## Examples
-
-See [ml_grid/tests/unit_test_synthetic.ipynb]
-
-## Documentation
-
-The latest documentation is hosted online and can be viewed [here](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/).
-
-This project uses Sphinx for documentation. The documentation includes usage guides and an auto-generated API reference.
-
-To build the documentation locally:
-
-1. Install the documentation dependencies (make sure your virtual environment is activated):
-   ```bash
-   pip install -e .[docs]
-   ```
-
-2. Build the HTML documentation:
-   ```bash
-   sphinx-build -b html docs/source docs/build
-   ```
-
-3. Open `docs/build/index.html` in your web browser to view the documentation.
-
-## Project Structure
-
-The repository is organized to separate concerns, making it easier to navigate and extend.
-
-```
-.
-├── assets/                        # Mermaid diagrams and other assets
-├── docs/                          # Sphinx documentation source and build files
-├── ml_grid/                       # Main source code for the library
-│   ├── model_classes/             # Standard classifier wrappers
-│   ├── model_classes_time_series/ # Time-series classifier wrappers
-│   ├── pipeline/                  # Core data processing and pipeline logic
-│   ├── results_processing/        # Tools for aggregating and plotting results
-│   └── util/                      # Utility functions and global parameters
-├── tests/                         # Unit and integration tests
-├── install.sh                     # Installation script for Unix/Linux
-└── install.bat                    # Installation script for Windows
-```
-
-## Contributing
-If you would like to contribute to this project, please follow these steps:
-
-Fork the repository on GitHub.
-Create a new branch for your feature or bug fix.
-Make your changes and commit them with descriptive commit messages.
-Push your changes to your fork.
-Create a pull request to the main repository's master branch.
-
-## License
-This project is licensed under the MIT License - see the LICENSE file for details.
-
-
-## Appendix
-
-
-## Acknowledgments
-scikit-learn
-hyperopt
-H2O.ai
+- Install Go tools 1.20+ - (<https://golang.org/doc/install>)
+- Clone this repo `git clone git@github.com:nektos/act.git`
+- Run unit tests with `make test`
+- Build and install: `make install`

install.bat

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ echo Virtual environment activated successfully.
 
 rem Upgrade pip
 echo Upgrading pip...
-pip install --upgrade pip
+pip install --upgrade pip setuptools wheel
 
 rem Install the project in editable mode along with testing dependencies.
 rem This reads all dependencies from pyproject.toml.
