# ml_binary_classification_gridsearch_hyperOpt

[Documentation](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/)

[Notebook Tests](https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt/actions/workflows/notebook-test.yml)
[License: MIT](https://opensource.org/licenses/MIT)

This repository contains Python code for binary classification using grid search and hyperparameter optimization techniques.

|
10 | | -# Table of Contents |
| 9 | +- **Fast Feedback** - Rather than having to commit/push every time you want to test out the changes you are making to your `.github/workflows/` files (or for any changes to embedded GitHub actions), you can use `act` to run the actions locally. The [environment variables](https://help.github.com/en/actions/configuring-and-managing-workflows/using-environment-variables#default-environment-variables) and [filesystem](https://help.github.com/en/actions/reference/virtual-environments-for-github-hosted-runners#filesystems-on-github-hosted-runners) are all configured to match what GitHub provides. |
| 10 | +- **Local Task Runner** - I love [make](<https://en.wikipedia.org/wiki/Make_(software)>). However, I also hate repeating myself. With `act`, you can use the GitHub Actions defined in your `.github/workflows/` to replace your `Makefile`! |
11 | 11 |
|
12 | | -- [ml_binary_classification_gridsearch_hyperOpt](#ml_binary_classification_gridsearch_hyperopt) |
13 | | -- [Overview](#overview) |
14 | | -- [Diagrams](#diagrams) |
15 | | -- [Features](#features) |
16 | | -- [Getting Started](#getting-started) |
17 | | - - [Prerequisites](#prerequisites) |
18 | | -- [Installation](#installation) |
19 | | - - [Windows](#windows) |
20 | | - - [Unix/Linux](#unixlinux) |
21 | | -- [Usage](#usage) |
22 | | -- [Examples](#examples) |
23 | | -- [Project Structure](#project-structure) |
24 | | -- [Contributing](#contributing) |
25 | | -- [License](#license) |
26 | | -- [Appendix](#appendix) |
27 | | -- [Acknowledgments](#acknowledgments) |
| 12 | +> [!TIP] |
| 13 | +> **Now Manage and Run Act Directly From VS Code!**<br/> |
| 14 | +> Check out the [GitHub Local Actions](https://sanjulaganepola.github.io/github-local-actions-docs/) Visual Studio Code extension which allows you to leverage the power of `act` to run and test workflows locally without leaving your editor. |
28 | 15 |
## Overview

Binary classification is a common machine learning task where the goal is to categorize data into one of two classes. This repository provides a framework for performing binary classification using various machine learning algorithms and optimizing their hyperparameters through grid search and hyperparameter optimization techniques.

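As a minimal illustration of the grid-search idea (this is not the repository's API; the objective function and parameter names below are invented for the example), an exhaustive search simply evaluates every combination in the grid and keeps the best:

```python
import itertools

# Toy objective: stands in for a cross-validated accuracy score that a
# classifier would return for the given hyperparameters (illustrative only).
def cv_score(params):
    return 1.0 - (params["C"] - 0.1) ** 2 - 0.01 * params["max_depth"]

param_grid = {"C": [0.01, 0.1, 1.0], "max_depth": [2, 4, 8]}

# Exhaustive grid search: evaluate every combination, keep the best.
best_params, best_score = None, float("-inf")
for combo in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), combo))
    score = cv_score(params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)  # {'C': 0.1, 'max_depth': 2}
```

Random search and Bayesian optimization explore the same space but sample it instead of enumerating it, which scales better as the grid grows.
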
## Features

This framework is designed to be a comprehensive toolkit for binary classification experiments, offering a wide range of configurable options:

- **Diverse Model Support:** Includes a collection of standard classifiers (e.g., Logistic Regression, SVM, RandomForest, XGBoost, LightGBM, CatBoost, H2O AutoML/GLM/GBM) and specialized time-series models from the `aeon` library (e.g., HIVE-COTE v2, MUSE, OrdinalTDE).
- **Advanced Hyperparameter Tuning:** Supports multiple search strategies:
  - **Grid Search:** Exhaustively search a defined parameter grid.
  - **Random Search:** Randomly sample from the parameter space.
  - **Bayesian Optimization:** Intelligently search the parameter space using `scikit-optimize`.
- **Configurable Data Pipeline:** A highly modular pipeline allows for fine-grained control over data processing steps:
  - **Feature Selection:** Toggle groups of features (e.g., demographics, blood tests, annotations).
  - **Data Cleaning:** Handle missing values, constant columns, and correlated features.
  - **Resampling:** Address class imbalance with oversampling (RandomOverSampler) or undersampling (RandomUnderSampler).
  - **Scaling:** Apply standard scaling to numeric features.
- **Automated Results Analysis:** Includes tools to automatically aggregate results from multiple runs and generate insightful plots, such as global parameter importance.
- **Time-Series Capabilities:** Specialized pipeline mode for handling time-series data, including conversion to the required 3D format for `aeon` classifiers.

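For intuition on the resampling step: the pipeline uses `RandomOverSampler`/`RandomUnderSampler`, and the hand-rolled sketch below just shows the underlying idea on a toy DataFrame (not the pipeline's actual code):

```python
import pandas as pd

# Toy imbalanced dataset: 9 negatives, 3 positives.
df = pd.DataFrame({"x": range(12), "y": [0] * 9 + [1] * 3})

majority = df[df["y"] == 0]
minority = df[df["y"] == 1]

# Random oversampling: draw minority rows with replacement until the
# classes are balanced.
upsampled = minority.sample(len(majority), replace=True, random_state=0)
balanced = pd.concat([majority, upsampled], ignore_index=True)

print(balanced["y"].value_counts().to_dict())  # {0: 9, 1: 9}
```

Undersampling is the mirror image: the majority class is subsampled down to the minority class size instead.
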
## Diagrams

Below are visual diagrams representing various components of the project. All `.mmd` source files are Mermaid diagrams, and the rendered versions are available in `.svg` or `.png` formats.

### Feature Importance
- [Mermaid source](assets/data_feature_importance_methods.mmd)
  <img src="assets/data_feature_importance_methods.svg" width="400" height="300"/>

### Data Pipeline
- [Mermaid source](assets/data_pipeline.mmd)
  <img src="assets/data_pipeline.svg" width="400" height="300"/>

### Grid Parameter Search Space
- [Mermaid source](assets/grid_param_space.mmd)
  <img src="assets/grid_param_space.svg" width="400" height="300"/>

### Hyperparameter Search
- [Mermaid source](assets/hyperparameter_search.mmd)
  <img src="assets/hyperparameter_search.svg" width="400" height="300"/>

### Imputation Pipeline
- [Mermaid source](assets/impute_data_for_pipe.mmd)
  <img src="assets/impute_data_for_pipe.svg" width="400" height="300"/>

### ML Repository Architecture
- [Mermaid source](assets/ml_repository_architecture.mmd)
  <img src="assets/ml_repository_architecture.png" width="400" height="300"/>

### Model Class Listing (Time Series)
- [Mermaid source](assets/model_class_list_model_class_list_ts.mmd)
  <img src="assets/model_class_list_model_class_list_ts.svg" width="400" height="300"/>

### Project Scoring and Model Saving
- [Mermaid source](assets/project_score_save.mmd)
  <img src="assets/project_score_save.svg" width="400" height="300"/>

### Time Series Helper Functions
- [Mermaid source](assets/time_series_helper.mmd)
  <img src="assets/time_series_helper.svg" width="400" height="300"/>

### Unit Test - Synthetic Data
- [Mermaid source](assets/unit_test_synthetic.mmd)
  <img src="assets/unit_test_synthetic.svg" width="400" height="300"/>

### Results Processing Pipeline
- [Mermaid source](assets/results_processing_pipeline.mmd)
  <img src="assets/results_processing_pipeline.svg" width="600" height="450"/>

## Getting Started

### Prerequisites

The framework is designed for use with a numeric data matrix for binary classification, with single or multiple outcome variables (one-vs-rest). An example is provided, and time-series data is also supported.

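A sketch of the expected input shape: a numeric feature matrix plus one or more binary outcome columns. The column names below are invented for illustration and are not a required schema:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100

# Numeric feature matrix with one binary outcome column.
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=n),
    "hemoglobin": rng.normal(13.5, 1.5, size=n),
    "creatinine": rng.normal(1.0, 0.2, size=n),
    "outcome_var_1": rng.integers(0, 2, size=n),  # binary target
})

# Every feature column should be numeric, and the target should be 0/1.
features = df.drop(columns=["outcome_var_1"])
assert features.select_dtypes(include="number").shape == features.shape
assert set(df["outcome_var_1"].unique()) <= {0, 1}
```

For one-vs-rest experiments, each additional outcome column is treated as its own binary target in turn.
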
## Installation

This project includes convenient installation scripts for Unix/Linux/macOS and Windows. These scripts will create a virtual environment, install all necessary dependencies, and register a Jupyter kernel for you.

### Quick Install using Scripts

1. **Clone the repository:**
   ```shell
   git clone https://github.com/SamoraHunter/ml_binary_classification_gridsearch_hyperOpt.git
   cd ml_binary_classification_gridsearch_hyperOpt
   ```

2. **Run the installation script:**

   * **For a standard installation:**
     * On Unix/Linux/macOS:
       ```bash
       chmod +x install.sh
       ./install.sh
       ```
     * On Windows:
       ```bat
       install.bat
       ```
     This will create a virtual environment named `ml_grid_env`.

   * **For a time-series installation (includes all standard dependencies):**
     * On Unix/Linux/macOS:
       ```bash
       chmod +x install.sh
       ./install.sh ts
       ```
     * On Windows:
       ```bat
       install.bat ts
       ```
     This will create a virtual environment named `ml_grid_ts_env`.

## Usage

After installation, activate the virtual environment to run your code or notebooks.

* **To activate the standard environment:**
  * On Unix/Linux/macOS: `source ml_grid_env/bin/activate`
  * On Windows: `.\ml_grid_env\Scripts\activate`

* **To activate the time-series environment:**
  * On Unix/Linux/macOS: `source ml_grid_ts_env/bin/activate`
  * On Windows: `.\ml_grid_ts_env\Scripts\activate`

### Running the Notebooks

The `notebooks/` directory contains examples for different use cases:

- **`unit_test_synthetic.ipynb`**: The main entry point for running experiments. It demonstrates how to generate synthetic data, test the data pipeline, and run a full Hyperopt search. Start here to understand the end-to-end workflow.
- **`01_hyperopt_grid.ipynb`**: A focused example of running a full hyperparameter search using `Hyperopt` based on the `config_hyperopt.yml` file.
- **`02_single_run.ipynb`**: A script for executing a single run with a specific set of parameters defined in `config_single_run.yml`. Useful for debugging or testing one configuration.

### Basic Example

The main entry point for running experiments is a script or notebook that loads the configuration and iterates through the parameter space defined in `config.yml`.

1. **Configure your experiment in `config.yml`:**
   - Set the data path, models, and parameter space.

2. **Run the experiment:**
   - The following script demonstrates how to execute a full grid search based on your `config.yml`.

```python
from pathlib import Path

from ml_grid.pipeline.data import pipe
from ml_grid.util.param_space import parameter_space
from ml_grid.util.create_experiment_directory import create_experiment_directory
from ml_grid.util.config_parser import load_config

# Load configuration from config.yml
config = load_config()

# Set project root
project_root = Path().resolve().parent

# Create a unique directory for this experiment run
experiments_base_dir = project_root / config['experiment']['experiments_base_dir']
experiment_dir = create_experiment_directory(
    base_dir=experiments_base_dir,
    additional_naming=config['experiment']['additional_naming']
)

# Generate the parameter space from the config file
param_space_df = parameter_space(config['param_space']).get_parameter_space()

# Iterate through each parameter combination and run the pipeline
for i, row in param_space_df.iterrows():
    local_param_dict = row.to_dict()
    print(f"Running experiment {i+1}/{len(param_space_df)} with params: {local_param_dict}")
    pipe(
        config=config,
        local_param_dict=local_param_dict,
        base_project_dir=project_root,
        experiment_dir=experiment_dir,
        param_space_index=i
    )
```
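For intuition, the parameter-space expansion can be pictured as the Cartesian product of the configured option lists, one row per experiment run. The keys below are invented for illustration and are not the repository's exact config schema:

```python
import itertools

import pandas as pd

# Hypothetical fragment of a param_space config: each key maps to the
# values to sweep (illustrative names, not the repo's schema).
param_space = {
    "resample": ["oversample", "undersample", None],
    "scale": [True, False],
}

# Cartesian product of all settings: 3 x 2 = 6 combinations.
rows = [dict(zip(param_space, combo))
        for combo in itertools.product(*param_space.values())]
param_space_df = pd.DataFrame(rows)

print(len(param_space_df))  # 6
```

Each row of this DataFrame plays the role of `local_param_dict` in the loop above.
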
If you are using Jupyter, you can also select the kernel created during installation (e.g., `Python (ml_grid_env)`) directly from the Jupyter interface.

## Examples

See [ml_grid/tests/unit_test_synthetic.ipynb](ml_grid/tests/unit_test_synthetic.ipynb).

## Documentation

The latest documentation is hosted online and can be viewed [here](https://samorahunter.github.io/ml_binary_classification_gridsearch_hyperOpt/).

This project uses Sphinx for documentation. The documentation includes usage guides and an auto-generated API reference.

To build the documentation locally:

1. Install the documentation dependencies (make sure your virtual environment is activated):
   ```bash
   pip install -e .[docs]
   ```

2. Build the HTML documentation:
   ```bash
   sphinx-build -b html docs/source docs/build
   ```

3. Open `docs/build/index.html` in your web browser to view the documentation.

## Project Structure

The repository is organized to separate concerns, making it easier to navigate and extend.

```
.
├── assets/                        # Mermaid diagrams and other assets
├── docs/                          # Sphinx documentation source and build files
├── ml_grid/                       # Main source code for the library
│   ├── model_classes/             # Standard classifier wrappers
│   ├── model_classes_time_series/ # Time-series classifier wrappers
│   ├── pipeline/                  # Core data processing and pipeline logic
│   ├── results_processing/        # Tools for aggregating and plotting results
│   └── util/                      # Utility functions and global parameters
├── tests/                         # Unit and integration tests
├── install.sh                     # Installation script for Unix/Linux
└── install.bat                    # Installation script for Windows
```

## Contributing

If you would like to contribute to this project, please follow these steps:

1. Fork the repository on GitHub.
2. Create a new branch for your feature or bug fix.
3. Make your changes and commit them with descriptive commit messages.
4. Push your changes to your fork.
5. Create a pull request to the main repository's master branch.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Appendix

## Acknowledgments

- scikit-learn
- hyperopt
- H2O.ai