Bin-Cao/TCGPR

For any inquiries or assistance, feel free to contact Mr. CAO Bin at:
📧 Email: bcao686@connect.hkust-gz.edu.cn

Cao Bin is a PhD candidate at the Hong Kong University of Science and Technology (Guangzhou), under the supervision of Professor Zhang Tong-Yi. His research focuses on AI for science, especially intelligent crystal-structure analysis and discovery. Learn more about his work on his homepage.

TCGPR

A Python library implementing TCGPR, an efficient divide-and-conquer strategy tailored for small datasets in materials science and beyond.


Project History

  • 2022: TCGPR was first proposed and implemented. In 2023, in collaboration with Mr. Hao Yuan and Mr. Qinghua Wei (experimental collaborators), it was successfully applied to the optimization of lead-free solder alloys. → Published in npj Computational Materials

  • 2024: After two years of development, TCGPR was extended with sequential feature selection and outlier detection. In collaboration with Mr. Tianliang Li (experimental collaborator) and Mr. Tianhao Su (computational collaborator), it was applied to anti-tumor ferroptosis research. → Published in Small


Algorithm Overview

For an in-depth explanation of the algorithm, see the TCGPR Introduction PDF.


Installation

Install TCGPR via PyPI:

pip install PyTcgpr

To verify the installation:

pip show PyTcgpr

To upgrade to the latest version:

pip install --upgrade PyTcgpr
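The same check can also be done from inside Python (for example, in a notebook) with the standard-library `importlib.metadata` module. This is a minimal sketch: `installed_version` is a hypothetical helper, and the distribution name `PyTcgpr` is the one used with pip above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "PyTcgpr") -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"PyTcgpr {found}" if found else "PyTcgpr is not installed")
```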

Getting Started

1. Data Screening | Partition Mode

from PyTcgpr import TCGPR

TCGPR.fit(
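    filePath = "data.csv",      # input dataset in CSV format
    initial_set_cap = 3,        # size of the initial subset (an index list is also accepted)
    sampling_cap = 2,           # number of samples selected per iteration
    up_search = 500,            # upper limit for search iterations
    CV = 'LOOCV',               # leave-one-out cross-validation
    Task = 'Partition'          # data screening in Partition mode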
)
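Judging from the parameters, Partition mode starts from an initial subset (`initial_set_cap`) and selects `sampling_cap` samples per iteration. To make that idea concrete, here is a toy greedy loop in NumPy. It is a simplified stand-in, not PyTcgpr's implementation: it assumes a zero-mean Gaussian process with a fixed RBF kernel and noise level, scores a subset by the Pearson correlation between the targets and their closed-form leave-one-out predictions, and keeps adding samples while that score does not drop. TCGPR's own criterion (the GGMF score) and the search bounded by `up_search` are not reproduced, and every function name here is hypothetical.

```python
import itertools

import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential (RBF) kernel matrix between the rows of A and B."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def loocv_predictions(X, y, noise=1e-2, length_scale=1.0):
    """Leave-one-out predictive means of a zero-mean GP via the standard identity
    mu_i = y_i - [K^-1 y]_i / [K^-1]_ii. Assumes X (2-D) and y are standardized."""
    K = rbf_kernel(X, X, length_scale) + noise * np.eye(len(X))
    K_inv = np.linalg.inv(K)
    return y - (K_inv @ y) / np.diag(K_inv)


def loocv_pearson(X, y):
    """Score a subset by Pearson r between targets and their LOOCV predictions."""
    pred = loocv_predictions(X, y)
    if np.std(pred) < 1e-12 or np.std(y) < 1e-12:
        return -1.0  # correlation undefined: treat as the worst possible score
    return float(np.corrcoef(y, pred)[0, 1])


def greedy_partition(X, y, initial_idx, sampling_cap=1, tol=0.0):
    """Grow a subset from initial_idx by sampling_cap samples per round while the
    LOOCV Pearson r drops by no more than tol; return (selected, left_out)."""
    if sampling_cap < 1:
        raise ValueError("sampling_cap must be >= 1")
    selected = list(initial_idx)
    remaining = [i for i in range(len(X)) if i not in selected]
    best = loocv_pearson(X[selected], y[selected])
    while len(remaining) >= sampling_cap:
        # Score every group of sampling_cap candidates added to the current subset
        scored = []
        for group in itertools.combinations(remaining, sampling_cap):
            idx = selected + list(group)
            scored.append((loocv_pearson(X[idx], y[idx]), group))
        score, group = max(scored)
        if score < best - tol:
            break  # every candidate group would degrade the fit too much
        selected += list(group)
        remaining = [i for i in remaining if i not in group]
        best = score
    return selected, remaining


if __name__ == "__main__":
    X = np.linspace(-1.5, 1.5, 12).reshape(-1, 1)
    y = np.sin(2 * X).ravel()
    y[[4, 9]] += 2.0  # two deliberately corrupted targets
    kept, left_out = greedy_partition(X, y, initial_idx=[0, 1, 2], sampling_cap=1)
    print("kept:", sorted(kept), "left out:", sorted(left_out))
```

Because each round scores every combination of `sampling_cap` remaining samples, the cost of this sketch grows combinatorially with `sampling_cap`, so keep it small when experimenting.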

2. Data Screening | Identification Mode

from PyTcgpr import TCGPR

TCGPR.fit(
    filePath = "data.csv",
    sampling_cap = 2,
    up_search = 500,
    CV = 'LOOCV',
    Task = 'Identification'     # data screening in Identification mode (no initial subset)
)

3. Feature Selection Mode

from PyTcgpr import TCGPR

TCGPR.fit(
    filePath = "data.csv",
    Mission = 'FEATURE',        # feature selection instead of data screening (default 'DATA')
    sampling_cap = 2,
    up_search = 500,
    CV = 'LOOCV'
)
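The 2024 version added sequential feature selection (`Mission = 'FEATURE'`). As a generic illustration of that family of methods, and not TCGPR's own selection rule, the sketch below implements plain sequential forward selection with a caller-supplied `score` function. In TCGPR the score would come from its GPR-based criterion; here `toy_score` and its weights are made up purely to show the stopping behavior, and the loop adds one feature per round rather than `sampling_cap`.

```python
from typing import Callable, List, Optional, Sequence


def forward_select(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: Optional[int] = None,
) -> List[str]:
    """Greedy sequential forward selection.

    Each round adds the single feature whose inclusion gives the highest score,
    and stops as soon as no remaining feature improves on the current best.
    """
    limit = len(features) if max_features is None else max_features
    selected: List[str] = []
    best = float("-inf")
    while len(selected) < limit:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        top_score, top_feature = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:
            break  # no candidate improves the current subset
        selected.append(top_feature)
        best = top_score
    return selected


# Toy score (hypothetical): reward informative features, penalize subset size.
utility = {"f1": 0.5, "f2": 0.3, "f3": 0.1}


def toy_score(subset: List[str]) -> float:
    return sum(utility.get(f, 0.0) for f in subset) - 0.15 * len(subset)


print(forward_select(["f1", "f2", "f3", "f4"], toy_score))  # ['f1', 'f2']
```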

Main Parameters

:param Mission: str, default='DATA'
    - 'DATA': Perform data screening
    - 'FEATURE': Perform feature selection

:param filePath: str
    Path to input dataset in CSV format

:param initial_set_cap: int or list
    Initial subset size or index list for Partition mode

:param sampling_cap: int, default=1
    Number of items selected per iteration

:param measure: str, default='Pearson'
    Correlation type: 'Pearson' or 'Determination'

:param ratio: float
    Tolerance threshold for correlation-based filtering

:param target: int, default=1
    Number of targets in regression (for feature selection)

:param weight: float, default=0.2
    Weight coefficient in GGMF score calculation

:param up_search: int, default=500
    Upper limit for search iterations

:param exploit_coef: float, default=2
    Variance constraint for the expected improvement (EI) acquisition function

:param exploit_model: bool, default=False
    If True, disables GGMF and uses only the correlation (R) values

:param CV: int or str, default=10
    Cross-validation scheme: number of folds for k-fold CV (e.g., 5, 10), or 'LOOCV' for leave-one-out
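The two `measure` options score a cross-validated fit differently. Assuming 'Determination' refers to the usual coefficient of determination R², the sketch below shows the practical difference: Pearson's r measures only linear association, so predictions with a constant offset still reach r = 1, whereas R² penalizes the offset. The function names are illustrative and not part of PyTcgpr.

```python
import numpy as np


def pearson_r(y_true, y_pred) -> float:
    """Pearson correlation between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def determination_r2(y_true, y_pred) -> float:
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Predictions with a constant offset of +1: perfectly correlated, but biased.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(round(pearson_r(observed, predicted), 6))         # 1.0
print(round(determination_r2(observed, predicted), 6))  # 0.2
```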
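`exploit_coef` belongs to an expected improvement (EI) acquisition function. For reference, the textbook EI for a maximization problem, given a Gaussian prediction with mean mu and standard deviation sigma and the best value found so far f*, is EI = (mu - f* - xi) Phi(z) + sigma phi(z) with z = (mu - f* - xi) / sigma, where Phi and phi are the standard normal CDF and PDF. The sketch below implements that standard form using only the standard library. This README does not say how `exploit_coef` constrains the variance inside TCGPR, so it is not modeled here; the margin `xi` is the conventional exploration parameter, not a PyTcgpr argument.

```python
import math


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Expected improvement over `best` for a Gaussian prediction N(mu, sigma^2),
    assuming maximization. A larger `xi` favors exploration."""
    improvement = mu - best - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)  # deterministic prediction: no uncertainty bonus
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))      # standard normal CDF at z
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF at z
    return improvement * cdf + sigma * pdf


# At the incumbent (mu == best), EI grows with predictive uncertainty:
print(round(expected_improvement(mu=1.0, sigma=0.5, best=1.0), 4))  # 0.1995
print(round(expected_improvement(mu=1.0, sigma=1.0, best=1.0), 4))  # 0.3989
```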
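The `CV` argument chooses between k-fold cross-validation and leave-one-out (LOOCV), which is simply k-fold with k equal to the number of samples. LOOCV is a common choice for very small datasets, at the cost of one model fit per sample. A dependency-free sketch of the index splits, assuming contiguous, unshuffled folds (whether PyTcgpr shuffles is not stated here); `cv_splits` is a hypothetical helper:

```python
from typing import Iterator, List, Tuple, Union


def cv_splits(n_samples: int, cv: Union[int, str] = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs for k-fold CV, or leave-one-out if cv == 'LOOCV'."""
    k = n_samples if cv == "LOOCV" else int(cv)
    if not 2 <= k <= n_samples:
        raise ValueError(f"need 2 <= folds <= n_samples, got {k}")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


print([test for _, test in cv_splits(5, cv=2)])        # [[0, 1, 2], [3, 4]]
print([test for _, test in cv_splits(3, cv="LOOCV")])  # [[0], [1], [2]]
```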

Output

After running, TCGPR outputs a CSV file with the remaining samples:

Dataset_remained_by_TCGPR.csv
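To see which samples were screened out, the output can be compared with the input. A minimal standard-library sketch, assuming the output file has the same columns as the input (no extra index column) and writes each retained row with its values unchanged; if values are reformatted (e.g. `1.0` vs `1`), exact string matching will miss them. `removed_rows` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import List


def removed_rows(input_path: str, remained_path: str) -> List[List[str]]:
    """Return data rows of input_path that do not appear in remained_path (headers skipped)."""
    with open(input_path, newline="") as f:
        original = list(csv.reader(f))[1:]
    with open(remained_path, newline="") as f:
        kept = Counter(tuple(row) for row in list(csv.reader(f))[1:])
    dropped = []
    for row in original:
        key = tuple(row)
        if kept[key] > 0:
            kept[key] -= 1  # consume one match so duplicate rows are counted correctly
        else:
            dropped.append(row)
    return dropped
```

Usage: `removed_rows("data.csv", "Dataset_remained_by_TCGPR.csv")` returns the input rows that are missing from the output.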

Source Code

PyPI - TCGPR

Compatible with Windows, Linux, and macOS.


Patent

Patent Image

Developer

Developed by Bin Cao (Email: bcao686@connect.hkust-gz.edu.cn). Feel free to open an issue or contact me with any questions, bug reports, or collaboration opportunities.


Contributing

Contributions and suggestions are welcome!

  • Report bugs or request features via GitHub Issues
  • Submit a pull request with improvements or fixes
  • Interested in research collaboration? Please get in touch!

Citation

If you use this code in your research, please cite the following papers:

  • Li T., Cao B., Su T., ... Feng L., Zhang T. Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy. Small, 2024.

  • Wei Q., Cao B., Yuan H., ... Dong Z., Zhang T. Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility. npj Computational Materials, 2023.

About

[npj Comput. Mater. 2023 | Small 2024] Machine learning algorithm: outlier identification, feature selection
