For any inquiries or assistance, feel free to contact Mr. CAO Bin at:
📧 Email: bcao686@connect.hkust-gz.edu.cn

Cao Bin is a PhD candidate at the Hong Kong University of Science and Technology (Guangzhou), under the supervision of Professor Zhang Tong-Yi. His research focuses on AI for science, especially intelligent crystal-structure analysis and discovery. Learn more about his work on his homepage.
A Python library implementing TCGPR, an efficient divide-and-conquer strategy tailored for small datasets in materials science and beyond.
- 2022: TCGPR was first proposed and implemented. In 2023, in collaboration with Mr. Hao Yuan and Mr. Qinghua Wei (experimental collaborators), it was successfully applied to the optimization of lead-free solder alloys. → Published in npj Computational Materials (news link)
- 2024: After two years of development, TCGPR was enhanced with sequential feature selection and outlier detection. In collaboration with Mr. Tianliang Li (experimental collaborator) and Mr. Tianhao Su (computational collaborator), it was applied to anti-tumor ferroptosis studies. → Published in SMALL (news link)
For an in-depth explanation of the algorithm, see the TCGPR Introduction PDF.
Install TCGPR via PyPI:
pip install PyTcgpr

To verify the installation:
pip show PyTcgpr

To upgrade to the latest version:
pip install --upgrade PyTcgpr

Example: data screening (the default Mission) with Task = 'Partition':
from PyTcgpr import TCGPR
TCGPR.fit(
filePath = "data.csv",
initial_set_cap = 3,
sampling_cap = 2,
up_search = 500,
CV = 'LOOCV',
Task = 'Partition'
)

Example: data screening with Task = 'Identification':
from PyTcgpr import TCGPR
TCGPR.fit(
filePath = "data.csv",
sampling_cap = 2,
up_search = 500,
CV = 'LOOCV',
Task = 'Identification'
)

Example: feature selection (Mission = 'FEATURE'):
from PyTcgpr import TCGPR
TCGPR.fit(
filePath = "data.csv",
Mission = 'FEATURE',
sampling_cap = 2,
up_search = 500,
CV = 'LOOCV'
)

Parameters of TCGPR.fit:
:param Mission: str, default='DATA'
    - 'DATA': Perform data screening
    - 'FEATURE': Perform feature selection
:param filePath: str
Path to input dataset in CSV format
:param initial_set_cap: int or list
Initial subset size or index list for Partition mode
:param sampling_cap: int, default=1
Number of items selected per iteration
:param measure: str, default='Pearson'
Correlation type: 'Pearson' or 'Determination'
:param ratio: float
Tolerance threshold for correlation-based filtering
:param target: int, default=1
Number of targets in regression (for feature selection)
:param weight: float, default=0.2
Weight coefficient in GGMF score calculation
:param up_search: int, default=500
Upper limit for search iterations
:param exploit_coef: float, default=2
Variance constraint for EI acquisition function
:param exploit_model: bool, default=False
If True, disables GGMF and uses only R values
:param CV: int or str, default=10
    Cross-validation scheme: number of folds as an integer (e.g., 5, 10) or 'LOOCV' (leave-one-out)

After running, TCGPR outputs a CSV file with the remaining samples:
Dataset_remained_by_TCGPR.csv

Compatible with Windows, Linux, and macOS.
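
As a rough illustration of the input/output workflow, the sketch below writes a small data.csv with Python's standard csv module and then compares it with the file TCGPR leaves behind. Several things here are assumptions rather than documented behavior of PyTcgpr: the column layout (feature columns followed by the target in the last column), the idea that the output file keeps the same columns as the input and is written to the working directory, and the helper names write_dataset, read_rows, and dropped_rows, which are hypothetical. The toy numbers are made up.

```python
import csv
from pathlib import Path


def write_dataset(path, header, rows):
    """Write a header line followed by numeric rows to a CSV file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_rows(path):
    """Return the data rows of a CSV file as tuples of floats (header skipped)."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header line
        return [tuple(float(value) for value in row) for row in reader if row]


def dropped_rows(original_path, remained_path):
    """List rows present in the original file but absent from the retained one."""
    remained = set(read_rows(remained_path))
    return [row for row in read_rows(original_path) if row not in remained]


# Toy data: two features and one target (ASSUMED to be the last column).
header = ["x1", "x2", "y"]
rows = [(0.1, 1.0, 2.1), (0.2, 0.9, 2.0), (0.3, 1.1, 2.4), (5.0, 9.0, -3.0)]
write_dataset("data.csv", header, rows)

# After TCGPR.fit(filePath="data.csv", ...) has run, inspect what was removed.
# ASSUMPTION: the output file has the same columns as the input file.
output = Path("Dataset_remained_by_TCGPR.csv")
if output.exists():
    print("Rows not retained:", dropped_rows("data.csv", output))
```

If the output file turns out to use a different layout (for example an extra index column), the comparison in dropped_rows would need to be adapted accordingly.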
Developed by Bin Cao
Email: bcao686@connect.hkust-gz.edu.cn

Feel free to open an issue or contact me for any questions, bugs, or collaboration opportunities.
Contributions and suggestions are welcome!
- Report bugs or request features via GitHub Issues
- Submit a pull request with improvements or fixes
- Interested in research collaboration? Please get in touch!
If you use this code in your research, please cite the following papers:
- Li T., Cao B., Su T., ... Feng L., Zhang T. Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy, SMALL (link to paper)
- Wei Q., Cao B., Yuan H., ... Dong Z., Zhang T. Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility, npj Computational Materials (link to paper)
