Data-Modeling

Scripts for data modeling

Grasshopper Algorithm for automated feature selection in ML pipelines:

  • Partly experimental: introduces fairness and balance metrics that are not in the original authors' work (Saremi et al., 2017)
  • Essentially brute-forces the search for an optimal parameter set to feed into another ML algorithm
  • Uses KNN, so it will get bogged down by massive datasets and is not parallelizable in Spark
  • KNN was chosen by the original authors because it is particularly sensitive to poor parameter selections
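
To make the KNN-based evaluation described in the list above concrete, here is a minimal sketch of how a candidate feature subset might be scored. It is not the code in this repository and it does not implement the grasshopper search itself: an exhaustive loop over feature masks stands in for the optimizer, and the fairness and balance metrics are omitted. The names `knn_predict`, `mask_fitness`, and `make_toy_data`, the leave-one-out scoring, and the toy dataset are all assumptions made for illustration.

```python
import itertools
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def mask_fitness(X, y, mask, k=3):
    """Leave-one-out KNN accuracy using only the columns where mask is 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    Xs = [[row[j] for j in cols] for row in X]
    hits = 0
    for i in range(len(Xs)):
        # Hold out row i and predict it from all the other rows.
        rest_X = Xs[:i] + Xs[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, Xs[i], k) == y[i]
    return hits / len(Xs)


def make_toy_data(seed=0, n_per_class=20):
    """Hypothetical data: column 0 separates the classes, columns 1-2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = rng.uniform(0, 1) + 4 * label
            X.append([informative, rng.uniform(-50, 50), rng.uniform(-50, 50)])
            y.append(label)
    return X, y


X, y = make_toy_data()

# Stand-in for the optimizer: score every possible mask over the 3 features.
scores = {m: mask_fitness(X, y, m) for m in itertools.product([0, 1], repeat=3)}
best_mask = max(scores, key=scores.get)

print("informative feature only:", scores[(1, 0, 0)])
print("all features:", scores[(1, 1, 1)])
print("best mask found:", best_mask)
```

Because KNN relies on raw distances, large-scale noise columns can swamp the informative one, which is the sensitivity the last bullet refers to. Each fitness call in this sketch compares every row against every other row, so the cost grows quadratically with dataset size, and the number of masks doubles with each added feature.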