GitHub - excepto64/privacy_explainability: Experiment evaluating the accuracy, privacy and explainability of an ML model trained on datasets with different numbers of features.

Privacy-Explainability

Measure the predictive performance, privacy and explainability of your ML model.

Description

Measure the predictive performance, privacy and explainability of an ML model using MLflow.

Currently we evaluate a model that predicts survival after a prostate cancer diagnosis, using the PLCO dataset. We then measure:

predictive performance (using MCC (Matthews Correlation Coefficient), scaled MCC, and accuracy),
privacy (using PBI),
explainability (using the monotonicity and non-sensitivity of the explanation, and the fraction of features with real-world meaning).
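As a rough illustration of the predictive performance metrics named above, the sketch below computes accuracy, MCC and a scaled MCC from binary labels in plain Python. The rescaling (MCC + 1) / 2 is an assumption about what "scaled MCC" means here; the definition actually used by this project may differ (see Metrics_description.pdf). The function names and the toy labels are illustrative only and are not taken from run.py.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient, in the range [-1, 1]."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when a row or column of the confusion matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def scaled_mcc(y_true, y_pred):
    """ASSUMPTION: scaled MCC maps MCC from [-1, 1] to [0, 1] via (MCC + 1) / 2."""
    return (mcc(y_true, y_pred) + 1) / 2


# Toy labels, purely illustrative (1 = positive class, 0 = negative class).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print("accuracy:", accuracy(y_true, y_pred))
print("MCC:", mcc(y_true, y_pred))
print("scaled MCC:", scaled_mcc(y_true, y_pred))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it more informative when the two classes are imbalanced. For real evaluations, scikit-learn's matthews_corrcoef and accuracy_score compute the same MCC and accuracy values.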

Usage

The run.py file contains all the functions to run the training and evaluation of the model.

To perform an evaluation, choose the model (a scikit-learn estimator) and its parameters, together with the data file and the column description file.

These options are selected in the test_run.ipynb notebook.

The data file is the PLCO first cancer dataset. Unfortunately, due to sharing restrictions it cannot be provided in the repository, but access can be requested here; requests are usually processed within 2 weeks. The data was slightly modified to standardise non-response answers to NaN. In our experiment, we keep only patients diagnosed with prostate cancer: fstcan_cancersite = 1.
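The preprocessing described above could be sketched with pandas roughly as follows. Only the column name fstcan_cancersite and the value 1 come from this README; the function name, the non-response code -9, and the toy columns age and patient are hypothetical placeholders, since the real codes and columns depend on the PLCO data dictionary.

```python
import numpy as np
import pandas as pd


def select_prostate_patients(df, non_response_codes=()):
    """Map non-response codes to NaN, then keep rows with fstcan_cancersite == 1.

    `non_response_codes` is a hypothetical parameter: the codes that mark a
    non-response in the real dataset are not listed in this README.
    """
    cleaned = df
    if non_response_codes:
        cleaned = df.replace(list(non_response_codes), np.nan)
    prostate = cleaned[cleaned["fstcan_cancersite"] == 1]
    return prostate.reset_index(drop=True)


# Toy data standing in for the restricted PLCO file (values are invented).
raw = pd.DataFrame(
    {
        "patient": ["A", "B", "C", "D"],
        "fstcan_cancersite": [1, 2, 1, 3],
        "age": [65, 70, -9, 58],  # -9 plays the role of a non-response code
    }
)

prostate_only = select_prostate_patients(raw, non_response_codes=(-9,))
print(prostate_only)
```

Standardising the non-response codes before filtering means that every later step only has to handle NaN rather than a mix of sentinel values.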

The column information file columns_prostate.csv describes the features selected for the experiment, using the following columns:

Keep - A subset of all the features that could reasonably be included. It excludes features that obviously reveal the prediction target, duplicates in rarely used encodings, and trial information. It is used as the baseline to calculate PBI.
Keep Narrow - A subset of Keep, which is the current set of features being evaluated.
Categorical - Indicates whether a feature is categorical, as these are often represented with integers in the dataset. Marking them as categorical stops the features from being incorrectly interpreted as ordinal.
Clinical Meaning - Indicates whether a feature has clinical meaning. Used to derive the fraction of features with clinical meaning.
Demographic - Indicates whether the given column contains demographic information.
Patient history - Indicates whether the given column contains patient history (incomplete).
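To make the role of these columns concrete, here is a minimal sketch of how such a column description table might be used to pick the evaluated features, identify the categorical ones, and derive the fraction of features with clinical meaning. It assumes the flags are stored as 0/1 and that feature names live in a column called Feature; both are assumptions, as are the function names and the toy feature names, and the actual layout of the CSV files and the logic in run.py may differ.

```python
import pandas as pd


def select_features(columns_info, flag="Keep Narrow", name_col="Feature"):
    """Return (selected feature names, categorical feature names).

    ASSUMPTIONS: flags are encoded as 0/1 and `name_col` holds the feature names.
    """
    selected = columns_info[columns_info[flag] == 1]
    features = selected[name_col].tolist()
    categorical = selected.loc[selected["Categorical"] == 1, name_col].tolist()
    return features, categorical


def fraction_with_clinical_meaning(columns_info, flag="Keep Narrow"):
    """Share of the selected features that are flagged as clinically meaningful."""
    selected = columns_info[columns_info[flag] == 1]
    return float((selected["Clinical Meaning"] == 1).mean())


# Toy column description table with invented feature names.
columns_info = pd.DataFrame(
    {
        "Feature": ["feat_a", "feat_b", "feat_c", "feat_d"],
        "Keep": [1, 1, 1, 0],
        "Keep Narrow": [1, 1, 0, 0],
        "Categorical": [0, 1, 1, 0],
        "Clinical Meaning": [1, 0, 1, 0],
    }
)

features, categorical = select_features(columns_info)
print("evaluated features:", features)
print("categorical features:", categorical)
print("clinical fraction:", fraction_with_clinical_meaning(columns_info))
```

Passing flag="Keep" instead would select the wider baseline set, which is the set the README describes as the reference for calculating PBI.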

The output is logged using MLflow. To view it, start the MLflow application by running mlflow server, specifying the port if necessary with mlflow server --port <port_number>.

Authors and acknowledgment

Adam Harrison - the code. Work performed while working for IT Innovation, University of Southampton.

Chris Duckworth - supervision

License

See the LICENSE file in the repository.

Folders and files

images
.gitignore
LICENSE
Metrics_description.pdf
README.md
columns_prostate_narrow.csv
columns_prostate_sub_1.csv
columns_prostate_sub_2.csv
columns_prostate_sub_3.csv
columns_prostate_sub_4.csv
columns_prostate_sub_5.csv
columns_prostate_sub_6.csv
columns_prostate_wide.csv
run.py
test_run.ipynb
