
Model Compression: Pruning

Overview

A TensorFlow-based implementation of Learning both Weights and Connections for Efficient Neural Networks by Han S., Pool J., et al.

Pruning is a model compression technique that reduces a model to a smaller size while incurring only a marginal loss in accuracy. Pruning also allows the model to be optimized for real-time inference on resource-constrained devices.

For more information on Model Compression and Pruning, please read Model Compression via Pruning.

Concepts Utilised

  • Magnitude-based pruning.
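The core idea of magnitude-based pruning is that weights with small absolute values contribute little to the output and can be zeroed. A minimal NumPy sketch (the function name and percentile-based threshold are illustrative, not taken from this repository):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude weights.

    `sparsity` is the fraction of weights to remove; the threshold is
    the corresponding magnitude percentile.
    """
    threshold = np.percentile(np.abs(weights), sparsity * 100)
    mask = np.abs(weights) >= threshold  # True for weights that survive
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned, mask = magnitude_prune(w, sparsity=0.5)
print(f"sparsity achieved: {1 - mask.mean():.2f}")
```

The returned mask can be saved and reapplied after later weight updates, which is what makes retraining with a fixed sparsity pattern possible.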

Explanation

This implementation was developed on a dataset that is not publicly available, but it can be applied to other datasets.

The code contains two different implementations:

  • Retrain Attempt: Inducing sparsity at every iteration while retraining.

  • Baseline Attempt: Inducing sparsity by setting weights whose magnitude falls below a threshold to 0.0, without retraining.
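The difference between the two attempts can be sketched as follows. This is a minimal NumPy illustration, not the repository's actual training code: the gradient step is simulated, and the 60% sparsity target is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=100)

# Baseline attempt: one-shot pruning, no retraining.
threshold = np.percentile(np.abs(w), 60)   # illustrative 60% sparsity target
mask = np.abs(w) >= threshold
w_baseline = w * mask

# Retrain attempt: keep the mask fixed and reapply it after every
# (here simulated) gradient step so pruned weights stay at zero.
w_retrain = w_baseline.copy()
for _ in range(5):
    grad = rng.normal(size=w.shape) * 0.01  # stand-in for a real gradient
    w_retrain = (w_retrain - grad) * mask   # update, then re-zero pruned weights

print(f"baseline sparsity: {np.mean(w_baseline == 0):.2f}")
print(f"retrain sparsity:  {np.mean(w_retrain == 0):.2f}")
```

Reapplying the mask during retraining lets the surviving weights compensate for the removed connections, which is why the retrain attempt typically recovers more accuracy than one-shot pruning.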

Copyright

Author @Parth Malpathak

All code and implementations are part of the 10605 (Machine Learning for Large Datasets) course requirements. Please review Carnegie Mellon University's academic integrity policy before cloning this repository or duplicating the code.