Fireball is a Deep Neural Network (DNN) library for creating, training, evaluating, quantizing, and compressing DNN-based models across a range of applications. Here is a summary of its main features:
Fireball provides a set of easy-to-use APIs for several types of model compression, including Low-Rank Decomposition, Pruning, Codebook-based Quantization, and Lossless Entropy Coding. Examples covering a variety of use cases are available in the "Playgrounds" (Python notebook files).
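To make three of these ideas concrete, here is a minimal, framework-independent sketch in plain Python. It is not Fireball's API: the function names `prune_by_magnitude`, `codebook_quantize`, and `empirical_entropy_bits` are hypothetical, and the specific techniques (magnitude pruning, 1-D k-means for building the codebook, and empirical Shannon entropy as a bound on lossless coding) are common textbook choices assumed here for illustration. Fireball works on real model tensors, and its actual methods are those described in the referenced paper.

```python
import math
from collections import Counter


def prune_by_magnitude(weights, sparsity):
    """Hypothetical helper: zero out the `sparsity` fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    # Indices sorted by absolute value; the first k are the least important weights.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


def codebook_quantize(weights, num_codes, iterations=20):
    """Hypothetical helper: 1-D k-means (Lloyd's algorithm) codebook quantization.

    Returns the codebook (centroid values) and, for each weight, the index
    of the centroid that replaces it.
    """
    lo, hi = min(weights), max(weights)
    # Linear initialization: centroids spread evenly over the weight range.
    step = (hi - lo) / max(num_codes - 1, 1)
    codebook = [lo + step * j for j in range(num_codes)]

    def assign():
        # Map each weight to the index of its nearest centroid.
        return [min(range(num_codes), key=lambda j: abs(w - codebook[j])) for w in weights]

    for _ in range(iterations):
        indices = assign()
        # Move each centroid to the mean of the weights assigned to it.
        for j in range(num_codes):
            members = [w for w, idx in zip(weights, indices) if idx == j]
            if members:
                codebook[j] = sum(members) / len(members)
    return codebook, assign()


def empirical_entropy_bits(symbols):
    """Shannon entropy (bits/symbol) of the symbol histogram: a lower bound
    on the average code length an ideal lossless entropy coder can reach."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    weights = [0.9, -0.05, 0.4, 0.02, -0.8, 0.41, -0.03, 0.88]
    pruned = prune_by_magnitude(weights, sparsity=0.5)
    codebook, indices = codebook_quantize(pruned, num_codes=4)
    print("pruned:  ", pruned)
    print("indices: ", indices)
    print("entropy: ", empirical_entropy_bits(indices), "bits/weight")
```

In this toy run, pruning half the weights makes the zero centroid dominate the index stream, so the indices have an empirical entropy of 1.75 bits per weight instead of the 2 bits a fixed-length 4-entry codebook index would need. That skew in the index distribution is what a lossless entropy coder (such as arithmetic coding) can exploit after quantization.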
The methods used for Low-Rank Decomposition, Codebook Quantization, and lossless Arithmetic Coding are explained in this paper and this presentation, submitted to the 2021 Data Compression Conference (DCC). Some of these methods were also included in the MPEG NNR standard, as explained in this paper.
Fireball also provides APIs for exporting models to CoreML for iOS deployment, including models that have been compressed and/or quantized.

The following video shows how Fireball can be used to compress an object-detection deep neural network model to one-tenth of its original size while running about 46% faster, with an insignificant effect on accuracy. I tried to keep the video short (just under 5 minutes), so some important details may pass quickly. Feel free to pause or rewind the video to catch them.