Fireball

Features

Fireball is a Deep Neural Network (DNN) library for creating, training, evaluating, quantizing, and compressing DNN-based models across a range of applications. Here is a summary of its main features:

Easily create any network structure using a limited set of fundamental building blocks chained together in a text string.
Create models for classification, regression, object detection, and NLP applications.
Add functionality by creating your own "Blocks" and reuse them in your network structure.
Define your own layer types or loss functions and use them in the network structure.
Apply Low-Rank decomposition on layers of your model to reduce the number of network parameters.
Apply Pruning to the network parameters.
Apply K-Means quantization on network parameters to further reduce the size of the model.
Retrain your model after applying low-rank decomposition, pruning, and/or quantization.
Compress models using arithmetic entropy coding.
Export the models to ONNX, TensorFlow, or CoreML even after applying low-rank decomposition, pruning, and/or quantization.
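
To make the low-rank decomposition item above more concrete, here is a minimal NumPy sketch of the general idea: a weight matrix is replaced by two thinner factors obtained from a truncated SVD. This is not Fireball's API; the function name `low_rank_factors`, the matrix sizes, and the chosen rank are all assumptions made for illustration.

```python
import numpy as np

def low_rank_factors(weights, rank):
    """Factor an (m x n) matrix into (m x rank) and (rank x n) via truncated SVD.

    Hypothetical helper for illustration only; not part of Fireball.
    """
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # fold the singular values into the left factor
    right = vt[:rank, :]
    return left, right

rng = np.random.default_rng(0)
# Build a 256 x 128 matrix that is exactly rank 4, so a rank-4 factorization
# reproduces it up to floating-point error.
weights = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 128))

left, right = low_rank_factors(weights, rank=4)

original_params = weights.size                # 256 * 128
factored_params = left.size + right.size      # 256 * 4 + 4 * 128
max_error = float(np.max(np.abs(weights - left @ right)))

print(original_params, factored_params, max_error)
```

Real layers are rarely exactly low rank, so the chosen rank trades parameter count against approximation error, which is one reason retraining after decomposition is useful.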

Compressing Neural Networks

Fireball provides a set of easy-to-use APIs for several types of model compression: Low-Rank decomposition, Pruning, Codebook-based quantization, and Lossless Entropy Coding. Examples covering a variety of use cases are available in the "Playgrounds" (Python notebook files).
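
As a rough illustration of two of the techniques named above, the following pure-Python sketch prunes small weights by magnitude and then quantizes the result with a one-dimensional K-Means codebook. It does not use Fireball; the helper names `prune_by_magnitude` and `kmeans_codebook`, the threshold, and the sample weights are assumptions chosen to keep the example small.

```python
def prune_by_magnitude(values, threshold):
    """Zero out every value whose absolute value is below the threshold."""
    return [0.0 if abs(v) < threshold else v for v in values]

def nearest_code(value, codebook):
    """Index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda k: abs(value - codebook[k]))

def kmeans_codebook(values, num_codes, iterations=20):
    """1-D Lloyd's K-Means; returns (codebook, indices). Requires num_codes >= 2."""
    lo, hi = min(values), max(values)
    # Start with centers evenly spaced across the value range.
    codebook = [lo + (hi - lo) * i / (num_codes - 1) for i in range(num_codes)]
    for _ in range(iterations):
        indices = [nearest_code(v, codebook) for v in values]
        for k in range(num_codes):
            members = [v for v, i in zip(values, indices) if i == k]
            if members:  # keep the old center if no value was assigned to it
                codebook[k] = sum(members) / len(members)
    indices = [nearest_code(v, codebook) for v in values]
    return codebook, indices

weights = [0.02, -0.01, 0.51, 0.49, -0.48, -0.52, 0.03, 0.50]

pruned = prune_by_magnitude(weights, threshold=0.05)
codebook, indices = kmeans_codebook(pruned, num_codes=3)

# Each weight is now stored as a small integer index into the codebook.
dequantized = [codebook[i] for i in indices]
max_error = max(abs(a - b) for a, b in zip(pruned, dequantized))

print(codebook, indices, max_error)
```

After this step only the short codebook and the per-weight indices need to be stored, and the indices are what a lossless entropy coder would then compress.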

The methods used for Low-Rank Decomposition, Codebook Quantization, and lossless Arithmetic Coding are explained in this paper and this presentation, submitted to the 2021 Data Compression Conference (DCC). Some of these methods were included in the MPEG NNR standard, as explained in this paper.
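
To give some intuition for why lossless entropy coding helps after quantization, the sketch below computes the Shannon entropy of a skewed sequence of codebook indices. The entropy is the bits-per-symbol lower bound that an ideal entropy coder, such as an arithmetic coder, can approach. This is a generic illustration and not the method from the papers above; the index distribution and the function name `entropy_bits_per_symbol` are made up for the example.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(symbols):
    """Shannon entropy of the empirical symbol distribution, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical codebook indices for 16 weights with a 4-entry codebook.
# Index 0 dominates, as is common after pruning sets many weights to zero.
indices = [0] * 12 + [1] * 2 + [2] * 1 + [3] * 1

fixed_bits = len(indices) * 2            # 2 bits per index for 4 codes
entropy = entropy_bits_per_symbol(indices)
ideal_bits = entropy * len(indices)      # lower bound for an ideal entropy coder

print(fixed_bits, entropy, ideal_bits)
```

The more skewed the index distribution, the larger the gap between the fixed-width cost and the entropy bound, which is why pruning and quantization tend to make entropy coding more effective.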

iOS deployment

Fireball provides APIs for exporting a model to CoreML for iOS deployment. This includes models that have been compressed and/or quantized.

Demo Video

The following video shows how Fireball can be used to compress an object-detection deep neural network model to one tenth of its original size while running about 46% faster, with a negligible effect on accuracy. I tried to keep the video short (just under 5 minutes), so some important information may pass quickly. Feel free to pause or rewind the video to catch the details.