Fireball
A Deep Neural Network Library





Features


Fireball is a Deep Neural Network (DNN) library for creating, training, evaluating, quantizing, and compressing DNN-based models across a range of applications. Its main features are summarized below.

Compressing Neural Networks



Fireball provides a set of easy-to-use APIs for different types of model compression: low-rank decomposition, pruning, codebook-based quantization, and lossless entropy coding. Examples for a variety of use cases are available in the "Playgrounds" (Python notebook files).

The methods used for low-rank decomposition, codebook quantization, and lossless arithmetic coding are explained in this paper and this presentation, submitted to the 2021 Data Compression Conference (DCC). Some of these methods were included in the MPEG NNR standard, as explained in this paper.
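
To make the idea of codebook-based quantization concrete, here is a minimal, library-independent sketch in plain Python. It is not Fireball's API and not necessarily the algorithm described in the paper above: the function names (`build_codebook`, `nearest`, `quantize`, `dequantize`), the use of 1-D k-means to build the codebook, and the sample weights are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT Fireball's API or its exact algorithm.
# All names and the sample data below are hypothetical.

def nearest(codebook, w):
    """Return the index of the codebook entry closest to weight w."""
    return min(range(len(codebook)), key=lambda k: abs(codebook[k] - w))


def build_codebook(weights, size, iterations=20):
    """Build a codebook with 1-D k-means, starting from evenly spaced centers."""
    lo, hi = min(weights), max(weights)
    if size == 1 or lo == hi:
        return [sum(weights) / len(weights)]
    codebook = [lo + (hi - lo) * i / (size - 1) for i in range(size)]
    for _ in range(iterations):
        sums = [0.0] * size
        counts = [0] * size
        for w in weights:
            k = nearest(codebook, w)
            sums[k] += w
            counts[k] += 1
        # A center with no assigned weights keeps its previous value.
        codebook = [sums[k] / counts[k] if counts[k] else codebook[k]
                    for k in range(size)]
    return codebook


def quantize(weights, codebook):
    """Replace each weight with the index of its nearest codebook entry."""
    return [nearest(codebook, w) for w in weights]


def dequantize(indices, codebook):
    """Recover approximate weights by looking the indices up in the codebook."""
    return [codebook[k] for k in indices]


# Hypothetical flattened layer weights.
weights = [-0.91, -0.88, -0.12, -0.10, 0.02, 0.05, 0.79, 0.83]
codebook = build_codebook(weights, size=4)
indices = quantize(weights, codebook)      # small integers instead of floats
restored = dequantize(indices, codebook)   # approximation of the originals
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

With a 4-entry codebook, each weight can be stored as a 2-bit index plus the shared codebook, which is where the size reduction comes from. The index stream can then be compressed further by a lossless entropy coder.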


iOS Deployment


Fireball provides APIs for exporting a model to CoreML for iOS deployment, including models that have been compressed and/or quantized.


Demo Video


The following video shows how Fireball can be used to compress an object-detection deep neural network model to one-tenth of its original size while running about 46% faster, with a negligible effect on accuracy. I tried to keep the video short (just under 5 minutes), so some important information may pass quickly. Feel free to pause or rewind the video to catch the details.