Liveness Analysis Helps Save Gigabytes of Memory Usage for AI Inference

Memory allocation is an essential step in traditional compilers, and in neural network (NN) compilers as well. Each program variable (or each tensor in an NN model) is assigned a memory region to store its value for use by later operations. In this article, we apply a classic allocation method based on liveness analysis to NN models and see whether the method still performs well in the NN domain. The experimental results are very encouraging. On the yolo9000 model, for example, the derived memory footprint is only 16% of the model's total tensor size. This is huge! Consider that the total tensor size of yolo9000 is 1.4GB. With the help of liveness analysis, only 255MB of memory is needed to store all those tensors. We save up to 1.1GB of memory!
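The idea can be sketched in a few dozen lines. The code below is a minimal illustration, not ONNC's actual implementation: it computes each tensor's live interval (from the step that defines it to the last step that uses it) over a topologically ordered operation list, then assigns offsets with a greedy first-fit pass that reuses the space of tensors whose live range has ended. The operation list and tensor sizes are made-up examples.

```python
def live_intervals(ops):
    """ops: list of (defined_tensor, used_tensors) in execution order.
    Returns {tensor: (first_step, last_step)} -- the tensor's live range."""
    intervals = {}
    for step, (out, ins) in enumerate(ops):
        intervals[out] = (step, step)          # defined here
        for t in ins:
            start, _ = intervals[t]
            intervals[t] = (start, step)       # extend to last use
    return intervals

def allocate(ops, sizes):
    """Greedy first-fit allocation that reuses memory of dead tensors.
    Returns ({tensor: offset}, peak memory footprint in bytes)."""
    intervals = live_intervals(ops)
    live_regions = []                          # (offset, size, last_step)
    offsets, peak = {}, 0
    for step, (out, _) in enumerate(ops):
        # Drop regions whose tensor is dead before this step.
        live_regions = [r for r in live_regions if r[2] >= step]
        # First fit: lowest offset where the new tensor fits.
        need, offset = sizes[out], 0
        for o, s, _ in sorted(live_regions):
            if offset + need <= o:
                break                          # gap before region o is big enough
            offset = max(offset, o + s)
        offsets[out] = offset
        live_regions.append((offset, need, intervals[out][1]))
        peak = max(peak, offset + need)
    return offsets, peak

# Hypothetical straight-line model: a -> b -> c -> d.
ops = [("a", []), ("b", ["a"]), ("c", ["b"]), ("d", ["c"])]
sizes = {"a": 100, "b": 200, "c": 100, "d": 100}
offsets, peak = allocate(ops, sizes)
print(peak, sum(sizes.values()))   # peak footprint vs. total tensor size
```

In this toy chain the total tensor size is 500 bytes, but because `a` dies once `b` is produced (and so on down the chain), the peak footprint is only 300 bytes; the same reuse effect is what shrinks yolo9000's footprint from 1.4GB to 255MB.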

Figure 1: Memory footprint normalized to total tensor size. The smaller, the better.
Figure 2: IR of LeNet, transformed from ONNX by ONNC.
Figure 3: Memory allocation result for LeNet.

The Open Neural Network Compiler (ONNC) is a compiler that connects the Open Neural Network Exchange Format (ONNX) to every deep learning accelerator (DLA).