ONNC Quantization to INT8 Experiment

Some hardware modules inside NVDLA change the precision of the prediction results. If a calibrator does not take these hardware architectural characteristics into account in its algorithm, it may fail to preserve the accuracy of some AI models; for large models, the lack of architectural awareness can produce unacceptable errors. To address this issue, the Skymizer calibrator is architecture-aware: it models the hardware's error behavior and keeps the precision loss within 2%.

Figure 1: TOP1/TOP5 accuracy difference between ONNX Runtime (FP32) and the quantized models on FPGA (INT8)

We run the un-quantized models on ONNX Runtime as the golden result and run our quantized models on the FPGA as the test group. Figure 1 shows that six of the seven models have a precision loss of less than 2%. Only VGG19 shows a higher loss, and even there the precision difference stays within 3%.
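The golden-versus-test comparison above boils down to checking the accuracy drop against a budget. The sketch below illustrates that check in plain Python; the accuracy values used in it are hypothetical placeholders, not measured results from the experiment.

```python
# Sketch of the golden-vs-quantized precision check described above.
# Accuracies are fractions in [0, 1]; the numbers used for testing are
# hypothetical, not the paper's measured results.

def precision_loss(golden_acc, quantized_acc):
    """Accuracy drop of the quantized model relative to the FP32 golden run."""
    return golden_acc - quantized_acc

def within_budget(golden_acc, quantized_acc, budget=0.02):
    """True if the precision loss stays within the budget (2% by default)."""
    return precision_loss(golden_acc, quantized_acc) <= budget
```

The same check with a 3% budget would cover the VGG19 case mentioned above.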

Result

ONNX Experiment Environment

Dataset: All experiments are run on 12,500 images from the ILSVRC2012 dataset.

Hardware Configuration: CPU Intel Core i7-4790 @ 3.6 GHz

FPGA Experiment Environment

Dataset: All experiments are run on 12,500 images from the ILSVRC2012 dataset.

Hardware Configuration: NVDLA nv_small simulated on a ZCU102 FPGA

Experiment Method

TOP-N accuracy = number of successfully predicted images / total number of images

TOP5: For each of the 12,500 images, pick out the five classes with the highest prediction values. If the real label of the image is among them, the image counts as successfully predicted.

TOP1: For each image, pick out the class with the highest prediction value. If it matches the real label of the image, the image counts as successfully predicted.
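The TOP1/TOP5 scoring rule above can be sketched in a few lines of plain Python. The score lists and labels in the example are illustrative inputs, not part of the original experiment code.

```python
# Minimal sketch of the TOP1/TOP5 scoring rule described above.
# `scores` is a list of per-class prediction values for one image;
# `label` is that image's ground-truth class index.

def top_k_correct(scores, label, k):
    """True if the ground-truth label is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return label in ranked[:k]

def top_k_accuracy(all_scores, labels, k):
    """TOP-k = number of successfully predicted images / total number of images."""
    hits = sum(top_k_correct(s, y, k) for s, y in zip(all_scores, labels))
    return hits / len(labels)
```

With k=1 this is the TOP1 rule; with k=5, the TOP5 rule used in the experiment.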

The Open Neural Network Compiler (ONNC) is a compiler that connects the Open Neural Network Exchange Format (ONNX) to every deep learning accelerator (DLA).