ONNC Runtime Incorporated with Intel MKL Library

Summary: In this article, we describe how we leverage the Intel® Math Kernel Library (Intel® MKL) to significantly improve ONNC runtime execution time.

Motivation

The ONNC runtime is implemented in plain C. The advantage is that it can run on any CPU, but generic C code cannot exploit the capabilities of modern hardware, so it runs inefficiently. The ONNC Calibrator relies on the ONNC runtime and therefore runs inference slowly: calibrating the vgg19 model with two hundred pictures takes two hours.

Intel has launched the Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), which makes good use of the Intel CPU instruction set to optimize deep neural network operators. In this article, we incorporate the core math functions from the MKL-DNN library into the ONNC runtime to speed up the calibrator.

Experiment

Take the following steps to incorporate the Intel MKL library into the ONNC runtime:

  1. Add the MKL-DNN library to the ONNC runtime source tree
  2. Call MKL-DNN functions from the ONNC runtime operators
  3. Link the ONNC runtime against the MKL-DNN library

Results

The performance of the convolution operator (Conv) is significantly improved.

Figure 1 compares the execution time of the ONNC runtime with and without the MKL-DNN library, measured on a machine with a 3.4 GHz Intel Core i7 and 64 GB of DDR3 memory for the twelve ONNX models. Each experiment uses the ONNC runtime library to calibrate the scaling factors of INT8 inference with 100 images. On vgg19, MKL-DNN shortens inference time by a factor of twenty compared with the original ONNC runtime. In other words, calibration drops from two hours to six minutes.

Figure 1 ONNC runtime vs. ONNC runtime with MKL-DNN (x-axis: execution time in seconds; y-axis: model)

The following table shows the same execution-time comparison of the ONNC runtime with and without the MKL-DNN library, on a machine with a 3.4 GHz Intel Core i7 and 64 GB of DDR3 memory for the twelve ONNX models, with each experiment calibrating the scaling factors of INT8 inference with 100 images.

Table 1 ONNC runtime vs. ONNC runtime with MKL-DNN (m: minutes, s: seconds)

Conclusion

ONNC (Open Neural Network Compiler) is a retargetable compilation framework designed specifically for proprietary deep learning accelerators. Its software architecture expedites porting ONNC to any Deep Learning Accelerator (DLA) design that supports ONNX (Open Neural Network Exchange) operators. By incorporating the Intel MKL-DNN library, the ONNC runtime shows a significant improvement in execution time.

The Open Neural Network Compiler (ONNC) is a compiler that connects the Open Neural Network Exchange Format (ONNX) to every deep learning accelerator (DLA).