Watch video: https://www.youtube.com/watch?v=5nPPhZ1vUvs
ONNC (Open Neural Network Compiler) is a compilation framework designed specifically for proprietary deep learning accelerators (DLAs). Its software architecture expedites porting ONNC to any DLA design that supports ONNX (Open Neural Network Exchange) operators. The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that provides a scalable, configurable, and modular design to address the computational demands of convolutional neural network inference, and many proprietary SoC designs integrate NVDLA as their inference engine. The lack of extensible compiler support for NVDLA has become the major bottleneck for supporting more AI models and optimizations. When ONNC meets NVDLA…
Summary: In this article, we describe how we leveraged Intel® Math Kernel Library (Intel® MKL) to significantly improve ONNC runtime execution time.
The ONNC runtime is written in C. The advantage is that it can run on any CPU, but generic C code performs poorly on emerging hardware. The ONNC Calibrator uses the ONNC runtime, so its inference is slow: calibrating a VGG19 model on two hundred images takes two hours.
Intel has launched Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), which makes good use of the Intel CPU instruction set to…
[New feature] ONNC supports new operators Clip, Max, Min, ReduceMean, and PRelu.
[New feature] ONNC can compile models into C files.
[New feature] ONNC provides a library containing function implementations for 116 neural network operators defined in the ONNX rel-1.3.0 specification.
[New feature] The ONNC library can call the Intel MKL-DNN library to accelerate the computation of convolution and Gemm (matrix multiplication) on Intel CPUs.
Some hardware modules inside NVDLA change the precision of the prediction results. If a calibrator does not account for the hardware's architectural characteristics in its algorithm, it may not preserve the precision of some AI models. For some large AI models, this lack of architectural consideration produces unacceptable errors. To address this issue, the Skymizer calibrator is architecture-aware: it models the hardware's error behavior and keeps the precision loss within 2%.
We run unquantized models on ONNX Runtime to obtain the golden results and run our quantized models on an FPGA as the test group. Figure 1 shows six of the seven models…
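To make the idea of comparing a quantized run against a golden result concrete, here is a minimal sketch in Python. It is not the Skymizer calibrator and does not model NVDLA hardware: it assumes a plain symmetric int8 scheme with max-abs calibration, and all function names (`calibrate_scale`, `quantize`, `dequantize`, `max_relative_error`) are hypothetical names chosen for illustration.

```python
import random


def calibrate_scale(samples, num_bits=8):
    """Max-abs calibration (an assumed, generic scheme): map the largest
    observed magnitude onto the largest representable signed integer."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in samples)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round each value to the nearest integer step and clamp to the range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(quantized, scale):
    """Map integer codes back to floating-point values."""
    return [q * scale for q in quantized]


def max_relative_error(reference, approx):
    """Largest absolute deviation, relative to the peak reference magnitude."""
    peak = max(abs(v) for v in reference)
    return max(abs(r - a) for r, a in zip(reference, approx)) / peak


# Synthetic "golden" activations standing in for a float reference run.
random.seed(0)
golden = [random.uniform(-4.0, 4.0) for _ in range(1000)]

scale = calibrate_scale(golden)
restored = dequantize(quantize(golden, scale), scale)
error = max_relative_error(golden, restored)
print(f"max relative error: {error:.4%}")
```

With this scheme the rounding error per value is at most half a quantization step, so the relative error stays below 0.5/127. A real accelerator adds further precision changes inside its hardware modules, which is exactly what an architecture-aware calibrator has to model on top of a sketch like this.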
opt<T> object initialization order
NOTE: The feature described below is scheduled to be available in version 1.0.0.
ONNC serves as a bridge between AI frameworks and the underlying accelerator hardware. Like GCC in the traditional compiler area, ONNC intends to support any kind of deep learning accelerator (DLA) with a unified interface for compiler users. ONNC was designed with portability in mind so that DLA vendors can easily join the ONNC ecosystem. In this article, we introduce how to adapt ONNC to support a DLA.
The software stack of ONNC is shown in Figure 1. The support of a DLA is programmed as a…
Memory allocation is an essential step in traditional compilers and in neural network (NN) compilers as well. Each variable of a program (or tensor of an NN model) is assigned a memory space to store its value for use by later operations. In this article, we apply a classic allocation method based on liveness analysis to NN models and examine whether it still performs well in the NN domain. The experimental results are very encouraging. On the yolo9000 model, for example, the resulting memory footprint is only 16% of the total tensor size of the model. This is…
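The general idea of liveness-based allocation can be sketched in a few lines of Python. This is an illustrative sketch, not ONNC's implementation: it assumes operators execute in a fixed linear order, derives each tensor's live range from its first definition and last use, and then places tensors with a greedy first-fit heuristic (largest first). The names `Tensor`, `live_ranges`, and `allocate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tensor:
    name: str
    size: int       # bytes
    first_use: int  # index of the operator that defines the tensor
    last_use: int   # index of the last operator that reads it


def live_ranges(ops, sizes):
    """ops: list of (output_names, input_names) in execution order."""
    first, last = {}, {}
    for i, (outs, ins) in enumerate(ops):
        for name in outs:
            first.setdefault(name, i)
            last.setdefault(name, i)
        for name in ins:
            first.setdefault(name, 0)  # graph inputs are live from the start
            last[name] = i
    return [Tensor(n, sizes[n], first[n], last[n]) for n in first]


def allocate(tensors):
    """Greedy first-fit: tensors whose live ranges overlap must not share
    memory; tensors that are never live together may reuse the same bytes."""
    placed = []   # list of (Tensor, offset)
    offsets = {}
    for t in sorted(tensors, key=lambda t: -t.size):
        # Memory intervals already taken by tensors live at the same time.
        busy = sorted(
            (off, off + p.size)
            for p, off in placed
            if not (p.last_use < t.first_use or t.last_use < p.first_use)
        )
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break  # the gap before this interval is large enough
            offset = max(offset, end)
        placed.append((t, offset))
        offsets[t.name] = offset
    footprint = max((off + t.size for t, off in placed), default=0)
    return offsets, footprint


# A four-operator chain: x -> a -> b -> c -> y, 100 bytes per tensor.
ops = [(["a"], ["x"]), (["b"], ["a"]), (["c"], ["b"]), (["y"], ["c"])]
sizes = {name: 100 for name in "xabcy"}

tensors = live_ranges(ops, sizes)
offsets, footprint = allocate(tensors)
total = sum(sizes.values())
print(f"footprint {footprint} bytes of {total} bytes total")
```

In this toy chain only two tensors are ever live at once, so the sketch needs 200 bytes instead of the 500 bytes required when every tensor gets its own buffer. Real models have branches and tensors of very different sizes, so the savings depend on the graph and on the placement heuristic.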
The Open Neural Network Compiler (ONNC) project aims to provide a compiler that connects the Open Neural Network Exchange (ONNX) format to every deep learning accelerator (DLA). ONNX is a standard format for representing deep learning models that enables models to be correctly transferred between frameworks such as Caffe, CNTK, MXNet, PyTorch, and TensorFlow. ONNX guarantees interoperability between frameworks. ONNC pushes this further for the industry by guaranteeing executability across DLAs, ensuring that every DLA can execute ONNX models correctly.
ONNC is a backend for DLA vendors, a kind of cross compiler that transforms ONNX models into binary machine code for DLAs…