
ONNC (Open Neural Network Compiler) is a compilation framework designed specifically for proprietary deep learning accelerators (DLAs). Its software architecture expedites porting ONNC to any DLA design that supports ONNX (Open Neural Network Exchange) operators. The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that provides a scalable, configurable, and modular design to address the computational demands of convolutional neural network inference, and many proprietary SoC designs integrate NVDLA as their inference engine. The lack of extensible compiler support for NVDLA has become the major bottleneck for supporting more AI models and optimizations. When ONNC meets NVDLA…

Summary: In this article, we describe how we leveraged Intel® Math Kernel Library (Intel® MKL) to significantly improve ONNC runtime execution time.


The ONNC runtime is written in C. The advantage is that it can run on any CPU, but generic C code makes poor use of newer hardware. The ONNC Calibrator relies on the ONNC runtime, so its inference is slow: calibrating a VGG19 model with two hundred images takes two hours.

Intel has launched Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), which makes good use of the Intel CPU instruction set to…

Release Note

ONNC framework

[New feature] ONNC supports new operators Clip, Max, Min, ReduceMean, and PRelu.

C Backend

[New feature] ONNC can compile models into C files.
[New feature] ONNC provides a library containing function implementations for 116 neural network operators defined in the ONNX rel-1.3.0 specification.
[New feature] The ONNC library can call the Intel MKL-DNN library to accelerate convolution and Gemm (matrix multiplication) on Intel CPUs.

Supported ONNX Operators

  • Add
  • AveragePool
  • BatchNormalization
  • Concat
  • Conv
  • Clip (new)
  • Gemm
  • GlobalAveragePool
  • Identity
  • LRN
  • Max (new)
  • MaxPool
  • Min (new)
  • Mul
  • PRelu (new)
  • Relu
  • ReduceMean (new)
  • Reshape
  • Softmax
  • Sum
  • Transpose (used in ShuffleNet)
  • Unsqueeze

Supported ONNX Models

Some hardware modules inside NVDLA change the precision of the prediction results. If a calibrator does not consider the hardware's architectural characteristics in its algorithm, it may not preserve the precision of some AI models. For some large AI models, the lack of architectural consideration produces unacceptable errors. To address this issue, the Skymizer calibrator is architecture-aware: it models the hardware's error behavior and keeps the precision loss within 2%.
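
For readers new to calibration, the basic idea of choosing an INT8 scale from sample data can be sketched as follows. This is a generic max-abs symmetric scheme written for illustration only. It is not the Skymizer calibrator and it is not architecture-aware, and every function name here is hypothetical.

```python
def calibrate_scale(samples, num_bits=8):
    """Pick a symmetric scale so the largest observed magnitude maps to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for batch in samples for v in batch)
    # Fall back to 1.0 when every calibration value is zero.
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clip to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map integer codes back to approximate real values."""
    return [q * scale for q in q_values]


def max_abs_error(values, scale):
    """Largest round-trip error introduced by quantizing `values`."""
    restored = dequantize(quantize(values, scale), scale)
    return max(abs(a - b) for a, b in zip(values, restored))


if __name__ == "__main__":
    # Hypothetical calibration batches standing in for real activations.
    batches = [[0.5, -1.27], [0.3, 1.0]]
    scale = calibrate_scale(batches)
    print(scale, quantize([1.27, -0.4, 10.0], scale))
```

Values inside the calibrated range lose at most half a quantization step, while values outside it are clipped. A hardware-aware calibrator would additionally account for errors introduced by the accelerator's own arithmetic, which this sketch does not attempt.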

ONNX Runtime (FP32) vs. quantized models on FPGA (INT8): Top-1/Top-5 difference

We run the unquantized models on ONNX Runtime to obtain the golden results and run our quantized models on an FPGA as the test group. Figure 1 shows six of the seven models…
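
A Top-1/Top-5 difference of this kind can be computed with a few lines of Python. The sketch below assumes that the difference is measured as accuracy against ground-truth labels for both runs, which is one plausible reading of the experiment. The function names and the toy scores are hypothetical.

```python
def top_k(scores, k):
    """Return the indices of the k largest scores, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def top_k_accuracy(all_scores, labels, k):
    """Fraction of samples whose true label is among the k best predictions."""
    hits = sum(1 for scores, label in zip(all_scores, labels)
               if label in top_k(scores, k))
    return hits / len(labels)


def accuracy_diff(fp32_scores, int8_scores, labels, k):
    """Accuracy lost when moving from the FP32 reference to the INT8 run."""
    return (top_k_accuracy(fp32_scores, labels, k)
            - top_k_accuracy(int8_scores, labels, k))


if __name__ == "__main__":
    # Toy class scores for two images and three classes (made-up numbers).
    fp32 = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
    int8 = [[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
    labels = [1, 0]
    print(accuracy_diff(fp32, int8, labels, k=1))
    print(accuracy_diff(fp32, int8, labels, k=2))
```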

Release Note

ONNC framework

  • [New Feature] add methods for manipulating ComputeOperator input/output links
  • [New Feature] add methods for erasing Value in Module
  • [New Feature] add new method addOnncIrOptimization() for class TargetBackend
  • [New Feature] add new method runOnComputeGraph() for class CustomPass<T>
  • [New Feature] add several utility libraries
  • [New Feature] add 5 ONNC IR optimization passes
  • [Bug fix] fix segmentation fault due to unexpected global opt<T> object initialization order
  • [Bug fix] fix name collision when using type LiveInterval
  • [Bug fix] remove C++11-incompatible code
  • [Bug fix] fix bugs in default ComputeVisitor::visit() implementation
  • [Bug fix] fix ONNC runtime bugs for 12 ONNX model zoo models
  • [Bug fix]…

Release Note

New Features

NVDLA Backend

  • The first open-source compiler backend that supports NVIDIA Deep Learning Accelerator (NVDLA)
  • Initial release of nv_full hardware configuration support
  • Support status for the models in the ONNX model zoo: ONNC can compile 6 models and run them successfully on the NVDLA virtual platform. 2 models are not supported by the nv_full configuration. The other 4 models need support for more operators.

Framework Support

  • Interpreter Interface: a target backend can now provide its own customized interpreter.
  • Vanilla Backend: a template for porting a new backend
  • Statistic API



  • Add more verbose levels for debugging or benchmarking (levels 1 to 4)

NOTE: The feature described below is scheduled to be available in version 1.0.0.

ONNC serves as a bridge between AI frameworks and the underlying accelerator hardware. Like GCC in the traditional compiler area, ONNC intends to support any kind of deep learning accelerator (DLA) through a unified interface for compiler users. To make it easy for DLA vendors to join the ONNC ecosystem, ONNC was designed with portability in mind. In this article, we describe how to adapt ONNC to support a DLA.

The software stack of ONNC is shown in Figure 1. The support of a DLA is programmed as a…

Memory allocation is an essential step in traditional compilers and in neural network (NN) compilers as well. Each variable of a program (or each tensor of an NN model) is assigned a memory space that stores its value for use by later operations. In this article, we apply a classic allocation method based on liveness analysis to NN models and examine whether it still performs well in the NN domain. The experimental results are very encouraging. On the yolo9000 model, for example, the resulting memory footprint is only 16% of the total tensor size of the model. This is…
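
The general idea of liveness-based allocation can be illustrated with a small Python sketch. It assumes a simplified model in which a tensor is live from the first operator that touches it to the last one, and it places tensors with a greedy first-fit heuristic, largest first. This is an illustration of the technique, not ONNC's implementation, and all names and sizes are hypothetical.

```python
def live_intervals(ops):
    """ops is a list of (inputs, outputs) per step; return {tensor: (first, last)}."""
    intervals = {}
    for step, (inputs, outputs) in enumerate(ops):
        for name in inputs + outputs:
            first, _ = intervals.get(name, (step, step))
            intervals[name] = (first, step)  # extend liveness to this step
    return intervals


def allocate(sizes, intervals):
    """Assign each tensor an offset so tensors live at the same time never overlap."""
    placed = []   # (offset, size, first, last) of tensors already placed
    offsets = {}
    for name in sorted(sizes, key=lambda n: -sizes[n]):  # largest first
        first, last = intervals[name]
        size = sizes[name]
        # Memory ranges held by tensors whose live interval overlaps this one.
        busy = sorted((off, sz) for off, sz, f, l in placed
                      if f <= last and first <= l)
        offset = 0
        for off, sz in busy:
            if offset + size <= off:
                break  # the gap before this busy range is large enough
            offset = max(offset, off + sz)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    footprint = max(offsets[n] + sizes[n] for n in sizes)
    return offsets, footprint


if __name__ == "__main__":
    # A hypothetical three-operator chain: a -> b -> c -> d.
    ops = [(["a"], ["b"]), (["b"], ["c"]), (["c"], ["d"])]
    sizes = {"a": 4, "b": 8, "c": 8, "d": 2}
    offsets, footprint = allocate(sizes, live_intervals(ops))
    print(offsets, footprint, sum(sizes.values()))
```

In this toy chain the footprint is 16 units against a total tensor size of 22, because tensors that are never live at the same time share memory.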

ONNC, Open Neural Network Compiler

The Open Neural Network Compiler (ONNC) project aims to provide a compiler that connects the Open Neural Network Exchange format (ONNX) to every deep learning accelerator (DLA). ONNX is a standard format for representing deep learning models that enables models to be correctly transferred between frameworks such as Caffe, CNTK, MXNet, PyTorch, and TensorFlow. ONNX guarantees interoperability between frameworks. ONNC pushes this further for the industry by guaranteeing executability across DLAs, ensuring that every DLA can execute ONNX models correctly.

ONNC is a backend for DLA vendors, a kind of cross compiler that transforms ONNX models into binary machine code for DLAs…


The Open Neural Network Compiler (ONNC), a compiler that connects Open Neural Network Exchange Format (ONNX) to every deep learning accelerator (DLA).
