NOTE: The feature described below is scheduled to be available in version 1.0.0.
ONNC serves as a bridge between AI frameworks and the underlying accelerator hardware. Like GCC in the traditional compiler field, ONNC aims to support any kind of deep learning accelerator (DLA) through a unified interface for compiler users. ONNC was designed with portability in mind so that DLA vendors can easily join the ONNC ecosystem. In this article, we introduce how to adapt ONNC to support a DLA.
The software stack of ONNC is shown in Figure 1. Support for a DLA is programmed as a backend in the stack. The underlying hardware can be an ASIC dedicated to AI inference, a general-purpose CPU, or something in between, such as a GPU or DSP. To support such a wide range of computing hardware, ONNC provides two kinds of backend interfaces for porting: the LLVM IR and the ONNX IR. If a DLA already supports the LLVM compiler, it can be connected to ONNC seamlessly via the LLVM IR; this is usually the case for CPUs, GPUs, and even DSPs. Conversely, if a DLA has unique computation features and is not compatible with LLVM, ONNC provides a “vanilla” backend to jump-start the porting work.
Porting a DLA with the LLVM Backend
Figure 1 shows an LLVM frontend module in the stack, which converts the ONNX IR into the LLVM IR. This module integrates the LLVM IR data structures and their manipulation functions from the LLVM project. Therefore, if you are familiar with the LLVM project, you can port your LLVM DLA backend to ONNC with little effort.
Besides compiling a model itself, ONNC also provides a runtime library to assist the AI inference task. The assistance includes, for example, allocating memory for inputs, weights, and outputs; reading and writing files; and transforming image formats. The library contains a set of utility functions in the LLVM format that can be linked together with the compiled model. With this library, AI application developers are spared the effort of implementing common, reusable facilities.
Porting a DLA with the Vanilla Backend
ONNC provides a vanilla backend as a starting point for handcrafting a custom backend. Writing one basically involves three steps:
- Inherit the class of DLATargetBackend.
- Implement the class member functions addTensorSel, addTensorSched, addMemAlloc, and addCodeEmit.
- Modify the build system to include your custom backend in ONNC.
The most critical part is implementing the four member functions, which represent the four major phases of the NN compilation process.
- addTensorSel: Select corresponding instructions for target devices.
- addTensorSched: Schedule instructions.
- addMemAlloc: Turn symbolic operands into memory addresses.
- addCodeEmit: Emit binary codes for target devices.
Fortunately, ONNC provides a template implementation for these functions. The template is based on an instruction set that we believe covers most use cases in NN inference; it is in fact the same set defined by ONNX. If your DLA does not support that many instructions, you can shrink the instruction set by removing the registrations of the unused instructions from the code. Conversely, you can expand the instruction set by inheriting the appropriate classes and then adding registrations for the new instructions.
The template also includes some optimization algorithms, such as scheduling and memory allocation. Each algorithm is implemented in ONNC as a “pass” (the same concept as an LLVM pass). In fact, each member function (addTensorSel, etc.) is composed of related optimization or analysis passes. ONNC already provides several useful passes that are free to use. For example, at the TensorSched phase, we introduce a pass that schedules instructions for optimal memory access, targeting a key bottleneck of NN inference. Hence, with the default passes, you quickly have a well-optimized backend. However, if you have a different idea for an optimization algorithm, simply inherit the pass class and register the new pass in place of the default one, and you have a customized backend.
As a hardware vendor, you can focus fully on developing the best AI hardware. Then, with just a little extra effort to port your backend to ONNC via either the LLVM IR or the vanilla backend, your hardware can quickly connect to every state-of-the-art AI framework, such as Caffe and TensorFlow, and become available to AI developers all over the world.