BMNNSDK Architecture

BMNNSDK (BitMain Neural Network SDK) is BitMain's proprietary deep learning SDK for the BM AI chip. Its tools let you deploy deep learning applications in the runtime environment with high inference throughput and efficiency. BMNNSDK includes BMNet and BMRuntime. BMNet is the deep neural network compiler for the TPU processor; it maps neural network algorithms to TPU instructions. BMRuntime provides a set of programming interfaces as a library that abstracts away the details of the underlying hardware, making the TPU accessible to applications through a simple set of APIs.
The figure shows the complete deep neural network solution on BITMAIN TPU processors, which consists of:
1) Drivers: drive the TPU processor and provide APIs to access it.
2) BMRuntime: the TPU runtime system for neural network computing.
3) BMKernel: provides APIs that translate general neural network instructions into BitMain hardware instructions.
4) BMNet: provides tools to quantize and compile the trained model.

BMRuntime Architecture


BMRuntime is a library designed for building high-performance neural network applications that run on BITMAIN TPU processors (such as BM1682/BM1880). It provides a series of simple and flexible interfaces that make deploying neural network applications easy and efficient.
The library's functions fall into the following categories:
1) BM device operations
2) BM context operations
3) BM kernel operations
4) BM memory operations
5) BM network operations
BM device operations include enumerating, opening, querying, configuring and closing a device.
BM context operations include context creation, binding and destruction.
BM kernel operations include kernel creation and destruction.
BM memory operations include allocating and freeing device memory and host memory, and copying data between host memory and device memory.
BM network operations include registering neural networks, running inference on input data and cleaning up networks.


BMKernel's main job is to format users' computation requests into hardware instructions, so that users need not worry about the exact byte or bit locations into which each computation parameter must be packed. A typical workflow begins with the user configuring a BMKernel context. The user then specifies computation requests by calling computation APIs, and finally acquires the generated hardware instructions. The instructions may be sent for execution on real hardware or processed for further analysis and transformation, as the user prefers.
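The three-step workflow above (configure a context, call computation APIs, acquire the instructions) can be illustrated with a toy encoder. This is only a sketch: every toy_ name and the 64-bit field layout are hypothetical and chosen for illustration. They are not the BMKernel API and not a real TPU instruction format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy encoder; NOT the BMKernel API or a real instruction format. */
enum { TOY_OP_ADD = 1, TOY_MAX_INSTR = 16 };

typedef struct {
    uint64_t instr[TOY_MAX_INSTR]; /* generated instruction words */
    size_t   count;                /* number of words emitted so far */
} toy_kernel_ctx;

/* Step 1: configure a context. */
static void toy_kernel_init(toy_kernel_ctx *k)
{
    k->count = 0;
}

/* Step 2: a "computation API". It packs an element-wise add request into one
 * word using an invented layout:
 * bits 63..56 opcode, 55..40 dst, 39..24 src0, 23..8 src1, 7..0 length.
 * Returns 0 on success, -1 if the instruction buffer is full. */
static int toy_emit_add(toy_kernel_ctx *k, uint16_t dst, uint16_t src0,
                        uint16_t src1, uint8_t len)
{
    if (k->count >= TOY_MAX_INSTR)
        return -1;
    k->instr[k->count++] = ((uint64_t)TOY_OP_ADD << 56)
                         | ((uint64_t)dst  << 40)
                         | ((uint64_t)src0 << 24)
                         | ((uint64_t)src1 << 8)
                         | (uint64_t)len;
    return 0;
}

/* Step 3: acquire the generated instructions. */
static const uint64_t *toy_kernel_acquire(const toy_kernel_ctx *k, size_t *count)
{
    assert(count != NULL);
    *count = k->count;
    return k->instr;
}
```

The caller only names the operands (dst, src0, src1, len); the bit positions stay inside toy_emit_add(), which is the convenience the paragraph above attributes to BMKernel.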


BMContext describes the BM resource context. It is created by bm_init() and released by bm_exit(). Applications should call bm_init() first and call bm_exit() before exiting. For example:
int main(int argc, char *argv[]) {
    bmctx_t ctx;  // BM context handle
    bmerr_t ret;  // result of bm_init(); check it before using ctx

    ret = bm_init(0, &ctx);  // init resources and get the context
    bm_exit(ctx);            // release resources
    return 0;
}


There are three types of memory: host memory, device memory and local memory.
1) Host memory: also called system memory. It resides on the host, is used by applications, and is allocated and freed with malloc()/free() or new/delete.
2) Device memory: also called global memory. It resides on the device, such as a BM1682/BM1880. BMMemory manages this memory to store neural network data.
3) Local memory: also called chip memory. It can be accessed only by the chip.
Normally the original data is stored in host memory and must be transferred to device memory with bm_memcpy_s2d(). The TPU chip can access only local memory, so kernel operations such as tl_load()/tl_store() transfer data between device memory and local memory.
The data path is therefore: host memory <-> device memory <-> local memory.
BMMemory provides APIs to manage device memory. The main APIs do the following:
1) Allocate device memory of a specified size and return a device memory handle.
2) Free device memory.
3) Get the size of the memory that a device memory handle points to.
4) Get the address of the memory through the device memory handle; the address can be used in BMKernel APIs.
5) Copy data from system memory to device memory.
6) Copy data from device memory to system memory.
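The handle-based pattern listed above (allocate, query size, copy in, copy out, free) can be sketched with a host-only mock. All toy_ names are hypothetical, and the "device memory" here is an ordinary heap buffer, so this illustrates the call sequence rather than the real BMMemory API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a device memory handle. */
typedef struct {
    uint8_t *base; /* heap buffer standing in for device memory */
    size_t   size; /* allocated size in bytes */
} toy_devmem;

/* Allocate "device" memory and return a handle, or NULL on failure. */
static toy_devmem *toy_devmem_alloc(size_t size)
{
    toy_devmem *m = malloc(sizeof *m);
    if (m == NULL)
        return NULL;
    m->base = malloc(size);
    if (m->base == NULL) {
        free(m);
        return NULL;
    }
    m->size = size;
    return m;
}

/* Free the buffer and the handle; a NULL handle is ignored. */
static void toy_devmem_free(toy_devmem *m)
{
    if (m != NULL) {
        free(m->base);
        free(m);
    }
}

/* Size of the memory the handle points to. */
static size_t toy_devmem_size(const toy_devmem *m)
{
    return m->size;
}

/* System (host) -> device copy; returns -1 if n exceeds the allocation. */
static int toy_memcpy_s2d(toy_devmem *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n > dst->size)
        return -1;
    memcpy(dst->base, src, n);
    return 0;
}

/* Device -> system (host) copy; returns -1 if n exceeds the allocation. */
static int toy_memcpy_d2s(void *dst, const toy_devmem *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n > src->size)
        return -1;
    memcpy(dst, src->base, n);
    return 0;
}
```

A typical round trip allocates a handle sized for the input, copies the host buffer in with toy_memcpy_s2d(), reads results back with toy_memcpy_d2s(), and then frees the handle.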

BMNet inference engine

The BMNet inference engine is a neural network inference engine. It uses a bmodel to build its environment and runs inference on input data to produce output data.
A bmodel is a deep neural network model format for BITMAIN TPU processors. It contains the weights of a specific neural network, instruction streams for different shapes, and so on. It is generated by BMNet from a caffemodel; for example, a resnet caffemodel is used to generate resnet.bmodel. One bmodel file can support multiple input shapes (N/C/H/W), so bmnet_set_input_shape() should be called before inference to select the shape to use.
The main APIs do the following:
1) Register the BMNet environment from a bmodel and generate a BMNet context handle.
2) Set the input shape (NCHW); the shape must be supported by the registered bmodel.
3) Get output information, such as the output data size and output shape.
4) Run inference on the input data and get the output data.
5) Exit the BMNet environment and release resources.
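The rule that the selected input shape must be one the registered bmodel supports can be sketched as follows. The toy_ types and functions are hypothetical stand-ins, not the BMNet API, and the size helper assumes one byte per element (INT8 input), which is an assumption made only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* NCHW shape. */
typedef struct { int n, c, h, w; } toy_shape;

/* Hypothetical stand-in for a registered bmodel. */
typedef struct {
    const toy_shape *shapes;     /* shapes the model was compiled for */
    size_t           num_shapes;
    int              selected;   /* index of the active shape, -1 if none */
} toy_net;

/* "Register" a model by recording its supported shapes. */
static void toy_net_register(toy_net *net, const toy_shape *shapes, size_t num)
{
    assert(net != NULL && shapes != NULL);
    net->shapes = shapes;
    net->num_shapes = num;
    net->selected = -1;
}

/* Select the input shape; returns -1 if the model does not support it. */
static int toy_net_set_input_shape(toy_net *net, toy_shape s)
{
    for (size_t i = 0; i < net->num_shapes; i++) {
        const toy_shape *p = &net->shapes[i];
        if (p->n == s.n && p->c == s.c && p->h == s.h && p->w == s.w) {
            net->selected = (int)i;
            return 0;
        }
    }
    return -1;
}

/* Input size in bytes for the active shape, assuming one byte per element.
 * Returns 0 if no shape has been selected yet. */
static size_t toy_net_input_size(const toy_net *net)
{
    if (net->selected < 0)
        return 0;
    const toy_shape *p = &net->shapes[net->selected];
    return (size_t)p->n * (size_t)p->c * (size_t)p->h * (size_t)p->w;
}
```

With a model registered for shapes {1,3,224,224} and {4,3,224,224}, requesting {2,3,224,224} is rejected, while {4,3,224,224} is accepted and fixes the buffer size the caller must provide.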

BMNet Architecture

General Description

BMNet contains two sets of tools: calibration_caffe and bm_builder.bin for Caffe models, and calibration_onnx and bm_builder_onnx.bin for ONNX models. The calibration_caffe tool takes a caffemodel file and training data and produces a new INT8 caffemodel file and a calibration table file. These two files are the input to BMNet and make INT8 computation on the BM1880 possible. bm_builder.bin combines the frontend, optimizer and backend modules into one executable binary. It takes the network's caffemodel and deploy.prototxt as inputs and generates a bmodel after compilation.
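To give a feel for what INT8 calibration makes possible, the sketch below shows generic symmetric INT8 quantization with a per-tensor threshold. It assumes, purely for illustration, that a calibration step supplies one positive threshold per tensor; the actual contents of the calibration table and the scheme used by calibration_caffe are not described here, and the toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map a float to INT8 so that +/-threshold maps to +/-127.
 * Values beyond the threshold saturate. */
static int8_t toy_quantize(float x, float threshold)
{
    assert(threshold > 0.0f);
    float scaled = x * 127.0f / threshold;
    if (scaled > 127.0f)
        scaled = 127.0f;
    if (scaled < -127.0f)
        scaled = -127.0f;
    /* Round half away from zero without needing math.h. */
    return (int8_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Approximate inverse of toy_quantize(). */
static float toy_dequantize(int8_t q, float threshold)
{
    return (float)q * threshold / 127.0f;
}
```

A smaller threshold gives finer resolution for typical values but saturates more outliers, which is the trade-off a calibration step over representative data is meant to settle.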
For more details about the BMNet toolkit, see the BMNet Compiler page.