BMNNSDK API v1
BMNNSDK provides a lightweight set of C/C++ APIs for deep learning application developers. It consists of the TPU BMRuntime Library, the BMKernel Library, and the BMNet Library, which are described in detail in this section.

BM Runtime Library

bm_init

bmerr_t bm_init(
    int index,
    bmctx_t *ctx)
bm_init() initializes the BM device and creates a handle to a BM context.
Parameter | Type | Description
index | Input | Not used now. Set to 0 as the default value.
ctx | Output | Pointer to the BM context handle.

bm_exit

void bm_exit(
    bmctx_t ctx)
bm_exit() must be called before the application exits. It releases all internal resources.
Parameter | Type | Description
ctx | Input | The BM context handle created by bm_init().
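The pairing of bm_init() and bm_exit() described above can be sketched as follows. This is only an illustration: the stand-in definitions of bmerr_t, bmctx_t, and BM_SUCCESS, the two stub functions, and the helper run_with_context() are assumptions made so the sketch compiles without the SDK. Real code would include the SDK header instead of the stubs.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins (assumptions): the real definitions come from the SDK headers. */
typedef int bmerr_t;
typedef void *bmctx_t;
#define BM_SUCCESS 0          /* assumed success code */

static int live_contexts = 0; /* stub bookkeeping, for illustration only */
static int dummy_ctx;         /* dummy object behind the context handle */

/* Stub with the documented signature; the real one talks to the device. */
static bmerr_t bm_init(int index, bmctx_t *ctx) {
    (void)index;
    *ctx = &dummy_ctx;
    live_contexts++;
    return BM_SUCCESS;
}

/* Stub with the documented signature. */
static void bm_exit(bmctx_t ctx) {
    assert(ctx != NULL);
    live_contexts--;
}

/* Hypothetical helper: create a context, run the work, then release it. */
static bmerr_t run_with_context(void (*work)(bmctx_t)) {
    bmctx_t ctx = NULL;
    bmerr_t ret = bm_init(0, &ctx); /* index is unused; pass 0 */
    if (ret != BM_SUCCESS)
        return ret;                 /* nothing to release on failure */
    if (work)
        work(ctx);
    bm_exit(ctx);                   /* must run before the application exits */
    return BM_SUCCESS;
}
```

Funnelling every exit path through one helper like this is a design choice of the sketch, not something the SDK requires. It just makes it harder to forget the bm_exit() call.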

bm_enum_devices

void bm_enum_devices(
    int *count,
    bm_devinfo_t devinfo[])
bm_enum_devices() enumerates all BM devices in the system.
Parameter | Type | Description
count | Output | The number of BM devices.
devinfo | Output | The array of device info.

bm_device_open

bmerr_t bm_device_open(
    int index,
    bmdev_t *dev)
bm_device_open() opens a BM device.
Parameter | Type | Description
index | Input | The index of the BM device.
dev | Output | Pointer to the BM device handle.

bm_device_close

void bm_device_close(
    bmdev_t dev)
bm_device_close() closes an opened BM device.
Parameter | Type | Description
dev | Input | The BM device handle.

bm_device_query

bmerr_t bm_device_query(
    bmdev_t dev,
    int id,
    void *buf)
bm_device_query() currently always returns BM_ERR_NOT_SUPPORTED.

bm_device_config

bmerr_t bm_device_config(
    bmdev_t dev,
    int id,
    void *buf)
bm_device_config() currently always returns BM_ERR_NOT_SUPPORTED.

bm_device_get_info

bm_devinfo_t bm_device_get_info(
    bmdev_t dev)
bm_device_get_info() returns the information of a BM device.
Parameter | Type | Description
dev | Input | The BM device handle.

bm_context_create

bmerr_t bm_context_create(
    bmctx_t *ctx)
bm_context_create() creates a BM context.
Parameter | Type | Description
ctx | Output | Pointer to the BM context handle.

bm_context_destroy

void bm_context_destroy(
    bmctx_t ctx)
bm_context_destroy() destroys a BM context.
Parameter | Type | Description
ctx | Input | The BM context handle.

bm_bind_device

bmerr_t bm_bind_device(
    bmctx_t ctx,
    bmdev_t dev)
bm_bind_device() binds a BM context to a BM device.
Parameter | Type | Description
ctx | Input | The BM context handle.
dev | Input | The BM device handle.

bm_unbind_device

void bm_unbind_device(
    bmctx_t ctx)
bm_unbind_device() unbinds a BM context from its BM device.
Parameter | Type | Description
ctx | Input | The BM context handle.

bm_get_device

bmdev_t bm_get_device(
    bmctx_t ctx)
bm_get_device() returns the handle of the BM device that is bound to the BM context.
Parameter | Type | Description
ctx | Input | The BM context handle.
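The device and context functions above can be combined into an explicit open, bind, and release sequence. The sketch below shows one plausible ordering, with teardown in the reverse order of setup. The type stand-ins, BM_SUCCESS, the stub bodies, and the helper open_bind_and_release() are all assumptions made so the example compiles without the SDK. Only the function signatures are taken from this section.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins (assumptions): real types and functions come from the SDK. */
typedef int bmerr_t;
typedef void *bmctx_t;
typedef void *bmdev_t;
#define BM_SUCCESS 0 /* assumed success code */

static int dev_obj, ctx_obj;     /* dummy objects behind the handles */
static bmdev_t bound_dev = NULL; /* device currently bound in the stub */

/* Stubs with the documented signatures. */
static bmerr_t bm_device_open(int index, bmdev_t *dev) {
    (void)index; *dev = &dev_obj; return BM_SUCCESS;
}
static void bm_device_close(bmdev_t dev) { (void)dev; }
static bmerr_t bm_context_create(bmctx_t *ctx) {
    *ctx = &ctx_obj; return BM_SUCCESS;
}
static void bm_context_destroy(bmctx_t ctx) { (void)ctx; }
static bmerr_t bm_bind_device(bmctx_t ctx, bmdev_t dev) {
    (void)ctx; bound_dev = dev; return BM_SUCCESS;
}
static void bm_unbind_device(bmctx_t ctx) { (void)ctx; bound_dev = NULL; }
static bmdev_t bm_get_device(bmctx_t ctx) { (void)ctx; return bound_dev; }

/* Hypothetical helper: open device 0, bind a fresh context to it, then
 * tear everything down in the reverse order. */
static bmerr_t open_bind_and_release(void) {
    bmdev_t dev = NULL;
    bmctx_t ctx = NULL;
    bmerr_t ret = bm_device_open(0, &dev);
    if (ret != BM_SUCCESS)
        return ret;
    ret = bm_context_create(&ctx);
    if (ret != BM_SUCCESS)
        goto close_dev;
    ret = bm_bind_device(ctx, dev);
    if (ret != BM_SUCCESS)
        goto destroy_ctx;

    assert(bm_get_device(ctx) == dev); /* the context reports its device */

    bm_unbind_device(ctx);
destroy_ctx:
    bm_context_destroy(ctx);
close_dev:
    bm_device_close(dev);
    return ret;
}
```

The goto-based cleanup keeps each failure path from skipping a release step. Whether the SDK tolerates a different teardown order is not stated in this section.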

bmruntime_bmkernel_create

bmerr_t bmruntime_bmkernel_create(
    bmctx_t ctx,
    bmkernel_handle_t **p_bk_ctx)
bmruntime_bmkernel_create() creates a BM kernel with the BM context. p_bk_ctx points to a thread-local variable, so this API can be used to create multiple BM kernel contexts in multiple threads, and they are independent of each other. However, a single thread must not own more than one of them at the same time; otherwise memory will leak.
Parameter | Type | Description
ctx | Input | The BM context handle.
p_bk_ctx | Output | Pointer to the BM kernel handle.

bmruntime_bmkernel_submit

void bmruntime_bmkernel_submit(
    bmctx_t ctx)
bmruntime_bmkernel_submit() submits the BM kernel with the BM context.
Parameter | Type | Description
ctx | Input | The BM context handle.

bmruntime_bmkernel_destroy

void bmruntime_bmkernel_destroy(
    bmctx_t ctx)
bmruntime_bmkernel_destroy() destroys the BM kernel with the BM context.
Parameter | Type | Description
ctx | Input | The BM context handle.

bmmem_device_alloc_raw

bmmem_device_t bmmem_device_alloc_raw(
    bmctx_t ctx,
    size_t size)
bmmem_device_alloc_raw() allocates device memory of the given size.
Parameter | Type | Description
ctx | Input | The BM context handle.
size | Input | The size of the device memory.

bmmem_device_prealloc_raw

bmmem_device_t bmmem_device_prealloc_raw(
    bmctx_t ctx,
    bmmem_device_t mem,
    uint64_t offset,
    size_t size)
bmmem_device_prealloc_raw() allows the application to allocate memory from previously allocated device memory. The requested range must fall entirely within the previously allocated device memory.
Parameter | Type | Description
ctx | Input | The BM context handle.
mem | Input | The previously allocated device memory.
offset | Input | The offset in the previously allocated device memory.
size | Input | The size of the device memory.
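The requirement that the requested range falls inside the previously allocated memory can be checked before calling bmmem_device_prealloc_raw(). The helper below is a hypothetical illustration and not part of the SDK. It tests that offset plus size stays within the parent allocation without overflowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if [offset, offset + size) lies inside a parent
 * allocation of parent_size bytes. The subtraction form avoids the overflow
 * that a naive "offset + size <= parent_size" could hit. */
static bool prealloc_range_fits(size_t parent_size, uint64_t offset, size_t size) {
    assert(sizeof(size_t) <= sizeof(uint64_t)); /* assumption of this sketch */
    if (offset > parent_size)
        return false;
    return size <= parent_size - offset;
}
```

In an application, parent_size could be obtained from bmmem_device_size(ctx, mem) for the parent handle.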

bmmem_device_alloc

bmmem_device_t bmmem_device_alloc(
    bmctx_t ctx,
    bmshape_t *shape)
bmmem_device_alloc() allocates device memory for the given shape.
Parameter | Type | Description
ctx | Input | The BM context handle.
shape | Input | The shape of the device memory.

bmmem_device_prealloc

bmmem_device_t bmmem_device_prealloc(
    bmctx_t ctx,
    bmmem_device_t mem,
    uint64_t offset,
    bmshape_t *shape)
bmmem_device_prealloc() allocates device memory for the given shape from previously allocated device memory.
Parameter | Type | Description
ctx | Input | The BM context handle.
mem | Input | The previously allocated device memory.
offset | Input | The offset in the previously allocated device memory.
shape | Input | The shape of the device memory.

bmmem_device_free

void bmmem_device_free(
    bmctx_t ctx,
    bmmem_device_t mem)
bmmem_device_free() frees device memory that was allocated by the allocation functions above.
Parameter | Type | Description
ctx | Input | The BM context handle.
mem | Input | The device memory handle.

bmmem_host_alloc

bmmem_host_t bmmem_host_alloc(
    bmctx_t ctx,
    bmshape_t *shape)
bmmem_host_alloc() currently always returns BM_ERR_NOT_SUPPORTED.

bmmem_host_free

void bmmem_host_free(
    bmctx_t ctx,
    bmmem_host_t mem)
bmmem_host_free() is currently not supported.

bmmem_device_size

size_t bmmem_device_size(
    bmctx_t ctx,
    bmmem_device_t mem)
bmmem_device_size() returns the size of the device memory.
Parameter | Type | Description
ctx | Input | The BM context handle.
mem | Input | The device memory handle.

bmmem_device_addr

uint64_t bmmem_device_addr(
    bmctx_t ctx,
    bmmem_device_t mem)
bmmem_device_addr() returns the address of the device memory.
Parameter | Type | Description
ctx | Input | The BM context handle.
mem | Input | The device memory handle.

bmmem_host_v_addr

void *bmmem_host_v_addr(
    bmctx_t ctx,
    bmmem_host_t mem)
bmmem_host_v_addr() currently always returns BM_ERR_NOT_SUPPORTED.

bmmem_host_p_addr

uint64_t bmmem_host_p_addr(
    bmctx_t ctx,
    bmmem_host_t mem)
bmmem_host_p_addr() currently always returns BM_ERR_NOT_SUPPORTED.

bm_memcpy_s2d

bmerr_t bm_memcpy_s2d(
    bmctx_t ctx,
    bmmem_device_t dst,
    uint8_t *src)
bm_memcpy_s2d() copies data from system memory to device memory. In the function name, s stands for system and d stands for device.
Parameter | Type | Description
ctx | Input | The BM context handle.
dst | Input | The device memory handle.
src | Input | The system memory pointer.

bm_memcpy_d2s

bmerr_t bm_memcpy_d2s(
    bmctx_t ctx,
    uint8_t *dst,
    bmmem_device_t src)
bm_memcpy_d2s() copies data from device memory to system memory.
Parameter | Type | Description
ctx | Input | The BM context handle.
dst | Input | The system memory pointer.
src | Input | The device memory handle.

bmnet_register

bmerr_t bmnet_register(
    bmctx_t ctx,
    bmnet_info_t *info,
    bmnet_t *net)
bmnet_register() registers a neural network with bmnet info.
Parameter | Type | Description
ctx | Input | The BM context handle.
info | Input | The BM network info.
net | Output | The registered network handle.

bmnet_register_bmodel

bmerr_t bmnet_register_bmodel(
    bmctx_t ctx,
    char *bmodel,
    bmnet_t *net)
bmnet_register_bmodel() registers a neural network from a bmodel file.
Parameter | Type | Description
ctx | Input | The BM context handle.
bmodel | Input | The bmodel filename.
net | Output | The registered network handle.

bmnet_register_noalloc

bmerr_t bmnet_register_noalloc(
    bmctx_t ctx,
    bmnet_info_t *info,
    bmnet_t *net)
bmnet_register_noalloc() registers a compiled neural network without allocating weight and neuron device memory.
Parameter | Type | Description
ctx | Input | The BM context handle.
info | Input | The BM network info.
net | Output | The registered network handle.

bmnet_set_input_shape

bmerr_t bmnet_set_input_shape(
    bmnet_t net,
    shape_t input_shape)
bmnet_set_input_shape() sets an input shape for a registered BM network. A bmodel can support several input shapes; this API selects one of them.
Parameter | Type | Description
net | Input | The BM network handle.
input_shape | Input | The input shape.

bmnet_get_output_info

bmerr_t bmnet_get_output_info(
    bmnet_t net,
    bmnet_output_info_t *output_info)
bmnet_get_output_info() gets the output info of a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.
output_info | Output | The output info.

bmnet_cleanup

void bmnet_cleanup(
    bmnet_t net)
bmnet_cleanup() cleans up a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.

bmnet_run

bmerr_t bmnet_run(
    bmnet_t net)
bmnet_run() runs a registered BM network. The application has to load the input and store the output itself.
Parameter | Type | Description
net | Input | The BM network handle.
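Because bmnet_run() leaves input loading and output storing to the application, a manual pass can be sketched with bmnet_load_input() and bmnet_store_output(), both documented later in this section. The stand-in types, BM_SUCCESS, the stub bodies (which fake the device with a single byte), and the helper run_manually() are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions): real types and functions come from the SDK. */
typedef int bmerr_t;
typedef void *bmnet_t;
#define BM_SUCCESS 0 /* assumed success code */

static uint8_t fake_device_value; /* stands in for device memory */

/* Stubs with the documented signatures; the "network" just adds one. */
static bmerr_t bmnet_load_input(bmnet_t net, uint8_t *input) {
    (void)net; fake_device_value = input[0]; return BM_SUCCESS;
}
static bmerr_t bmnet_run(bmnet_t net) {
    (void)net; fake_device_value = (uint8_t)(fake_device_value + 1);
    return BM_SUCCESS;
}
static bmerr_t bmnet_store_output(bmnet_t net, uint8_t *output) {
    (void)net; output[0] = fake_device_value; return BM_SUCCESS;
}

/* Hypothetical helper: load input, run the network, store output,
 * stopping at the first step that fails. */
static bmerr_t run_manually(bmnet_t net, uint8_t *input, uint8_t *output) {
    assert(input != NULL && output != NULL);
    bmerr_t ret = bmnet_load_input(net, input);
    if (ret != BM_SUCCESS)
        return ret;
    ret = bmnet_run(net);
    if (ret != BM_SUCCESS)
        return ret;
    return bmnet_store_output(net, output);
}
```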

bmnet_weight_devmem

bmmem_device_t bmnet_weight_devmem(
    bmnet_t net)
bmnet_weight_devmem() retrieves the weight device memory handle from a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.

bmnet_neuron_devmem

bmmem_device_t bmnet_neuron_devmem(
    bmnet_t net)
bmnet_neuron_devmem() retrieves the neuron device memory handle from a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.

bmnet_input_devmem

bmmem_device_t bmnet_input_devmem(
    bmnet_t net)
bmnet_input_devmem() retrieves the input device memory handle from a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.

bmnet_output_devmem

bmmem_device_t bmnet_output_devmem(
    bmnet_t net)
bmnet_output_devmem() retrieves the output device memory handle from a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.

bmnet_import_weight_devmem

bmerr_t bmnet_import_weight_devmem(
    bmnet_t net,
    bmmem_device_t weight_mem)
bmnet_import_weight_devmem() imports weight device memory for a registered BM network. The application should first allocate the weight device memory and then call this function to import it. This function and bmnet_import_neuron_devmem() are usually used together with bmnet_register_noalloc(): the application registers a BM network without allocating weight and neuron device memory, and then uses these two functions to import the weight and neuron memory.
Parameter | Type | Description
net | Input | The BM network handle.
weight_mem | Input | The weight device memory handle.

bmnet_import_neuron_devmem

bmerr_t bmnet_import_neuron_devmem(
    bmnet_t net,
    bmmem_device_t neuron_mem)
bmnet_import_neuron_devmem() imports neuron device memory for a registered BM network. The application should first allocate the neuron device memory and then call this function to import it.
Parameter | Type | Description
net | Input | The BM network handle.
neuron_mem | Input | The neuron device memory handle.

bmnet_load_input

bmerr_t bmnet_load_input(
    bmnet_t net,
    uint8_t *input)
bmnet_load_input() loads input data for a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.
input | Input | The input data pointer.

bmnet_load_neuron

bmerr_t bmnet_load_neuron(
    bmnet_t net,
    uint64_t neuron_offset,
    int neuron_size,
    uint8_t *neuron)
bmnet_load_neuron() loads neuron data for a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.
neuron_offset | Input | The offset of the neuron buffer.
neuron_size | Input | The neuron buffer size.
neuron | Input | The pointer to the neuron buffer.

bmnet_store_output

bmerr_t bmnet_store_output(
    bmnet_t net,
    uint8_t *output)
bmnet_store_output() stores output data for a registered BM network. The application uses this function to copy output data from device memory to host memory.
Parameter | Type | Description
net | Input | The BM network handle.
output | Input | The output buffer pointer.

bmnet_store_neuron

bmerr_t bmnet_store_neuron(
    bmnet_t net,
    uint64_t neuron_offset,
    int neuron_size,
    uint8_t *neuron)
bmnet_store_neuron() stores neuron data for a registered BM network. The application uses this function to copy neuron data from device memory to host memory.
Parameter | Type | Description
net | Input | The BM network handle.
neuron_offset | Input | The offset of the neuron buffer.
neuron_size | Input | The neuron buffer size.
neuron | Input | The pointer to the neuron buffer.

bmnet_inference

bmerr_t bmnet_inference(
    bmnet_t net,
    uint8_t *input,
    uint8_t *output)
bmnet_inference() runs inference with a registered BM network.
Parameter | Type | Description
net | Input | The BM network handle.
input | Input | The input buffer pointer.
output | Input | The output buffer pointer.
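Putting the runtime calls together, a single inference pass with a bmodel file might look like the sketch below. It is illustrative only: the type stand-ins, BM_SUCCESS, the stub bodies (whose "inference" merely echoes one byte), and the helper infer_once() are assumptions so the example compiles without the SDK. Buffer sizes depend on the network and are left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins (assumptions): real types and functions come from the SDK. */
typedef int bmerr_t;
typedef void *bmctx_t;
typedef void *bmnet_t;
#define BM_SUCCESS 0 /* assumed success code */

static int ctx_obj, net_obj;  /* dummy objects behind the handles */
static int cleanup_calls = 0; /* stub bookkeeping, for illustration only */

/* Stubs with the documented signatures. */
static bmerr_t bm_init(int index, bmctx_t *ctx) {
    (void)index; *ctx = &ctx_obj; return BM_SUCCESS;
}
static void bm_exit(bmctx_t ctx) { (void)ctx; }
static bmerr_t bmnet_register_bmodel(bmctx_t ctx, char *bmodel, bmnet_t *net) {
    (void)ctx; (void)bmodel; *net = &net_obj; return BM_SUCCESS;
}
static bmerr_t bmnet_inference(bmnet_t net, uint8_t *input, uint8_t *output) {
    (void)net; output[0] = input[0]; /* fake result: echo the first byte */
    return BM_SUCCESS;
}
static void bmnet_cleanup(bmnet_t net) { (void)net; cleanup_calls++; }

/* Hypothetical helper: one inference pass. The caller owns both buffers
 * and must size them for the network. */
static bmerr_t infer_once(char *bmodel_path, uint8_t *input, uint8_t *output) {
    assert(input != NULL && output != NULL);
    bmctx_t ctx = NULL;
    bmnet_t net = NULL;
    bmerr_t ret = bm_init(0, &ctx);
    if (ret != BM_SUCCESS)
        return ret;
    ret = bmnet_register_bmodel(ctx, bmodel_path, &net);
    if (ret == BM_SUCCESS) {
        ret = bmnet_inference(net, input, output);
        bmnet_cleanup(net); /* release the registered network */
    }
    bm_exit(ctx);           /* release the context last */
    return ret;
}
```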

BMKernel Library

System API

bmk1880_register

The user allocates a BMKernel context by filling in a bmk1880_info_t structure and passing it to the bmk1880_register function. The function returns a handle to the initialized context.
In the bmk1880_info_t structure, chip_version is an integer describing the version of the chip to work with, and can be 1880; cmdbuf (short for "command buffer") is a user-allocated buffer that holds the generated hardware instructions, and cmdbuf_size is its size in bytes. Note that the user is responsible for freeing cmdbuf once the BMKernel context that refers to it is no longer in use.
typedef struct {
    u32 chip_version;
    u8 *cmdbuf;
    u32 cmdbuf_size;
} bmk1880_info_t;
void *bmk1880_register(bmk1880_info_t *info);

bmk1880_cleanup

bmk1880_cleanup frees the context previously allocated by bmk1880_register.
void bmk1880_cleanup(void *ctx);

bmk1880_acquire_cmdbuf

bmk1880_acquire_cmdbuf returns a buffer of the hardware instructions generated so far and sets (*size) to the buffer's valid size in bytes. The buffer is an array of cmd_hdr_t structures, each containing one variable-sized generated hardware instruction.
u8 *bmk1880_acquire_cmdbuf(void *ctx, u32 *size);
typedef struct {
    u8 engine_id : 4; ...
    u8 len;
    u8 cmd[0];
} cmd_hdr_t;
In the cmd_hdr_t structure, engine_id is the identifier of the engine on which the contained instruction is supposed to be executed, and len indicates the length in bytes of the hardware instruction immediately following this cmd_hdr_t structure.
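Since each header carries the length of the instruction that follows it, the buffer can be walked header by header. The sketch below counts the instructions in such a buffer. It uses a simplified, hypothetical demo_cmd_hdr_t, because the real cmd_hdr_t has further fields that the listing above elides, so the real header size and layout may differ. The u8 and u32 typedefs are also assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t u8;   /* assumed width of the SDK's u8 */
typedef uint32_t u32; /* assumed width of the SDK's u32 */

/* Simplified, hypothetical header, not the SDK's cmd_hdr_t. */
typedef struct {
    u8 engine_id : 4;
    u8 reserved : 4; /* placeholder for the elided fields */
    u8 len;          /* length in bytes of the instruction that follows */
    u8 cmd[];        /* variable-sized instruction (C99 flexible array) */
} demo_cmd_hdr_t;

/* Count the instructions in buf, stepping over each header and payload.
 * Returns -1 if a header or payload would run past the end of the buffer. */
static int count_commands(const u8 *buf, u32 size) {
    u32 pos = 0;
    int count = 0;
    assert(buf != NULL || size == 0);
    while (pos < size) {
        if (size - pos < sizeof(demo_cmd_hdr_t))
            return -1; /* truncated header */
        const demo_cmd_hdr_t *hdr = (const demo_cmd_hdr_t *)(buf + pos);
        u32 step = (u32)sizeof(demo_cmd_hdr_t) + hdr->len;
        if (step > size - pos)
            return -1; /* truncated payload */
        pos += step;
        count++;
    }
    return count;
}
```

In real code, buf and size would come from bmk1880_acquire_cmdbuf(), and the walk would use the SDK's own cmd_hdr_t definition.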

bmk1880_reset

bmk1880_reset resets the current BMKernel context to its initial state as returned by bmk1880_register. This function is usually called after bmk1880_acquire_cmdbuf to empty the cmdbuf buffer.
void bmk1880_reset(void *ctx);
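The register, acquire, reset, and cleanup calls form a natural lifecycle around the user-owned cmdbuf. The sketch below walks through it once. The u8 and u32 typedefs, the stub implementations, and the helper kernel_context_roundtrip() are assumptions so the example compiles without the SDK. The structure and function signatures follow the listings above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint8_t u8;   /* assumed width of the SDK's u8 */
typedef uint32_t u32; /* assumed width of the SDK's u32 */

typedef struct {
    u32 chip_version;
    u8 *cmdbuf;
    u32 cmdbuf_size;
} bmk1880_info_t;

/* Stubs (assumptions) with the documented signatures. */
typedef struct { bmk1880_info_t info; u32 used; } stub_ctx_t;

static void *bmk1880_register(bmk1880_info_t *info) {
    stub_ctx_t *c = malloc(sizeof(*c));
    if (c) { c->info = *info; c->used = 0; }
    return c;
}
static u8 *bmk1880_acquire_cmdbuf(void *ctx, u32 *size) {
    stub_ctx_t *c = ctx;
    *size = c->used;
    return c->info.cmdbuf;
}
static void bmk1880_reset(void *ctx) { ((stub_ctx_t *)ctx)->used = 0; }
static void bmk1880_cleanup(void *ctx) { free(ctx); }

/* Hypothetical helper: returns the number of valid cmdbuf bytes seen,
 * or -1 if an allocation failed. */
static long kernel_context_roundtrip(u32 cmdbuf_size) {
    assert(cmdbuf_size > 0);
    u8 *cmdbuf = malloc(cmdbuf_size);
    if (!cmdbuf)
        return -1;
    bmk1880_info_t info = { 1880, cmdbuf, cmdbuf_size };
    void *ctx = bmk1880_register(&info);
    if (!ctx) {
        free(cmdbuf);
        return -1;
    }

    /* Computation API calls would generate instructions here. */

    u32 size = 0;
    u8 *cmds = bmk1880_acquire_cmdbuf(ctx, &size);
    (void)cmds;           /* a real program would hand cmds to the device */
    bmk1880_reset(ctx);   /* empty the buffer for the next batch */
    bmk1880_cleanup(ctx); /* free the context first */
    free(cmdbuf);         /* then the user frees cmdbuf */
    return (long)size;
}
```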

bmk1880_parallel_enable

bmk1880_parallel_enable declares that the following computations on different engines can be executed with no synchronization with each other. This function enables the engine-oriented parallel programming style.
void bmk1880_parallel_enable(void *ctx);

bmk1880_parallel_disable

bmk1880_parallel_disable disables the engine-oriented parallel programming style.
void bmk1880_parallel_disable(void *ctx);

bmk1880_create_streams

bmk1880_create_streams creates nr_streams streams, indexed 0 to (nr_streams - 1), that subsequent calls to bmk1880_set_stream can refer to. This function enables the dependency-oriented parallel programming style. Note that this style cannot be disabled once enabled.
void bmk1880_create_streams(void *ctx, int nr_streams);

bmk1880_destroy_streams

bmk1880_destroy_streams destroys all the streams created by the previous call to bmk1880_create_streams and resets the system back to serial mode.
void bmk1880_destroy_streams(void *ctx);

bmk1880_set_stream

bmk1880_set_stream sets the current stream to stream i, which must have been created by calling bmk1880_create_streams. Subsequent computations are put into this stream until another bmk1880_set_stream call specifies a different stream index.
void bmk1880_set_stream(void *ctx, int i);

bmk1880_add_dependency

bmk1880_add_dependency further restricts that the computation represented by before must take place strictly before that represented by after. Both before and after are pointers returned by some computation API.
void bmk1880_add_dependency(void *ctx, void *before, void *after);

Computation API

In every kind of computation, input values are first converted into 32-bit values before any internal computation, and the final 32-bit values are saturated into the range that can be represented by the final 8-bit or 16-bit integer format. That is, if the value before saturation can be represented in the final integer format, it is unchanged. Otherwise it is saturated to the maximum or minimum of the final integer format, whichever is nearer to the original value. For example, if the final integer format is FMT_U8, the representable maximum and minimum are 255 and 0 respectively. In this case, any value greater than 255 becomes 255 after saturation, and any value smaller than 0 becomes 0.
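The saturation rule can be written down directly. The two helpers below are a minimal sketch in plain C, not SDK functions. They clamp a 32-bit intermediate value into the unsigned and signed 8-bit ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit intermediate into the unsigned 8-bit range [0, 255]. */
static uint8_t saturate_u8(int32_t v) {
    if (v > 255)
        return 255;
    if (v < 0)
        return 0;
    return (uint8_t)v;
}

/* Saturate a 32-bit intermediate into the signed 8-bit range [-128, 127]. */
static int8_t saturate_i8(int32_t v) {
    if (v > 127)
        return 127;
    if (v < -128)
        return -128;
    return (int8_t)v;
}
```

For instance, an intermediate value of 300 saturates to 255 as an unsigned result and to 127 as a signed one, while -5 becomes 0 unsigned and stays -5 signed.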
Regarding signedness, one general rule applies to all kinds of computation unless otherwise specified: the result is unsigned if and only if all input tensors or matrices are unsigned. A tensor or matrix is said to be signed if it is of format FMT_I8, and unsigned if it is of format FMT_U8.
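The signedness rule amounts to a simple check over the input formats. The helper below is a hypothetical illustration, not an SDK function. The fmt_t typedef and the FMT_I8 and FMT_U8 values mirror the definitions given in the fmt_t section of this document, with uint32_t assumed for u32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t fmt_t; /* the document declares fmt_t as u32 */
#define FMT_I8 4
#define FMT_U8 9

/* Hypothetical helper: default result format for n input operands.
 * The result is unsigned only if every input is unsigned. */
static fmt_t default_result_fmt(const fmt_t *inputs, size_t n) {
    assert(inputs != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (inputs[i] != FMT_U8)
            return FMT_I8; /* one signed input makes the result signed */
    }
    return FMT_U8;
}
```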

fmt_t

fmt_t describes the type of the basic data in a tensor or matrix. The naming consists of three parts. "FMT" is a fixed prefix. The following "I" or "U" stands for signed integer or unsigned integer respectively. "8" is the bit width of the type.
typedef u32 fmt_t;
#define FMT_I8 4
#define FMT_U8 9

shape_t

shape_t describes the shape of a tensor or matrix. shape_t4 and shape_t2 are used to construct shape_t values for tensors and matrices, respectively.
typedef struct {
    u32 dim;
    u32 n;
    u32 c;
    union {
        u32 h;
        u32 row;
    };
    union {
        u32 w;
        u32 col;
    };
} shape_t;
shape_t shape_t4(int n, int c,