SimpleNet Example {#ex_simplenet} ================================ This C++ API example demonstrates how to build an AlexNet neural network topology for forward-pass inference. Some key take-aways include: * How tensors implemented and submitted to primitives. * How primitives are created. * How primitives are sequentially submitted to the network, where the output from primitives is passed as input to the next primitive. The later specifies dependency between primitive input <-> output data. * Specific 'inference-only' configurations. * Limit the number of reorders performed which are decremental to performance. The simple_net.cpp example implements the AlexNet layers as numbered primitives (e.g. conv1, pool1, conv2). ## Highlights for implementing the simple_net.cpp Example: 1. Initialize a CPU engine. The last parameter in the engine() call represents the index of the engine. ~~~cpp using namespace mkldnn; auto cpu_engine = engine(engine::cpu, 0); ~~~ 2. Create a primitives vector that represents the net. ~~~cpp std::vector<primitive> net; ~~~ 3. Additionally, create a separate vector holding the weights. This will allow executing transformations only once and outside the topology stream. ~~~cpp std::vector<primitive> net_weights; ~~~ 4. Allocate a vector for input data and create the tensor to configure the dimensions. ~~~cpp memory::dims conv1_src_tz = { batch, 3, 227, 227 }; std::vector<float> user_src(batch * 3 * 227 * 227); /* similarly, specify tensor structure for output, weights and bias */ ~~~ 5. Create a memory primitive for data in user format as `nchw` (minibatch-channels-height-width). Create a memory descriptor for the convolution input, selecting `any` for the data format. The `any` format allows the convolution primitive to choose the data format that is most suitable for its input parameters (convolution kernel sizes, strides, padding, and so on). If the resulting format is different from `nchw`, the user data must be transformed to the format required for the convolution (as explained below). ~~~cpp auto user_src_memory = memory({ { { conv1_src_tz }, memory::data_type::f32, memory::format::nchw }, cpu_engine}, user_src.data()); auto conv1_src_md = memory::desc({conv1_src_tz}, memory::data_type::f32, memory::format::any); /* similarly create conv_weights_md and conv_dst_md in format::any */ ~~~ 6. Create a convolution descriptor by specifying the algorithm([convolution algorithms](@ref winograd_convolution), propagation kind, shapes of input, weights, bias, output, convolution strides, padding, and kind of padding. Propagation kind is set to *forward_inference* -optimized for inference execution and omits computations that are only necessary for backward propagation. */ ~~~cpp auto conv1_desc = convolution_forward::desc( prop_kind::forward_inference, algorithm::convolution_direct, conv1_src_md, conv1_weights_md, conv1_bias_md, conv1_dst_md, conv1_strides, conv1_padding, padding_kind::zero); ~~~ 7. Create a descriptor of the convolution primitive. Once created, this descriptor has specific formats instead of the `any` format specified in the convolution descriptor. ~~~cpp auto conv1_prim_desc = convolution_forward::primitive_desc(conv1_desc, cpu_engine); ~~~ 8. Create a convolution memory primitive from the user memory and check whether the user data format differs from the format that the convolution requires. In case it is different, create a reorder primitive that transforms the user data to the convolution format and add it to the net. Repeat this process for weights as well. ~~~cpp auto conv1_src_memory = user_src_memory; /* Check whether a reorder is necessary */ if (memory::primitive_desc(conv1_prim_desc.src_primitive_desc()) != user_src_memory.get_primitive_desc()) { /* Yes, a reorder is necessary */ /* The convolution primitive descriptor contains the descriptor of a memory * primitive it requires as input. Because a pointer to the allocated * memory is not specified, Intel MKL-DNN allocates the memory. */ conv1_src_memory = memory(conv1_prim_desc.src_primitive_desc()); /* create a reorder between user and convolution data and put the reorder * into the net. The conv1_src_memory will be the input for the convolution */ net.push_back(reorder(user_src_memory, conv1_src_memory)); } ~~~ 9. Create a memory primitive for output. ~~~cpp auto conv1_dst_memory = memory(conv1_prim_desc.dst_primitive_desc()); ~~~ 10. Create a convolution primitive and add it to the net. ~~~cpp /* Note that the conv_reorder_src primitive * is an input dependency for the convolution primitive, which means that the * convolution primitive will not be executed before the data is ready. */ net.push_bash(convolution_forward(conv1_prim_desc, conv1_src_memory, conv1_weights_memory, user_bias_memory, conv1_dst_memory)); ~~~ 11. Create relu primitive. For better performance keep ReLU (as well as for other operation primitives until another convolution or inner product is encountered) input data format in the same format as was chosen by convolution. Furthermore, ReLU is done in-place by using conv1 memory. ~~~cpp auto relu1_desc = eltwise_forward::desc(prop_kind::forward_inference, algorithm::eltwise_relu, conv1_dst_memory.get_primitive_desc().desc(), negative1_slope); auto relu1_prim_desc = eltwise_forward::primitive_desc(relu1_desc, cpu_engine); net.push_back(eltwise_forward(relu1_prim_desc, conv1_dst_memory, conv1_dst_memory)); ~~~ 12. For training execution, pooling requires a private workspace memory to perform the backward pass. However, pooling should not use 'workspace' for inference as this is decremental to performance. ~~~cpp /* create pooling indices memory from pooling primitive descriptor */ // auto pool1_indices_memory = memory(pool1_pd.workspace_primitive_desc()); auto pool1_dst_memory = memory(pool1_pd.dst_primitive_desc()); /* create pooling primitive an add it to net */ net.push_back(pooling_forward(pool1_pd, lrn1_dst_memory, pool1_dst_memory /* pool1_indices_memory */)); ~~~ The example continues to create more layers according to the AlexNet topology. 14. Finally, create a stream to execute weights data transformation. This is only required once. Create another stream that will exeute the 'net' primitives. For this example, the net is executed multiple times and each execution es timed individually. ~~~cpp /* Weight transformation - executed once */ stream(stream::kind::eager).submit(net_weights).wait(); /* Execute the topology */ mkldnn::stream(mkldnn::stream::kind::eager).submit(net).wait(); ~~~ --- [Legal information](@ref legal_information)