Commit 9335e41c authored by Amy Zhuang, committed by Scott Cyphers

Create mkldnn primitives at first iteration for codegen - part2 (#2859)

* Create mkldnn primitives at first iteration for CODEGEN.

 OPs: add, lstm, and rnn.

*  OPs: batchnorm.

*  OPs: concat and lrn.

Remove dead code.

* Skip in-place concat, relu, reshape, and slice when building the node_primitive_string_deps_index map.

* Change NGRAPH_ASSERT to NGRAPH_CHECK.

* Address PR Feedback.

* Create mkldnn primitives at first iteration for CODEGEN.
 OPs: convertlayout, relu, leakyrelu, boundedrelu, sigmoid, softmax, slice.

* Fix bugs.

*  OPs: quantizedconcat.

Check if there are descriptors before emitting code to read desc_file.

*  OPs: convolution backward.

Use macro to write mkldnn memory dims to generated file.

*  OPs: MaxPoolWithIndices and MaxPoolWithIndicesBackprop.

Add unit tests for MaxPoolWithIndices, MaxPoolWithIndicesBackprop, and MaxPoolBackprop.

* Fix style error.

*  OPs: AvgPoolBackprop and MaxPoolBackprop.

Add unit test for AvgPoolBackprop.

*  OPs: DeconvolutionBias.

*  OPs: Quantize and Dequantize.

*  OPs: QuantizedDot and QuantizedDotBias.

* Use reference kernel for QuantizedConvolution for CODEGEN when mkldnn does not support the parameter types.
Get scales for quantization ops in cpu_emitter.

* Fix Windows build error: add CPU_BACKEND_API.

* Use template for quantization ops.

*  OPs: QuantizedMatmul.

Emit reference kernel for QuantizedDot in CODEGEN.

* Remove QuantizedDot from get_scale_index.

* Address PR feedback.
parent 30f3634e
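
For context, the pattern this series converges on: instead of building mkldnn primitives while compiling (via get_primitive_index), each emitter now pastes a cached primitive-build string into the generated function behind a first-iteration guard, so primitives are constructed once at run time and only re-bound and invoked on later calls. A minimal sketch of the resulting generated code, assuming the runtime handles used in the diff below (ctx, cg_ctx, set_memory_ptr, mkldnn_invoke_primitive); the kernel signature, the cg_ctx type name, and the indices are illustrative:

// Illustrative sketch of the codegen output pattern, not literal emitter output.
extern "C" void cg_kernel(CPURuntimeContext* ctx, CPURuntimeContextCG* cg_ctx,
                          float* arg0, float* out0)
{
    if (ctx->first_iteration)
    {
        // The primitive-build string recorded in
        // m_node_primitive_string_deps_index_map is pasted here; it builds
        // the mkldnn memory primitives and the compute primitive once.
    }
    // Every iteration: re-bind the memory primitives to the current
    // pointers, then invoke the cached compute primitive.
    cg_ctx->set_memory_ptr(0 /* deps[0] */, arg0);
    cg_ctx->set_memory_ptr(1 /* deps[1] */, out0);
    cg_ctx->mkldnn_invoke_primitive(2 /* primitive index */);
}
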
@@ -402,8 +402,8 @@ namespace ngraph
ngraph::op::ConvolutionBackpropData>(node);
auto fwd_desc = mkldnn_emitter->get_convolution_forward_desc_for_backward_op<
ngraph::op::ConvolutionBackpropData>(node);
// ConvolutionBackpropData needs 4 primitives: weights, delta, result,
// and convolution_backward.
// ConvolutionBackpropData needs 4 primitives: weights, diff_dst, diff_src,
// and convolution_backward_data.
auto conv_index = mkldnn_emitter->reserve_primitive_space(4);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
@@ -502,7 +502,7 @@ namespace ngraph
ngraph::op::ConvolutionBackpropFilters>(node);
auto fwd_desc = mkldnn_emitter->get_convolution_forward_desc_for_backward_op<
ngraph::op::ConvolutionBackpropFilters>(node);
// ConvolutionBackpropFilter needs 4 primitives: input, delta, weights_delta,
// ConvolutionBackpropFilter needs 4 primitives: src, diff_dst, diff_weights,
// and convolution_backward_weights.
auto conv_index = mkldnn_emitter->reserve_primitive_space(4);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
@@ -598,8 +598,8 @@ namespace ngraph
ngraph::op::ConvolutionBiasBackpropFiltersBias>(node);
auto fwd_desc = mkldnn_emitter->get_convolution_forward_desc_for_backward_op<
ngraph::op::ConvolutionBiasBackpropFiltersBias>(node);
// ConvolutionBiasBackpropFilter needs 5 primitives: input, delta, weights_delta,
// bias_delta, and convolution_backward_weights.
// ConvolutionBackpropFiltersBias needs 5 primitives: src, diff_dst, diff_weights,
// diff_bias, and convolution_backward_weights.
auto conv_index = mkldnn_emitter->reserve_primitive_space(5);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
@@ -301,7 +301,7 @@ namespace ngraph
->get_max_pooling_backward_desc<ngraph::op::MaxPoolWithIndicesBackprop>(
node);
// MaxPoolWithIndicesBackprop needs 4 primitives: diff_dst, fprop_workspace,
// diff_dst, and pooling_backward.
// diff_src, and pooling_backward.
size_t max_pool_index = mkldnn_emitter->reserve_primitive_space(4);
auto& deps = mkldnn_emitter->get_primitive_deps(max_pool_index);
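The slot comments above map directly onto the emitter API used throughout this diff: one slot is reserved per memory primitive plus one for the compute primitive, and the deps vector records which primitive-space index landed in which slot. A minimal sketch, using the calls shown in the surrounding hunks:

// Reserve slots for diff_dst, fprop_workspace, diff_src, and the
// pooling_backward primitive itself, then fetch the slot indices that
// the generated code will later re-bind with cg_ctx->set_memory_ptr.
size_t max_pool_index = mkldnn_emitter->reserve_primitive_space(4);
auto& deps = mkldnn_emitter->get_primitive_deps(max_pool_index);
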
@@ -174,6 +174,59 @@ namespace ngraph
index = get<2>(external_function->get_primitive_build_tuple(node));
}
template <typename OP>
static void emit_build_primitives(CPU_ExternalFunction* external_function,
const ngraph::Node* node,
CodeWriter& writer,
size_t& index,
std::vector<std::size_t>& deps,
const std::vector<TensorViewWrapper>& args)
{
writer << "if (ctx->first_iteration)\n";
writer.block_begin();
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto scale_index = mkldnn_emitter->get_scale_index<OP>();
auto scales_size = shape_size(node->get_input_shape(scale_index));
writer << "std::vector<float> dyn_scales;\n";
writer << "dyn_scales.assign(" << args[scale_index].get_name() << ", "
<< args[scale_index].get_name() << " + " << std::to_string(scales_size)
<< ");\n";
// for Quantize
if (is_same<OP, ngraph::op::Quantize>())
{
writer << "for (size_t i = 0; i < " << std::to_string(scales_size)
<< "; i++)\n";
writer.block_begin();
writer << "dyn_scales[i] = 1.0 / dyn_scales[i];\n";
writer.block_end();
}
// QuantizedConvolutionBiasAdd and QuantizedConvolutionBiasSignedAdd
if (is_same<OP, ngraph::op::QuantizedConvolutionBiasAdd>() ||
is_same<OP, ngraph::op::QuantizedConvolutionBiasSignedAdd>())
{
auto sum_scale_index = 5;
auto sum_scales_size = shape_size(node->get_input_shape(sum_scale_index));
writer << "std::vector<float> dyn_post_op_scales;\n";
writer << "dyn_post_op_scales.assign(" << args[sum_scale_index].get_name()
<< ", " << args[sum_scale_index].get_name() << " + "
<< std::to_string(sum_scales_size) << ");\n";
}
writer << "// quantize across first dim (mask=2^0) if dyn_scales is a "
"vector \n";
writer << "const int mask = " << std::to_string(scales_size) << " == 1 ? 0 : 1;\n";
// get the string, deps, and index from the map
writer << get<0>(external_function->get_primitive_build_tuple(node));
writer.block_end();
deps = get<1>(external_function->get_primitive_build_tuple(node));
index = get<2>(external_function->get_primitive_build_tuple(node));
}
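
The quantization overload above differs from the plain emit_build_primitives in that, before pasting the primitive-build string, it emits run-time code that reads the dynamic scales out of the scale input. Roughly, for a Quantize node whose scale input holds a single element, the generated fragment looks like this (arg1 is a hypothetical tensor name):

if (ctx->first_iteration)
{
    std::vector<float> dyn_scales;
    dyn_scales.assign(arg1, arg1 + 1);
    for (size_t i = 0; i < 1; i++)
    {
        dyn_scales[i] = 1.0 / dyn_scales[i]; // Quantize uses inverted scales
    }
    // quantize across first dim (mask=2^0) if dyn_scales is a vector
    const int mask = 1 == 1 ? 0 : 1;
    // ...the primitive-build string from the map follows here, consuming
    // dyn_scales and mask...
}
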
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::op::Add)
{
@@ -1434,22 +1487,18 @@ namespace ngraph
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto out_shape = out[0].get_shape();
auto lower_bounds = slice->get_lower_bounds();
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto slice_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(slice_index);
size_t slice_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, slice_index, deps);
writer.block_begin();
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(slice_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(slice_index)
<< ");\n";
writer.block_end();
return;
}
@@ -1553,12 +1602,24 @@ namespace ngraph
auto index_type_name = embed->get_argument(0)->get_element_type().c_type_string();
auto type_name = embed->get_element_type().c_type_string();
auto element_count = shape_size(embed->get_argument(0)->get_shape());
// FIXME
// clang generates a 16-byte aligned store to an unaligned address,
// which results in a segmentation fault.
// Workaround for now: use push_back to avoid generating such a store.
auto arg1_shape = args[1].get_shape();
writer << "ngraph::Shape shape;\n";
for (auto i = 0; i < arg1_shape.size(); i++)
{
writer << "shape.push_back(" << std::to_string(arg1_shape[i]) << ");\n";
}
writer << "reference::embedding<" << type_name << "," << index_type_name << ">(";
writer << " " << args[0].get_name() << ",\n";
writer << " " << args[1].get_name() << ",\n";
writer << " " << out[0].get_name() << ",\n";
writer << " " << element_count << ",\n";
writer << " {" << join(args[1].get_shape()) << "});\n";
writer << " shape);\n";
writer.block_end();
}
@@ -2056,7 +2117,8 @@ namespace ngraph
{
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
emit_build_primitives<ngraph::op::QuantizedConvolutionRelu>(
external_function, node, writer, conv_index, deps, args);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
@@ -2079,7 +2141,8 @@ namespace ngraph
{
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
emit_build_primitives<ngraph::op::QuantizedConvolution>(
external_function, node, writer, conv_index, deps, args);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
@@ -2091,7 +2154,35 @@ namespace ngraph
}
else
{
throw ngraph_error("unsupported parameters for QuantizedConvolution");
auto convolution = static_cast<const ngraph::op::QuantizedConvolution*>(node);
auto arg0_shape = args[0].get_shape();
auto arg1_shape = args[1].get_shape();
auto result_shape = out[0].get_shape();
auto scales_size = shape_size(node->get_input_shape(2));
writer << "std::vector<float> dyn_scales;\n";
writer << "dyn_scales.assign(" << args[2].get_name() << ", "
<< args[2].get_name() << " + " << std::to_string(scales_size) << ");\n";
writer << "reference::convolution<" << out[0].get_type() << ">("
<< args[0].get_name() << ",\n";
writer << " " << args[1].get_name() << ",\n";
writer << " " << out[0].get_name() << ",\n";
writer << " {" << join(arg0_shape) << "},\n";
writer << " {" << join(arg1_shape) << "},\n";
writer << " {" << join(result_shape) << "},\n";
writer << " {"
<< join(convolution->get_window_movement_strides()) << "},\n";
writer << " {"
<< join(convolution->get_window_dilation_strides()) << "},\n";
writer << " {" << join(convolution->get_padding_below())
<< "},\n";
writer << " {" << join(convolution->get_padding_above())
<< "},\n";
writer << " {"
<< join(convolution->get_data_dilation_strides()) << "}, \n";
writer << " dyn_scales[0]);\n";
}
}
@@ -2201,19 +2292,18 @@ namespace ngraph
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto conv_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(conv_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(conv_index) << ");\n";
}
else
{
@@ -2247,23 +2337,20 @@ namespace ngraph
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto conv_index =
mkldnn_emitter->build_deconvolution<ngraph::op::DeconvolutionBias>(
node, args, out);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << args[2].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[3])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< args[2].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[3]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(conv_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(conv_index) << ");\n";
}
else
{
@@ -2282,19 +2369,18 @@ namespace ngraph
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto conv_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(conv_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(conv_index) << ");\n";
}
else
{
@@ -2326,7 +2412,8 @@ namespace ngraph
{
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
emit_build_primitives<ngraph::op::QuantizedConvolutionBias>(
external_function, node, writer, conv_index, deps, args);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
@@ -2352,7 +2439,8 @@ namespace ngraph
{
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
emit_build_primitives<ngraph::op::QuantizedConvolutionBiasAdd>(
external_function, node, writer, conv_index, deps, args);
writer << "if (" << out[0].get_name() << " != " << args[3].get_name() << ")\n";
writer.block_begin();
@@ -2383,7 +2471,8 @@ namespace ngraph
{
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
emit_build_primitives<ngraph::op::QuantizedConvolutionBiasSignedAdd>(
external_function, node, writer, conv_index, deps, args);
writer << "if (" << out[0].get_name() << " != " << args[3].get_name() << ")\n";
writer.block_begin();
@@ -2412,23 +2501,21 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto qip_index =
mkldnn_emitter->build_inner_product<ngraph::op::QuantizedDotBias>(
node, args, out);
auto& deps = mkldnn_emitter->get_primitive_deps(qip_index);
size_t qip_index;
std::vector<std::size_t> deps;
emit_build_primitives<ngraph::op::QuantizedDotBias>(
external_function, node, writer, qip_index, deps, args);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << args[2].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[3])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[3]) << ", "
<< args[2].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(qip_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(qip_index) << ");\n";
}
else
{
@@ -2438,33 +2525,46 @@ namespace ngraph
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::op::QuantizedDot)
{
if (shape_size(args[2].get_shape()) != 1)
{
throw ngraph_error("Scale size should be 1 for QuantizedDot");
}
writer << "float dyn_scale = *(static_cast<float*>(" << args[2].get_name()
<< "));\n";
writer << "reference::dot(" << args[0].get_name() << ",\n";
writer << " " << args[1].get_name() << ",\n";
writer << " " << out[0].get_name() << ",\n";
writer << " {" << join(args[0].get_shape()) << "},\n";
writer << " {" << join(args[1].get_shape()) << "},\n";
writer << " {" << join(out[0].get_shape()) << "},\n";
writer << " 1,\n";
writer << " dyn_scale);\n";
}
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::op::QuantizedMatmul)
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
if (node->get_input_element_type(0) == element::u8 &&
node->get_input_element_type(1) == element::u8)
{
throw ngraph_error(
"Unsupported data types for QuantizedDot MKLDNN kernel.");
}
size_t qip_index;
std::vector<std::size_t> deps;
emit_build_primitives<ngraph::op::QuantizedMatmul>(
external_function, node, writer, qip_index, deps, args);
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto qip_index = mkldnn_emitter->build_inner_product<ngraph::op::QuantizedDot>(
node, args, out);
auto& deps = mkldnn_emitter->get_primitive_deps(qip_index);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(qip_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(qip_index) << ");\n";
}
else
{
throw ngraph_error("unsupported parameters for QuantizedDot");
throw ngraph_error("QuantizedMatmul is only supported with MKLDNN kernel.");
}
}
@@ -2556,21 +2656,20 @@ namespace ngraph
{
if (mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
auto conv_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
size_t conv_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, conv_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[3])
<< ", " << out[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[3]) << ", "
<< out[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(conv_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(conv_index) << ");\n";
}
else
{
@@ -2672,19 +2771,19 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t max_pool_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(max_pool_index);
size_t max_pool_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, max_pool_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << out[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(max_pool_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(max_pool_index)
<< ");\n";
}
else
{
@@ -2883,17 +2982,17 @@ namespace ngraph
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t avg_pool_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(avg_pool_index);
size_t avg_pool_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, avg_pool_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(avg_pool_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(avg_pool_index)
<< ");\n";
}
else
{
@@ -2925,29 +3024,27 @@ namespace ngraph
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t max_pool_index = external_function->get_primitive_index(node);
auto& fdeps = mkldnn_emitter->get_primitive_deps(max_pool_index - 1);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(fdeps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(fdeps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(fdeps[2])
<< ", ctx->mkldnn_workspaces[" << fdeps[3] << "]);\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(max_pool_index - 1) << ");\n";
auto& bdeps = mkldnn_emitter->get_primitive_deps(max_pool_index);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(bdeps[0])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(bdeps[1])
<< ", ctx->mkldnn_workspaces[" << bdeps[3] << "]);\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(bdeps[2])
<< ", " << out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(max_pool_index) << ");\n";
size_t max_pool_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, max_pool_index, deps);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[3])
<< ", cg_ctx->mkldnn_workspaces[" << deps[5] << "]);\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(deps[4]) << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[3])
<< ", cg_ctx->mkldnn_workspaces[" << deps[5] << "]);\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(max_pool_index)
<< ");\n";
}
else
{
@@ -2971,19 +3068,19 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t max_pool_index = external_function->get_primitive_index(node);
auto& bdeps = mkldnn_emitter->get_primitive_deps(max_pool_index);
size_t max_pool_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, max_pool_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(bdeps[0])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(bdeps[1])
<< ", " << args[2].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(bdeps[2])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< args[2].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(max_pool_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(max_pool_index)
<< ");\n";
}
else
{
@@ -3076,55 +3173,24 @@ namespace ngraph
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::runtime::cpu::op::ConvertLayout)
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
size_t reorder_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, reorder_index, deps);
auto input_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
auto result_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
//this is a special case to handle nchw(oihw) to goihw/Goihw16g/Goihw8g for GroupConvolution's weights
if (input_desc.data.format == mkldnn_nchw &&
result_desc.data.format == mkldnn_goihw)
{
input_desc = result_desc;
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(reorder_index)
<< ");\n";
}
else if (input_desc.data.format == mkldnn_nchw &&
input_desc.data.ndims == 4 /*nchw*/ &&
result_desc.data.ndims == 5 /*Goihw16g/Goihw8g/etc*/ &&
node->get_users().size() == 1)
else
{
Shape weights_shape_groups;
if (auto gconv = std::dynamic_pointer_cast<ngraph::op::GroupConvolution>(
node->get_users()[0]))
{
weights_shape_groups = gconv->get_weights_dimensions();
}
else if (auto gconvb =
std::dynamic_pointer_cast<ngraph::op::GroupConvolutionBias>(
node->get_users()[0]))
{
weights_shape_groups = gconvb->get_weights_dimensions();
}
else
{
throw ngraph_error("Incompatible input/output shape in ConvertLayout op");
}
input_desc = mkldnn::memory::desc(
mkldnn::memory::dims(weights_shape_groups.begin(),
weights_shape_groups.end()),
mkldnn_utils::get_mkldnn_data_type(args[0].get_element_type()),
mkldnn::memory::format::goihw);
throw ngraph_error("ConvertLayout is only supported with MKLDNN kernel.");
}
size_t reorder_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(reorder_index);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(reorder_index) << ");\n";
}
template <>
@@ -3132,19 +3198,18 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t relu_index = external_function->get_primitive_index(node);
size_t relu_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, relu_index, deps);
auto& deps = mkldnn_emitter->get_primitive_deps(relu_index);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(relu_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(relu_index) << ");\n";
}
else
{
@@ -3162,17 +3227,16 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t relu_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(relu_index);
size_t relu_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, relu_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(relu_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(relu_index) << ");\n";
}
else
{
@@ -3188,23 +3252,24 @@ namespace ngraph
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::op::LeakyRelu)
{
auto leaky_relu_node = static_cast<const ngraph::op::LeakyRelu*>(node);
float alpha = leaky_relu_node->get_alpha();
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto leaky_relu_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(leaky_relu_index);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
size_t leaky_relu_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, leaky_relu_index, deps);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(leaky_relu_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(leaky_relu_index)
<< ");\n";
}
else
{
auto leaky_relu_node = static_cast<const ngraph::op::LeakyRelu*>(node);
float alpha = leaky_relu_node->get_alpha();
writer << "#pragma omp parallel for\n";
writer << "for (size_t i = 0; i < " << out[0].get_size() << "; i++)\n";
writer.block_begin();
@@ -3218,23 +3283,25 @@ namespace ngraph
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::op::BoundedRelu)
{
auto bounded_relu_node = static_cast<const ngraph::op::BoundedRelu*>(node);
float alpha = bounded_relu_node->get_alpha();
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto bounded_relu_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(bounded_relu_index);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
size_t bounded_relu_index;
std::vector<std::size_t> deps;
emit_build_primitives(
external_function, node, writer, bounded_relu_index, deps);
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(bounded_relu_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(bounded_relu_index)
<< ");\n";
}
else
{
auto bounded_relu_node = static_cast<const ngraph::op::BoundedRelu*>(node);
float alpha = bounded_relu_node->get_alpha();
writer << "#pragma omp parallel for\n";
writer << "for (size_t i = 0; i < " << out[0].get_size() << "; i++)\n";
writer.block_begin();
@@ -3249,42 +3316,49 @@ namespace ngraph
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::op::Sigmoid)
{
auto input_shape = args[0].get_shape();
auto result_shape = out[0].get_shape();
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t sigmoid_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(sigmoid_index);
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
size_t sigmoid_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, sigmoid_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(sigmoid_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(sigmoid_index)
<< ");\n";
}
else
{
throw ngraph_error("Sigmoid is only supported with MKLDNN kernel.");
}
}
template <>
void CPU_Emitter::EMITTER_DECL(ngraph::op::SigmoidBackprop)
{
auto input_shape = args[0].get_shape();
auto delta_shape = args[1].get_shape();
auto result_shape = out[0].get_shape();
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t sigmoid_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(sigmoid_index);
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
size_t sigmoid_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, sigmoid_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< args[1].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[2]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(sigmoid_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(sigmoid_index)
<< ");\n";
}
else
{
throw ngraph_error("SigmoidBackprop is only supported with MKLDNN kernel.");
}
}
std::string
@@ -3445,17 +3519,17 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t softmax_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(softmax_index);
size_t softmax_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, softmax_index, deps);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(softmax_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(softmax_index)
<< ");\n";
}
else
{
@@ -3854,16 +3928,17 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t dequantize_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(dequantize_index);
size_t dequantize_index;
std::vector<std::size_t> deps;
emit_build_primitives<ngraph::op::Dequantize>(
external_function, node, writer, dequantize_index, deps, args);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(dequantize_index) << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(dequantize_index)
<< ");\n";
}
else
{
@@ -3885,16 +3960,17 @@ namespace ngraph
auto quantize = static_cast<const ngraph::op::Quantize*>(node);
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t quantize_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(quantize_index);
size_t quantize_index;
std::vector<std::size_t> deps;
emit_build_primitives<ngraph::op::Quantize>(
external_function, node, writer, quantize_index, deps, args);
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[0])
<< ", " << args[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[1])
<< ", " << out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(quantize_index) << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[0]) << ", "
<< args[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[1]) << ", "
<< out[0].get_name() << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(quantize_index)
<< ");\n";
}
else
{
@@ -3916,21 +3992,21 @@ namespace ngraph
{
if (runtime::cpu::mkldnn_utils::use_mkldnn_kernel(node))
{
auto& mkldnn_emitter = external_function->get_mkldnn_emitter();
size_t concat_index = external_function->get_primitive_index(node);
auto& deps = mkldnn_emitter->get_primitive_deps(concat_index);
size_t concat_index;
std::vector<std::size_t> deps;
emit_build_primitives(external_function, node, writer, concat_index, deps);
size_t i;
for (i = 0; i < args.size(); i++)
{
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[i])
<< ", " << args[i].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[i]) << ", "
<< args[i].get_name() << ");\n";
}
writer << "cpu::mkldnn_utils::set_memory_ptr(ctx, " << to_string(deps[i])
<< ", " << out[0].get_name() << ");\n";
writer << "cg_ctx->set_memory_ptr(" << to_string(deps[i]) << ", "
<< out[0].get_name() << ");\n";
writer << "cpu::mkldnn_utils::mkldnn_invoke_primitive(ctx, "
<< to_string(concat_index) << ");\n";
writer << "cg_ctx->mkldnn_invoke_primitive(" << to_string(concat_index)
<< ");\n";
}
else
{
@@ -480,10 +480,7 @@ void runtime::cpu::CPU_ExternalFunction::compile(ngraph::pass::PassConfig& pass_
// Build mkldnn primitives for codegen.
pass_manager.register_pass<runtime::cpu::pass::MKLDNNPrimitiveBuildPass>(
m_desc_filename,
*m_mkldnn_emitter,
m_node_primitive_idx_map,
m_node_primitive_string_deps_index_map);
m_desc_filename, *m_mkldnn_emitter, m_node_primitive_string_deps_index_map);
unordered_map<Node*, Node*> node_function_map;
string common_function_string;
@@ -746,16 +743,20 @@ using namespace ngraph::runtime;
writer << "extern \"C\" void " << current_function->get_name() << func_params << "\n";
writer << "{\n";
writer.indent++;
writer << "std::ifstream desc_file (\"" << m_desc_filename << "\", std::ios::binary);\n";
//deserialize and build mkldnn primitives
writer << "if (ctx->first_iteration)\n";
writer.block_begin();
writer << "// read in memory descriptors and build mkldnn primitives\n";
writer << "deserialize_memory_descs_and_build_memory_primitives(" << m_desc_filename
<< ", cg_ctx, " << to_string(m_mkldnn_emitter->get_mkldnn_descriptors_size())
<< ");\n";
writer.block_end();
if (m_mkldnn_emitter->get_mkldnn_descriptors_size() > 0)
{
writer << "if (ctx->first_iteration)\n";
writer.block_begin();
writer << "// read in memory descriptors and build mkldnn primitives\n";
writer << "std::ifstream desc_file (\"" << m_desc_filename
<< "\", std::ios::binary);\n";
writer << "deserialize_memory_descs_and_build_memory_primitives(" << m_desc_filename
<< ", cg_ctx, " << to_string(m_mkldnn_emitter->get_mkldnn_descriptors_size())
<< ");\n";
writer.block_end();
}
// Execution tracing support
if (runtime::cpu::IsTracingEnabled() && current_function->get_name() == m_function_name)
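For illustration, when the descriptor count is nonzero the generated function now starts with a guard along these lines (the file name, the argument spelling pasted from m_desc_filename, and the count 3 are placeholders, not verified output):

if (ctx->first_iteration)
{
    // read in memory descriptors and build mkldnn primitives
    std::ifstream desc_file ("<m_desc_filename>", std::ios::binary);
    deserialize_memory_descs_and_build_memory_primitives(
        <m_desc_filename>, cg_ctx, 3);
}

Functions containing no mkldnn ops omit the block entirely, so the descriptor file is never opened for them.
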
@@ -114,17 +114,6 @@ namespace ngraph
return m_mkldnn_emitter;
}
/// Returns the index of the mkldnn primitive previously created for \p node.
size_t get_primitive_index(const Node* node) const
{
auto it = m_node_primitive_idx_map.find(node);
NGRAPH_CHECK(it != m_node_primitive_idx_map.end(),
"Primitive not found for node ",
node->description());
return it->second;
}
// Return the tuple containing the string that creates the mkldnn primitive, the deps, and the primitive index in CODEGEN
const std::tuple<std::string, std::vector<size_t>, size_t>&
get_primitive_build_tuple(const Node* node) const
@@ -328,8 +317,6 @@ namespace ngraph
std::unordered_map<std::string, size_t> subgraph_param_indices;
#endif
/// Map each node with mkldnn implementation to its mkldnn primitive index.
std::unordered_map<const Node*, size_t> m_node_primitive_idx_map;
/// Map each node with an mkldnn implementation to its mkldnn primitive-creating string, deps, and mkldnn primitive index.
std::map<const Node*, std::tuple<std::string, std::vector<size_t>, size_t>>
m_node_primitive_string_deps_index_map;
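For reference, this map is consumed by the emitters as in the tail of emit_build_primitives shown earlier (a sketch; std::get spelled out here):

// The tuple caches (0) the C++ text that builds the primitive,
// (1) the primitive-space indices of its memory dependencies, and
// (2) the primitive-space index of the compute primitive itself.
const auto& t = external_function->get_primitive_build_tuple(node);
writer << std::get<0>(t); // pasted inside the first-iteration guard
deps = std::get<1>(t);
index = std::get<2>(t);
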
@@ -24,19 +24,15 @@
#include "ngraph/op/batch_norm.hpp"
#include "ngraph/op/concat.hpp"
#include "ngraph/op/constant.hpp"
#include "ngraph/op/constant.hpp"
#include "ngraph/op/convert.hpp"
#include "ngraph/op/convolution.hpp"
#include "ngraph/op/dequantize.hpp"
#include "ngraph/op/dequantize.hpp"
#include "ngraph/op/experimental/quantized_avg_pool.hpp"
#include "ngraph/op/experimental/quantized_avg_pool.hpp"
#include "ngraph/op/experimental/quantized_concat.hpp"
#include "ngraph/op/experimental/quantized_conv.hpp"
#include "ngraph/op/experimental/quantized_conv_bias.hpp"
#include "ngraph/op/experimental/quantized_conv_relu.hpp"
#include "ngraph/op/experimental/quantized_max_pool.hpp"
#include "ngraph/op/experimental/quantized_max_pool.hpp"
#include "ngraph/op/get_output_element.hpp"
#include "ngraph/op/lrn.hpp"
#include "ngraph/op/max_pool.hpp"
@@ -342,119 +338,6 @@ void MKLDNNEmitter::build_deconvolutionbias_forward(
}
}
size_t MKLDNNEmitter::build_deconvolutionbias_forward(const mkldnn::memory::desc& input_data_desc,
const mkldnn::memory::desc& weights_desc,
const mkldnn::memory::desc& bias_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Strides& strides,
const ngraph::Strides& dilation_strides,
const ngraph::CoordinateDiff& padding_below,
const ngraph::CoordinateDiff& padding_above,
const mkldnn::post_ops& pops)
{
size_t input_data_index = build_memory_primitive(input_data_desc);
size_t weights_index = build_memory_primitive(weights_desc);
size_t bias_index = build_memory_primitive(bias_desc);
size_t result_index = build_memory_primitive(result_desc);
mkldnn::primitive_attr conv_attr;
conv_attr.set_post_ops(pops);
size_t conv_index = 0;
try
{
auto conv_prim = new mkldnn::deconvolution_forward(
{{mkldnn::prop_kind::forward,
mkldnn::algorithm::deconvolution_direct,
input_data_desc,
weights_desc,
bias_desc,
result_desc,
mkldnn::memory::dims(strides.begin(), strides.end()),
mkldnn::memory::dims(dilation_strides.begin(), dilation_strides.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
conv_attr,
executor::global_cpu_engine},
*m_mkldnn_primitives[input_data_index],
*m_mkldnn_primitives[weights_index],
*m_mkldnn_primitives[bias_index],
*m_mkldnn_primitives[result_index]);
conv_index = insert_primitive(conv_prim);
m_primitive_deps[conv_index] = {weights_index, input_data_index, bias_index, result_index};
}
catch (const mkldnn::error& e)
{
throw ngraph_error("Could not create mkldnn deconvolution_forward " + e.message);
}
return conv_index;
}
size_t MKLDNNEmitter::build_convolution_backward_weights_bias(
const mkldnn::memory::desc& in_data_desc,
const mkldnn::memory::desc& in_delta_desc,
const mkldnn::memory::desc& out_weights_delta_desc,
const mkldnn::memory::desc& out_bias_delta_desc,
const ngraph::Strides& ng_strides,
const ngraph::Strides& ng_dilation_strides,
const ngraph::CoordinateDiff& ng_padding_below,
const ngraph::CoordinateDiff& ng_padding_above)
{
const size_t in_data_index = build_memory_primitive(in_data_desc);
const size_t in_delta_index = build_memory_primitive(in_delta_desc);
const size_t out_weights_delta_index = build_memory_primitive(out_weights_delta_desc);
const size_t out_bias_delta_index = build_memory_primitive(out_bias_delta_desc);
mkldnn::memory::dims strides(ng_strides.begin(), ng_strides.end());
mkldnn::memory::dims dilation(ng_dilation_strides.begin(), ng_dilation_strides.end());
mkldnn::memory::dims padding_l(ng_padding_below.begin(), ng_padding_below.end());
mkldnn::memory::dims padding_r(ng_padding_above.begin(), ng_padding_above.end());
mkldnn::algorithm convolution_algo = mkldnn_utils::get_conv_algo();
mkldnn::convolution_forward::primitive_desc fwd_pd{{mkldnn::prop_kind::forward,
convolution_algo,
in_data_desc,
out_weights_delta_desc,
out_bias_delta_desc,
in_delta_desc,
strides,
dilation,
padding_l,
padding_r,
mkldnn::padding_kind::zero},
executor::global_cpu_engine};
mkldnn::convolution_backward_weights::primitive_desc bwd_pd{{convolution_algo,
in_data_desc,
out_weights_delta_desc,
out_bias_delta_desc,
in_delta_desc,
strides,
dilation,
padding_l,
padding_r,
mkldnn::padding_kind::zero},
executor::global_cpu_engine,
fwd_pd};
const size_t conv_index = insert_primitive(
new mkldnn::convolution_backward_weights(bwd_pd,
*m_mkldnn_primitives[in_data_index],
*m_mkldnn_primitives[in_delta_index],
*m_mkldnn_primitives[out_weights_delta_index],
*m_mkldnn_primitives[out_bias_delta_index]));
NGRAPH_CHECK(m_primitive_deps.find(conv_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[conv_index] = {
in_data_index, in_delta_index, out_weights_delta_index, out_bias_delta_index};
return conv_index;
}
void MKLDNNEmitter::build_convolution_backward_weights_bias(
std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::convolution_backward_weights::desc& bwd_desc,
@@ -462,15 +345,14 @@ void MKLDNNEmitter::build_convolution_backward_weights_bias(
const std::vector<size_t>& deps,
size_t conv_index)
{
size_t in_data_index = deps[0];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.src_desc, in_data_index);
size_t in_delta_index = deps[1];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_dst_desc, in_delta_index);
size_t out_weights_delta_index = deps[2];
build_memory_primitive(
mkldnn_primitives, bwd_desc.data.diff_weights_desc, out_weights_delta_index);
size_t out_bias_delta_index = deps[3];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_bias_desc, out_bias_delta_index);
size_t src_index = deps[0];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.src_desc, src_index);
size_t diff_dst_index = deps[1];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_dst_desc, diff_dst_index);
size_t diff_weights_index = deps[2];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_weights_desc, diff_weights_index);
size_t diff_bias_index = deps[3];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_bias_desc, diff_bias_index);
mkldnn::convolution_forward::primitive_desc fwd_pd{fwd_desc, executor::global_cpu_engine};
@@ -479,58 +361,10 @@ void MKLDNNEmitter::build_convolution_backward_weights_bias(
mkldnn_primitives[conv_index] =
new mkldnn::convolution_backward_weights(bwd_pd,
*mkldnn_primitives[in_data_index],
*mkldnn_primitives[in_delta_index],
*mkldnn_primitives[out_weights_delta_index],
*mkldnn_primitives[out_bias_delta_index]);
}
size_t
MKLDNNEmitter::build_convolution_backward_weights(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Strides& strides,
const ngraph::Strides& dilation_strides,
const ngraph::CoordinateDiff& padding_below,
const ngraph::CoordinateDiff& padding_above)
{
size_t input_index = build_memory_primitive(input_desc);
size_t delta_index = build_memory_primitive(delta_desc);
size_t result_index = build_memory_primitive(result_desc);
mkldnn::algorithm convolution_algo = mkldnn_utils::get_conv_algo();
size_t primitive_index = insert_primitive(new mkldnn::convolution_backward_weights(
{{convolution_algo,
input_desc,
result_desc,
delta_desc,
mkldnn::memory::dims(strides.begin(), strides.end()),
mkldnn::memory::dims(dilation_strides.begin(), dilation_strides.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine,
// Forward primitive descriptor corresponding to this backward weights descriptor
{{mkldnn::prop_kind::forward,
convolution_algo,
input_desc,
result_desc,
delta_desc,
mkldnn::memory::dims(strides.begin(), strides.end()),
mkldnn::memory::dims(dilation_strides.begin(), dilation_strides.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine}},
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[delta_index],
*m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, delta_index, result_index};
return primitive_index;
*mkldnn_primitives[src_index],
*mkldnn_primitives[diff_dst_index],
*mkldnn_primitives[diff_weights_index],
*mkldnn_primitives[diff_bias_index]);
}
void MKLDNNEmitter::build_convolution_backward_weights(
@@ -540,66 +374,21 @@ void MKLDNNEmitter::build_convolution_backward_weights(
const std::vector<size_t>& deps,
size_t conv_index)
{
size_t in_data_index = deps[0];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.src_desc, in_data_index);
size_t in_delta_index = deps[1];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_dst_desc, in_delta_index);
size_t out_weights_delta_index = deps[2];
build_memory_primitive(
mkldnn_primitives, bwd_desc.data.diff_weights_desc, out_weights_delta_index);
size_t src_index = deps[0];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.src_desc, src_index);
size_t diff_dst_index = deps[1];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_dst_desc, diff_dst_index);
size_t diff_weights_index = deps[2];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_weights_desc, diff_weights_index);
mkldnn_primitives[conv_index] = new mkldnn::convolution_backward_weights(
{bwd_desc,
executor::global_cpu_engine,
// Forward primitive descriptor corresponding to this backward weights descriptor
{fwd_desc, executor::global_cpu_engine}},
*mkldnn_primitives[in_data_index],
*mkldnn_primitives[in_delta_index],
*mkldnn_primitives[out_weights_delta_index]);
}
size_t MKLDNNEmitter::build_convolution_backward_data(const mkldnn::memory::desc& weights_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Strides& strides,
const ngraph::Strides& dilation_strides,
const ngraph::CoordinateDiff& padding_below,
const ngraph::CoordinateDiff& padding_above)
{
size_t weights_index = build_memory_primitive(weights_desc);
size_t delta_index = build_memory_primitive(delta_desc);
size_t result_index = build_memory_primitive(result_desc);
mkldnn::algorithm convolution_algo = mkldnn_utils::get_conv_algo();
size_t primitive_index = insert_primitive(new mkldnn::convolution_backward_data(
{{convolution_algo,
result_desc,
weights_desc,
delta_desc,
mkldnn::memory::dims(strides.begin(), strides.end()),
mkldnn::memory::dims(dilation_strides.begin(), dilation_strides.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine,
// Forward primitive descriptor corresponding to this backward data descriptor
{{mkldnn::prop_kind::forward,
convolution_algo,
result_desc,
weights_desc,
delta_desc,
mkldnn::memory::dims(strides.begin(), strides.end()),
mkldnn::memory::dims(dilation_strides.begin(), dilation_strides.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine}},
*m_mkldnn_primitives[delta_index],
*m_mkldnn_primitives[weights_index],
*m_mkldnn_primitives[result_index]));
m_primitive_deps[primitive_index] = {weights_index, delta_index, result_index};
return primitive_index;
*mkldnn_primitives[src_index],
*mkldnn_primitives[diff_dst_index],
*mkldnn_primitives[diff_weights_index]);
}
void MKLDNNEmitter::build_convolution_backward_data(
@@ -611,19 +400,19 @@ void MKLDNNEmitter::build_convolution_backward_data(
{
size_t weights_index = deps[0];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.weights_desc, weights_index);
size_t delta_index = deps[1];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_dst_desc, delta_index);
size_t result_index = deps[2];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_src_desc, result_index);
size_t diff_dst_index = deps[1];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_dst_desc, diff_dst_index);
size_t diff_src_index = deps[2];
build_memory_primitive(mkldnn_primitives, bwd_desc.data.diff_src_desc, diff_src_index);
mkldnn_primitives[conv_index] = new mkldnn::convolution_backward_data(
{bwd_desc,
executor::global_cpu_engine,
// Forward primitive descriptor corresponding to this backward data descriptor
{fwd_desc, executor::global_cpu_engine}},
*mkldnn_primitives[delta_index],
*mkldnn_primitives[diff_dst_index],
*mkldnn_primitives[weights_index],
*mkldnn_primitives[result_index]);
*mkldnn_primitives[diff_src_index]);
}
void MKLDNNEmitter::build_pooling_forward(std::vector<mkldnn::primitive*>& mkldnn_primitives,
......@@ -642,47 +431,6 @@ void MKLDNNEmitter::build_pooling_forward(std::vector<mkldnn::primitive*>& mkldn
*mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_pooling_backward(mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& diff_dst_desc,
const mkldnn::memory::desc& diff_src_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above)
{
size_t input_index = build_memory_primitive(diff_dst_desc);
size_t result_index = build_memory_primitive(diff_src_desc);
size_t primitive_index = insert_primitive(new mkldnn::pooling_backward(
{{pooling_algorithm,
diff_src_desc,
diff_dst_desc,
mkldnn::memory::dims(window_strides.begin(), window_strides.end()),
mkldnn::memory::dims(window_shape.begin(), window_shape.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine,
{{mkldnn::prop_kind::forward_training,
pooling_algorithm,
diff_src_desc,
diff_dst_desc,
mkldnn::memory::dims(window_strides.begin(), window_strides.end()),
mkldnn::memory::dims(window_shape.begin(), window_shape.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine}},
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, result_index};
return primitive_index;
}
void MKLDNNEmitter::build_pooling_backward(std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::pooling_backward::desc& pool_desc,
const mkldnn::pooling_forward::desc& pool_fwd_desc,
......@@ -703,71 +451,6 @@ void MKLDNNEmitter::build_pooling_backward(std::vector<mkldnn::primitive*>& mkld
pool_pd, *mkldnn_primitives[input_index], *mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_max_pooling_backward(mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& fprop_src_desc,
const mkldnn::memory::desc& diff_dst_desc,
const mkldnn::memory::desc& diff_src_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above)
{
size_t fprop_src_index = build_memory_primitive(fprop_src_desc);
size_t diff_dst_index = build_memory_primitive(diff_dst_desc);
size_t diff_src_index = build_memory_primitive(diff_src_desc);
mkldnn::pooling_forward::primitive_desc fwd_pd{
{mkldnn::prop_kind::forward_training,
pooling_algorithm,
diff_src_desc,
diff_dst_desc,
mkldnn::memory::dims(window_strides.begin(), window_strides.end()),
mkldnn::memory::dims(window_shape.begin(), window_shape.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine};
auto ws_index = build_memory_primitive(fwd_pd.workspace_primitive_desc().desc());
// Allocate workspace
// TODO (jbobba): Might need to align memory
auto ws = std::unique_ptr<MKLDNNWorkspace>(
new MKLDNNWorkspace(fwd_pd.workspace_primitive_desc().get_size()));
auto ws_buf_index = insert_workspace(ws);
size_t fwd_primitive_index = insert_primitive(new mkldnn::pooling_forward(
fwd_pd,
*m_mkldnn_primitives[fprop_src_index],
*m_mkldnn_primitives
[diff_src_index], // HACK - Uses diff_src buffer. Safe since diff_src > fprop_dst
*m_mkldnn_primitives[ws_index]));
size_t bwd_primitive_index = insert_primitive(new mkldnn::pooling_backward(
{{pooling_algorithm,
diff_src_desc,
diff_dst_desc,
mkldnn::memory::dims(window_strides.begin(), window_strides.end()),
mkldnn::memory::dims(window_shape.begin(), window_shape.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine,
fwd_pd},
*m_mkldnn_primitives[diff_dst_index],
*m_mkldnn_primitives[ws_index],
*m_mkldnn_primitives[diff_src_index]));
NGRAPH_CHECK(m_primitive_deps.find(fwd_primitive_index) == m_primitive_deps.end() &&
m_primitive_deps.find(bwd_primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[fwd_primitive_index] = {
fprop_src_index, diff_src_index, ws_index, ws_buf_index};
m_primitive_deps[bwd_primitive_index] = {
diff_dst_index, ws_index, diff_src_index, ws_buf_index};
return bwd_primitive_index;
}
void MKLDNNEmitter::build_max_pooling_backward(std::vector<mkldnn::primitive*>& mkldnn_primitives,
std::vector<char*>& mkldnn_workspaces,
const mkldnn::pooling_backward::desc& bwd_pool_desc,
......@@ -814,44 +497,6 @@ void MKLDNNEmitter::build_max_pooling_backward(std::vector<mkldnn::primitive*>&
*mkldnn_primitives[diff_src_index]);
}
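Max-pooling backward is the one primitive in this file that consumes a forward workspace: mkldnn's pooling_backward needs the argmax positions that only a forward_training pooling run produces. The removed one-shot builder above regenerated that workspace by re-running the forward primitive (reusing the diff_src buffer as its destination), while the new form keeps the buffer alive in mkldnn_workspaces across iterations. A sketch of the workspace sizing, assuming a fwd_pool_desc parameter like the one in the new signatures and using only calls visible in the removed code:
// The workspace layout and size come from the forward primitive descriptor;
// pooling_backward later reads the stored argmax indices out of it.
mkldnn::pooling_forward::primitive_desc fwd_pd{fwd_pool_desc, executor::global_cpu_engine};
size_t ws_bytes = fwd_pd.workspace_primitive_desc().get_size();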
size_t MKLDNNEmitter::build_max_pooling_with_indices_forward(mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& src_desc,
const mkldnn::memory::desc& dst_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above)
{
size_t src_index = build_memory_primitive(src_desc);
size_t dst_index = build_memory_primitive(dst_desc);
mkldnn::pooling_forward::primitive_desc fwd_pd{
{mkldnn::prop_kind::forward_training,
pooling_algorithm,
src_desc,
dst_desc,
mkldnn::memory::dims(window_strides.begin(), window_strides.end()),
mkldnn::memory::dims(window_shape.begin(), window_shape.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine};
auto ws_index = build_memory_primitive(fwd_pd.workspace_primitive_desc().desc());
size_t fwd_primitive_index =
insert_primitive(new mkldnn::pooling_forward(fwd_pd,
*m_mkldnn_primitives[src_index],
*m_mkldnn_primitives[dst_index],
*m_mkldnn_primitives[ws_index]));
NGRAPH_CHECK(m_primitive_deps.find(fwd_primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[fwd_primitive_index] = {src_index, dst_index, ws_index};
return fwd_primitive_index;
}
void MKLDNNEmitter::build_max_pooling_with_indices_forward(
std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::pooling_forward::desc& max_pool_desc,
......@@ -874,54 +519,6 @@ void MKLDNNEmitter::build_max_pooling_with_indices_forward(
*mkldnn_primitives[ws_index]);
}
size_t MKLDNNEmitter::build_max_pooling_with_indices_backward(
mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& diff_dst_desc,
const mkldnn::memory::desc& diff_src_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above)
{
size_t diff_dst_index = build_memory_primitive(diff_dst_desc);
size_t diff_src_index = build_memory_primitive(diff_src_desc);
mkldnn::pooling_forward::primitive_desc fwd_pd{
{mkldnn::prop_kind::forward_training,
pooling_algorithm,
diff_src_desc,
diff_dst_desc,
mkldnn::memory::dims(window_strides.begin(), window_strides.end()),
mkldnn::memory::dims(window_shape.begin(), window_shape.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine};
auto fprop_ws_index = build_memory_primitive(fwd_pd.workspace_primitive_desc().desc());
size_t bwd_primitive_index = insert_primitive(new mkldnn::pooling_backward(
{{pooling_algorithm,
diff_src_desc,
diff_dst_desc,
mkldnn::memory::dims(window_strides.begin(), window_strides.end()),
mkldnn::memory::dims(window_shape.begin(), window_shape.end()),
mkldnn::memory::dims(padding_below.begin(), padding_below.end()),
mkldnn::memory::dims(padding_above.begin(), padding_above.end()),
mkldnn::padding_kind::zero},
executor::global_cpu_engine,
fwd_pd},
*m_mkldnn_primitives[diff_dst_index],
*m_mkldnn_primitives[fprop_ws_index],
*m_mkldnn_primitives[diff_src_index]));
NGRAPH_CHECK(m_primitive_deps.find(bwd_primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[bwd_primitive_index] = {diff_dst_index, fprop_ws_index, diff_src_index};
return bwd_primitive_index;
}
void MKLDNNEmitter::build_max_pooling_with_indices_backward(
std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::pooling_backward::desc& bwd_pool_desc,
......@@ -1023,27 +620,6 @@ void MKLDNNEmitter::build_lrn_forward(std::vector<mkldnn::primitive*>& mkldnn_pr
lrn_prim_desc, *mkldnn_primitives[input_index], *mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_relu_forward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc)
{
size_t input_index = build_memory_primitive(input_desc);
size_t result_index = build_memory_primitive(result_desc);
const float negative_slope = 0.0f;
auto relu_desc = mkldnn::eltwise_forward::desc(
mkldnn::prop_kind::forward, mkldnn::algorithm::eltwise_relu, input_desc, negative_slope);
auto relu_pd = mkldnn::eltwise_forward::primitive_desc(relu_desc, executor::global_cpu_engine);
size_t primitive_index = insert_primitive(new mkldnn::eltwise_forward(
relu_pd, *m_mkldnn_primitives[input_index], *m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, result_index};
return primitive_index;
}
mkldnn::eltwise_forward::desc MKLDNNEmitter::get_relu_forward_desc(const ngraph::Node* node)
{
const float negative_slope = 0.0f;
......@@ -1070,39 +646,6 @@ void MKLDNNEmitter::build_relu_forward(std::vector<mkldnn::primitive*>& mkldnn_p
*mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_relu_backward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc)
{
size_t input_index = build_memory_primitive(input_desc);
size_t delta_index = build_memory_primitive(delta_desc);
size_t result_index = build_memory_primitive(result_desc);
/* Backward relu */
const float negative_slope = 0.0f;
auto relu_desc = mkldnn::eltwise_forward::desc(
mkldnn::prop_kind::forward, mkldnn::algorithm::eltwise_relu, input_desc, negative_slope);
auto relu_pd = mkldnn::eltwise_forward::primitive_desc(relu_desc, executor::global_cpu_engine);
/* create backward relu primitive_descriptor */
auto relu_bwd_desc = mkldnn::eltwise_backward::desc(
mkldnn::algorithm::eltwise_relu, result_desc, input_desc, negative_slope);
auto relu_bwd_pd = mkldnn::eltwise_backward::primitive_desc(
relu_bwd_desc, executor::global_cpu_engine, relu_pd);
size_t primitive_index =
insert_primitive(new mkldnn::eltwise_backward(relu_bwd_pd,
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[delta_index],
*m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, delta_index, result_index};
return primitive_index;
}
mkldnn::eltwise_backward::desc MKLDNNEmitter::get_relu_backward_desc(const ngraph::Node* node)
{
auto input_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
......@@ -1139,29 +682,6 @@ void MKLDNNEmitter::build_relu_backward(std::vector<mkldnn::primitive*>& mkldnn_
*mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_sigmoid_forward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc)
{
size_t input_index = build_memory_primitive(input_desc);
size_t result_index = build_memory_primitive(result_desc);
size_t primitive_index =
insert_primitive(new mkldnn::eltwise_forward({{mkldnn::prop_kind::forward_training,
mkldnn::algorithm::eltwise_logistic,
input_desc,
0,
0},
executor::global_cpu_engine},
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, result_index};
return primitive_index;
}
mkldnn::eltwise_forward::desc MKLDNNEmitter::get_sigmoid_forward_desc(const ngraph::Node* node,
bool backward_op)
{
......@@ -1198,35 +718,6 @@ void MKLDNNEmitter::build_sigmoid_forward(std::vector<mkldnn::primitive*>& mkldn
*mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_sigmoid_backward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc)
{
size_t input_index = build_memory_primitive(input_desc);
size_t delta_index = build_memory_primitive(delta_desc);
size_t result_index = build_memory_primitive(result_desc);
// sigmoid forward primitive desc
mkldnn::eltwise_forward::primitive_desc sigmoid_fwd_pd =
mkldnn::eltwise_forward::primitive_desc(
{mkldnn::prop_kind::forward, mkldnn::algorithm::eltwise_logistic, input_desc, 0, 0},
executor::global_cpu_engine);
size_t primitive_index = insert_primitive(new mkldnn::eltwise_backward(
{{mkldnn::algorithm::eltwise_logistic, delta_desc, input_desc, 0, 0},
executor::global_cpu_engine,
sigmoid_fwd_pd},
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[delta_index],
*m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, delta_index, result_index};
return primitive_index;
}
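For reference, eltwise_logistic backward computes diff_src from the forward input x and the incoming gradient, which is why the forward primitive descriptor is passed as a hint above. A standalone sketch of the math (not repo code):
#include <cmath>
// diff_src = diff_dst * s(x) * (1 - s(x)), with s(x) = 1 / (1 + exp(-x))
float sigmoid_backward_ref(float x, float diff_dst)
{
    float s = 1.0f / (1.0f + std::exp(-x));
    return diff_dst * s * (1.0f - s);
}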
mkldnn::eltwise_backward::desc MKLDNNEmitter::get_sigmoid_backward_desc(const ngraph::Node* node)
{
auto input_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
......@@ -1432,6 +923,7 @@ void MKLDNNEmitter::build_rnn_forward(std::vector<mkldnn::primitive*>& mkldnn_pr
size_t weights_layer_index = deps[2];
build_memory_primitive(
mkldnn_primitives, rnn_desc.data.weights_layer_desc, weights_layer_index);
size_t weights_iter_index = deps[3];
build_memory_primitive(mkldnn_primitives, rnn_desc.data.weights_iter_desc, weights_iter_index);
size_t bias_index = deps[4];
......@@ -1463,48 +955,6 @@ void MKLDNNEmitter::build_rnn_forward(std::vector<mkldnn::primitive*>& mkldnn_pr
static_cast<mkldnn::memory>(*mkldnn_primitives[workspace_index]));
}
size_t MKLDNNEmitter::build_concat(const std::vector<mkldnn::memory::desc>& inputs_data_desc,
const mkldnn::memory::desc& result_desc,
const size_t concat_dim)
{
std::vector<mkldnn::memory::primitive::at> inputs_primitive;
std::vector<size_t> inputs_data_index;
std::vector<size_t> in_out_index;
std::vector<mkldnn::memory::primitive_desc> inputs_pd;
for (size_t i = 0; i < inputs_data_desc.size(); i++)
{
inputs_pd.push_back(mkldnn::memory::primitive_desc(
inputs_data_desc[i], runtime::cpu::executor::global_cpu_engine));
}
for (size_t i = 0; i < inputs_data_desc.size(); i++)
{
inputs_data_index.push_back(build_memory_primitive(inputs_data_desc[i]));
inputs_primitive.push_back(*m_mkldnn_primitives[inputs_data_index[i]]);
}
size_t result_index = build_memory_primitive(result_desc);
// concat primitive descriptor
mkldnn::concat::primitive_desc concat_pd =
mkldnn::concat::primitive_desc(result_desc, static_cast<int>(concat_dim), inputs_pd);
// concat primitive
size_t concat_index = insert_primitive(
new mkldnn::concat(concat_pd, inputs_primitive, *m_mkldnn_primitives[result_index]));
for (size_t i = 0; i < inputs_data_index.size(); i++)
{
in_out_index.push_back(inputs_data_index[i]);
}
in_out_index.push_back(result_index);
NGRAPH_CHECK(m_primitive_deps.find(concat_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[concat_index] = in_out_index;
return concat_index;
}
void MKLDNNEmitter::build_concat(std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::concat::primitive_desc& concat_pd,
const std::vector<mkldnn::memory::desc>& inputs_data_desc,
......@@ -1534,41 +984,6 @@ void MKLDNNEmitter::build_concat(std::vector<mkldnn::primitive*>& mkldnn_primiti
new mkldnn::concat(concat_pd, inputs_primitive, *mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_slice(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Coordinate& lower_bounds,
const ngraph::Shape& result_shape)
{
std::vector<size_t> in_out_index;
mkldnn::memory::primitive_desc input_pd =
mkldnn::memory::primitive_desc(input_desc, runtime::cpu::executor::global_cpu_engine);
size_t input_index = build_memory_primitive(input_desc);
auto dims = mkldnn::memory::dims(result_shape.begin(), result_shape.end());
auto offsets = mkldnn::memory::dims(lower_bounds.begin(), lower_bounds.end());
auto view_pd = mkldnn::view::primitive_desc(input_pd, dims, offsets).dst_primitive_desc();
mkldnn::memory::primitive_desc result_pd =
mkldnn::memory::primitive_desc(result_desc, runtime::cpu::executor::global_cpu_engine);
size_t result_index = build_memory_primitive(result_desc);
// reorder primitive descriptor
mkldnn::reorder::primitive_desc reorder_pd =
mkldnn::reorder::primitive_desc(view_pd, result_pd);
// reorder primitive
size_t reorder_index = insert_primitive(new mkldnn::reorder(
reorder_pd, *m_mkldnn_primitives[input_index], *m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(reorder_index) == m_primitive_deps.end(),
"Dependencies already created for node");
in_out_index.push_back(input_index);
in_out_index.push_back(result_index);
m_primitive_deps[reorder_index] = in_out_index;
return reorder_index;
}
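MKLDNN 0.x has no slice primitive, so Slice lowers to a zero-copy view plus a reorder, as the removed builder above shows: the view's primitive_desc selects the sub-block of the input at lower_bounds with the result shape, and the reorder copies that region into the destination layout. Distilled from the code above:
// view: sub-block of input_pd at `offsets` with extent `dims` (no copy yet)
auto view_pd = mkldnn::view::primitive_desc(input_pd, dims, offsets).dst_primitive_desc();
// reorder: copy the viewed region into the result memory and layout
mkldnn::reorder::primitive_desc reorder_pd(view_pd, result_pd);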
void MKLDNNEmitter::build_slice(std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
......@@ -1600,26 +1015,6 @@ void MKLDNNEmitter::build_slice(std::vector<mkldnn::primitive*>& mkldnn_primitiv
reorder_pd, *mkldnn_primitives[input_index], *mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_softmax_forward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
int softmax_axis)
{
size_t input_index = build_memory_primitive(input_desc);
size_t result_index = build_memory_primitive(result_desc);
size_t primitive_index = insert_primitive(
new mkldnn::softmax_forward({{mkldnn::prop_kind::forward_scoring, input_desc, softmax_axis},
executor::global_cpu_engine},
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, result_index};
return primitive_index;
}
mkldnn::softmax_forward::desc MKLDNNEmitter::get_softmax_forward_desc(const ngraph::Node* node)
{
auto softmax = static_cast<const ngraph::op::Softmax*>(node);
......@@ -1653,30 +1048,6 @@ void MKLDNNEmitter::build_softmax_forward(std::vector<mkldnn::primitive*>& mkldn
*mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_leaky_relu(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
float alpha)
{
size_t input_index = build_memory_primitive(input_desc);
size_t result_index = build_memory_primitive(result_desc);
size_t primitive_index =
insert_primitive(new mkldnn::eltwise_forward({{mkldnn::prop_kind::forward_training,
mkldnn::algorithm::eltwise_relu,
input_desc,
alpha,
0.0f},
executor::global_cpu_engine},
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[result_index]));
NGRAPH_CHECK(m_primitive_deps.find(primitive_index) == m_primitive_deps.end(),
"Dependencies already created for node");
m_primitive_deps[primitive_index] = {input_index, result_index};
return primitive_index;
}
mkldnn::eltwise_forward::desc MKLDNNEmitter::get_leaky_relu_desc(const ngraph::Node* node)
{
auto alpha = static_cast<const ngraph::op::LeakyRelu*>(node)->get_alpha();
......@@ -1706,27 +1077,6 @@ void MKLDNNEmitter::build_leaky_relu(std::vector<mkldnn::primitive*>& mkldnn_pri
*mkldnn_primitives[result_index]);
}
size_t MKLDNNEmitter::build_bounded_relu(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
float alpha)
{
size_t input_index = build_memory_primitive(input_desc);
size_t result_index = build_memory_primitive(result_desc);
size_t primitive_index =
insert_primitive(new mkldnn::eltwise_forward({{mkldnn::prop_kind::forward_training,
mkldnn::algorithm::eltwise_bounded_relu,
input_desc,
alpha,
0.0f},
executor::global_cpu_engine},
*m_mkldnn_primitives[input_index],
*m_mkldnn_primitives[result_index]));
m_primitive_deps[primitive_index] = {input_index, result_index};
return primitive_index;
}
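All three relu-family ops above map onto mkldnn eltwise kernels and differ only in the algorithm and alpha they pass. A reference sketch of the forward math these kernels compute (illustrative helpers, not repo code):
// eltwise_relu: x >= 0 ? x : alpha * x
//   Relu      -> alpha = 0.0f (the negative_slope used earlier)
//   LeakyRelu -> alpha = op->get_alpha()
float eltwise_relu_ref(float x, float alpha) { return x >= 0.0f ? x : alpha * x; }
// eltwise_bounded_relu: clamp to [0, alpha]
//   BoundedRelu -> alpha = op->get_alpha()
float eltwise_bounded_relu_ref(float x, float alpha)
{
    return x < 0.0f ? 0.0f : (x > alpha ? alpha : x);
}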
mkldnn::eltwise_forward::desc MKLDNNEmitter::get_bounded_relu_desc(const ngraph::Node* node)
{
auto alpha = static_cast<const ngraph::op::BoundedRelu*>(node)->get_alpha();
......
......@@ -33,6 +33,7 @@
#include "ngraph/op/concat.hpp"
#include "ngraph/op/constant.hpp"
#include "ngraph/op/convolution.hpp"
#include "ngraph/op/dequantize.hpp"
#include "ngraph/op/experimental/quantized_avg_pool.hpp"
#include "ngraph/op/experimental/quantized_conv.hpp"
#include "ngraph/op/experimental/quantized_conv_bias.hpp"
......@@ -44,6 +45,7 @@
#include "ngraph/op/fused/group_conv.hpp"
#include "ngraph/op/lrn.hpp"
#include "ngraph/op/max_pool.hpp"
#include "ngraph/op/quantize.hpp"
#include "ngraph/op/softmax.hpp"
#include "ngraph/runtime/cpu/cpu_executor.hpp"
#include "ngraph/runtime/cpu/cpu_tensor_view_wrapper.hpp"
......@@ -186,17 +188,6 @@ namespace ngraph
size_t conv_index,
const mkldnn::memory::desc& weights_desc);
size_t build_deconvolutionbias_forward(
const mkldnn::memory::desc& input_data_desc,
const mkldnn::memory::desc& weights_desc,
const mkldnn::memory::desc& bias_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Strides& strides,
const ngraph::Strides& dilation_strides,
const ngraph::CoordinateDiff& padding_below,
const ngraph::CoordinateDiff& padding_above,
const mkldnn::post_ops& pops = mkldnn::post_ops());
template <typename OP>
size_t build_deconvolution(const ngraph::Node* node,
const std::vector<TensorViewWrapper>& args,
......@@ -323,15 +314,6 @@ namespace ngraph
const ngraph::CoordinateDiff& padding_below,
const ngraph::CoordinateDiff& padding_above);
size_t
build_convolution_backward_weights(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Strides& strides,
const ngraph::Strides& dilation_strides,
const ngraph::CoordinateDiff& padding_below,
const ngraph::CoordinateDiff& padding_above);
void build_convolution_backward_weights(
std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::convolution_backward_weights::desc& bwd_desc,
......@@ -339,14 +321,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t conv_index);
size_t build_convolution_backward_data(const mkldnn::memory::desc& weights_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Strides& strides,
const ngraph::Strides& dilation_strides,
const ngraph::CoordinateDiff& padding_below,
const ngraph::CoordinateDiff& padding_above);
void build_convolution_backward_data(
std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::convolution_backward_data::desc& bwd_desc,
......@@ -357,16 +331,6 @@ namespace ngraph
/**
* Convolution + bias backprop for weights and bias
*/
size_t build_convolution_backward_weights_bias(
const mkldnn::memory::desc& in_data_desc,
const mkldnn::memory::desc& in_delta_desc,
const mkldnn::memory::desc& out_weights_delta_desc,
const mkldnn::memory::desc& out_bias_delta_desc,
const ngraph::Strides& ng_strides,
const ngraph::Strides& ng_dilation_strides,
const ngraph::CoordinateDiff& ng_padding_below,
const ngraph::CoordinateDiff& ng_padding_above);
void build_convolution_backward_weights_bias(
std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::convolution_backward_weights::desc& bwd_desc,
......@@ -472,14 +436,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t pool_index);
size_t build_pooling_backward(mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& diff_dst_desc,
const mkldnn::memory::desc& diff_src_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above);
template <typename OP>
mkldnn::pooling_backward::desc
get_avg_pooling_backward_desc(const ngraph::Node* node)
......@@ -515,14 +471,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t pool_index);
size_t build_max_pooling_with_indices_forward(mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& src_desc,
const mkldnn::memory::desc& dst_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above);
template <typename OP>
mkldnn::pooling_forward::desc
get_max_pooling_with_indices_forward_desc(const ngraph::Node* node)
......@@ -555,15 +503,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t max_pool_index);
size_t build_max_pooling_backward(mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& fprop_src_desc,
const mkldnn::memory::desc& diff_dst_desc,
const mkldnn::memory::desc& diff_src_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above);
template <typename OP>
mkldnn::pooling_backward::desc
get_max_pooling_backward_desc(const ngraph::Node* node)
......@@ -599,15 +538,6 @@ namespace ngraph
size_t fwd_pool_index,
size_t bwd_pool_index);
size_t build_max_pooling_with_indices_backward(
mkldnn::algorithm pooling_algorithm,
const mkldnn::memory::desc& diff_dst_desc,
const mkldnn::memory::desc& diff_src_desc,
const ngraph::Strides& window_strides,
const ngraph::Shape& window_shape,
const ngraph::Shape& padding_below,
const ngraph::Shape& padding_above);
void build_max_pooling_with_indices_backward(
std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::pooling_backward::desc& bwd_pool_desc,
......@@ -631,9 +561,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t lrn_index);
size_t build_relu_forward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc);
mkldnn::eltwise_forward::desc get_relu_forward_desc(const ngraph::Node* node);
void build_relu_forward(std::vector<mkldnn::primitive*>& mkldnn_primitives,
......@@ -641,10 +568,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t relu_index);
size_t build_relu_backward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc);
mkldnn::eltwise_backward::desc get_relu_backward_desc(const ngraph::Node* node);
void build_relu_backward(std::vector<mkldnn::primitive*>& mkldnn_primitives,
......@@ -653,9 +576,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t relu_index);
size_t build_sigmoid_forward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc);
mkldnn::eltwise_forward::desc get_sigmoid_forward_desc(const ngraph::Node* node,
bool backward_op);
......@@ -664,10 +584,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t sigmoid_index);
size_t build_sigmoid_backward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& delta_desc,
const mkldnn::memory::desc& result_desc);
mkldnn::eltwise_backward::desc get_sigmoid_backward_desc(const ngraph::Node* node);
void build_sigmoid_backward(std::vector<mkldnn::primitive*>& mkldnn_primitives,
......@@ -737,10 +653,6 @@ namespace ngraph
std::vector<size_t>& deps,
size_t rnn_idx);
size_t build_concat(const std::vector<mkldnn::memory::desc>& inputs_data_desc,
const mkldnn::memory::desc& result_desc,
const size_t concat_dim);
template <typename OP>
mkldnn::concat::primitive_desc get_concat_desc(const ngraph::Node* node,
size_t nargs)
......@@ -769,11 +681,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t concat_index);
size_t build_slice(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
const ngraph::Coordinate& lower_bounds,
const ngraph::Shape& result_shape);
void build_slice(std::vector<mkldnn::primitive*>& mkldnn_primitives,
const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
......@@ -782,10 +689,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t slice_index);
size_t build_softmax_forward(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
int softmax_axis);
mkldnn::softmax_forward::desc get_softmax_forward_desc(const ngraph::Node* node);
void build_softmax_forward(std::vector<mkldnn::primitive*>& mkldnn_primitives,
......@@ -793,10 +696,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t softmax_index);
size_t build_leaky_relu(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
float alpha);
mkldnn::eltwise_forward::desc get_leaky_relu_desc(const ngraph::Node* node);
void build_leaky_relu(std::vector<mkldnn::primitive*>& mkldnn_primitives,
......@@ -804,10 +703,6 @@ namespace ngraph
const std::vector<size_t>& deps,
size_t leaky_relu_index);
size_t build_bounded_relu(const mkldnn::memory::desc& input_desc,
const mkldnn::memory::desc& result_desc,
float alpha);
mkldnn::eltwise_forward::desc get_bounded_relu_desc(const ngraph::Node* node);
void build_bounded_relu(std::vector<mkldnn::primitive*>& mkldnn_primitives,
......@@ -835,9 +730,14 @@ namespace ngraph
size_t get_scale_index()
{
size_t index = 0;
if (std::is_same<OP, ngraph::op::QuantizedConvolution>() ||
std::is_same<OP, ngraph::op::QuantizedMatmul>() ||
std::is_same<OP, ngraph::op::QuantizedConvolutionRelu>())
if (std::is_same<OP, ngraph::op::Quantize>() ||
std::is_same<OP, ngraph::op::Dequantize>())
{
index = 1;
}
else if (std::is_same<OP, ngraph::op::QuantizedConvolution>() ||
std::is_same<OP, ngraph::op::QuantizedMatmul>() ||
std::is_same<OP, ngraph::op::QuantizedConvolutionRelu>())
{
index = 2;
}
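get_scale_index<OP>() records which input of a quantized op carries its scale: input 1 for Quantize and Dequantize, input 2 for QuantizedConvolution, QuantizedMatmul, and QuantizedConvolutionRelu (the elided branches cover the remaining ops). A hypothetical caller sketch, assuming the scale input is a Constant node:
// Fetch the scale values feeding a quantization op (illustrative only).
auto scale_input = node->get_argument(get_scale_index<OP>());
auto scale_const = std::dynamic_pointer_cast<ngraph::op::Constant>(scale_input);
NGRAPH_CHECK(scale_const, "expected a constant scale input");
std::vector<float> scales = scale_const->get_vector<float>();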
......@@ -1354,15 +1254,15 @@ namespace ngraph
{
weights_desc.data.format = mkldnn_oidhw;
}
auto delta_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto result_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
auto diff_dst_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto diff_src_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
mkldnn::algorithm convolution_algo = mkldnn_utils::get_conv_algo();
return mkldnn::convolution_backward_data::desc(
convolution_algo,
result_desc,
diff_src_desc,
weights_desc,
delta_desc,
diff_dst_desc,
MKLDNN_DIMS(convolution->get_window_movement_strides_forward()),
MKLDNN_DIMS(window_dilation_strides_adjusted),
MKLDNN_DIMS(convolution->get_padding_below_forward()),
......@@ -1384,20 +1284,20 @@ namespace ngraph
window_dilation_strides_adjusted.push_back(s - 1);
}
auto in_data_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
auto in_delta_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto out_weights_delta_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
auto src_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
auto diff_dst_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto diff_weights_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
mkldnn::algorithm convolution_algo = mkldnn_utils::get_conv_algo();
if (has_bias<OP>())
{
auto out_bias_delta_desc = mkldnn_utils::get_output_mkldnn_md(node, 1);
auto diff_bias_desc = mkldnn_utils::get_output_mkldnn_md(node, 1);
return mkldnn::convolution_backward_weights::desc(
convolution_algo,
in_data_desc,
out_weights_delta_desc,
out_bias_delta_desc,
in_delta_desc,
src_desc,
diff_weights_desc,
diff_bias_desc,
diff_dst_desc,
MKLDNN_DIMS(convolution->get_window_movement_strides_forward()),
MKLDNN_DIMS(window_dilation_strides_adjusted),
MKLDNN_DIMS(convolution->get_padding_below_forward()),
......@@ -1408,9 +1308,9 @@ namespace ngraph
{
return mkldnn::convolution_backward_weights::desc(
convolution_algo,
in_data_desc,
out_weights_delta_desc,
in_delta_desc,
src_desc,
diff_weights_desc,
diff_dst_desc,
MKLDNN_DIMS(convolution->get_window_movement_strides_forward()),
MKLDNN_DIMS(window_dilation_strides_adjusted),
MKLDNN_DIMS(convolution->get_padding_below_forward()),
......@@ -1446,15 +1346,15 @@ namespace ngraph
{
weights_desc.data.format = mkldnn_oidhw;
}
auto delta_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto result_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
auto diff_dst_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto diff_src_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
return mkldnn::convolution_forward::desc(
mkldnn::prop_kind::forward,
convolution_algo,
result_desc,
diff_src_desc,
weights_desc,
delta_desc,
diff_dst_desc,
MKLDNN_DIMS(convolution->get_window_movement_strides_forward()),
MKLDNN_DIMS(window_dilation_strides_adjusted),
MKLDNN_DIMS(convolution->get_padding_below_forward()),
......@@ -1463,15 +1363,15 @@ namespace ngraph
}
else if (std::is_same<OP, ngraph::op::ConvolutionBackpropFilters>())
{
auto in_data_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
auto in_delta_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto out_weights_delta_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
auto src_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
auto diff_dst_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto diff_weights_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
return mkldnn::convolution_forward::desc(
mkldnn::prop_kind::forward,
convolution_algo,
in_data_desc,
out_weights_delta_desc,
in_delta_desc,
src_desc,
diff_weights_desc,
diff_dst_desc,
MKLDNN_DIMS(convolution->get_window_movement_strides_forward()),
MKLDNN_DIMS(window_dilation_strides_adjusted),
MKLDNN_DIMS(convolution->get_padding_below_forward()),
......@@ -1480,18 +1380,18 @@ namespace ngraph
}
else
{
auto in_data_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
auto in_delta_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto out_weights_delta_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
auto out_bias_delta_desc = mkldnn_utils::get_output_mkldnn_md(node, 1);
auto src_desc = mkldnn_utils::get_input_mkldnn_md(node, 0);
auto diff_dst_desc = mkldnn_utils::get_input_mkldnn_md(node, 1);
auto diff_weights_desc = mkldnn_utils::get_output_mkldnn_md(node, 0);
auto diff_bias_desc = mkldnn_utils::get_output_mkldnn_md(node, 1);
return mkldnn::convolution_forward::desc(
mkldnn::prop_kind::forward,
convolution_algo,
in_data_desc,
out_weights_delta_desc,
out_bias_delta_desc,
in_delta_desc,
src_desc,
diff_weights_desc,
diff_bias_desc,
diff_dst_desc,
MKLDNN_DIMS(convolution->get_window_movement_strides_forward()),
MKLDNN_DIMS(window_dilation_strides_adjusted),
MKLDNN_DIMS(convolution->get_padding_below_forward()),
......
......@@ -18,6 +18,7 @@
#include "ngraph/graph_util.hpp"
#include "ngraph/op/op.hpp"
#include "ngraph/runtime/cpu/cpu_backend_visibility.h"
namespace ngraph
{
......@@ -31,11 +32,11 @@ namespace ngraph
class MaxPoolWithIndices : public Op
{
public:
MaxPoolWithIndices(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
CPU_BACKEND_API MaxPoolWithIndices(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
virtual std::shared_ptr<Node>
copy_with_new_args(const NodeVector& new_args) const override;
......@@ -64,13 +65,13 @@ namespace ngraph
class MaxPoolWithIndicesBackprop : public Op
{
public:
MaxPoolWithIndicesBackprop(const std::shared_ptr<Node>& arg_forward,
const std::shared_ptr<Node>& delta,
const std::shared_ptr<Node>& indices,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
CPU_BACKEND_API MaxPoolWithIndicesBackprop(const std::shared_ptr<Node>& arg_forward,
const std::shared_ptr<Node>& delta,
const std::shared_ptr<Node>& indices,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
virtual std::shared_ptr<Node>
copy_with_new_args(const NodeVector& new_args) const override;
......
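CPU_BACKEND_API is the backend's export/import annotation (pulled in through cpu_backend_visibility.h above); without it these constructors are not exported from the CPU backend DLL on Windows, hence the Windows build fix in this commit. The conventional shape of such a visibility header, sketched from the usual pattern rather than copied from the repo:
// Typical visibility macro layout (illustrative, not the repo's exact text).
#ifdef _WIN32
#ifdef CPU_BACKEND_DLL_EXPORTS
#define CPU_BACKEND_API __declspec(dllexport)
#else
#define CPU_BACKEND_API __declspec(dllimport)
#endif
#else
#define CPU_BACKEND_API __attribute__((visibility("default")))
#endif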
This source diff could not be displayed because it is too large.
......@@ -23,10 +23,6 @@
#include <typeindex>
#include <unordered_map>
#define BUILD_PRIMITIVE_DECL(op_name) \
build_primitive<op_name>(ngraph::runtime::cpu::MKLDNNEmitter & mkldnn_emitter, \
ngraph::Node * node)
#define CONSTRUCT_PRIMITIVE_BUILD_STRING_DECL(op_name) \
construct_primitive_build_string<op_name>(ngraph::runtime::cpu::MKLDNNEmitter & \
mkldnn_emitter, \
......@@ -53,11 +49,6 @@ namespace ngraph
namespace pass
{
using PrimitiveBuildFunction =
std::function<size_t(ngraph::runtime::cpu::MKLDNNEmitter&, ngraph::Node*)>;
using PrimitiveBuildOpMap =
std::unordered_map<std::type_index, PrimitiveBuildFunction>;
using PrimitiveBuildStringConstructFunction =
std::function<void(ngraph::runtime::cpu::MKLDNNEmitter&,
ngraph::Node*,
......@@ -77,10 +68,6 @@ namespace ngraph
ngraph::runtime::cpu::MKLDNNEmitter& m_mkldnn_emitter;
/// External map to store each node with mkldnn implementation and its mkldnn
/// associated primitive index.
std::unordered_map<const Node*, size_t>& m_node_primitive_idx_map;
/// External map to store each node with mkldnn implementation and its mkldnn
/// creation string, deps, and mkldnn primitive index.
std::map<const Node*, std::tuple<std::string, std::vector<size_t>, size_t>>&
......@@ -90,12 +77,10 @@ namespace ngraph
MKLDNNPrimitiveBuildPass(
std::string filename,
ngraph::runtime::cpu::MKLDNNEmitter& mkldnn_emitter,
std::unordered_map<const Node*, size_t>& node_primitive_idx_map,
std::map<const Node*, std::tuple<std::string, std::vector<size_t>, size_t>>&
node_primitive_string_deps_index_map)
: m_desc_filename(filename)
, m_mkldnn_emitter(mkldnn_emitter)
, m_node_primitive_idx_map(node_primitive_idx_map)
, m_node_primitive_string_deps_index_map(
node_primitive_string_deps_index_map)
{
......@@ -103,15 +88,6 @@ namespace ngraph
bool run_on_call_graph(const std::list<std::shared_ptr<Node>>& nodes) override;
template <typename OP>
static size_t
build_primitive(ngraph::runtime::cpu::MKLDNNEmitter& mkldnn_emitter,
ngraph::Node* node)
{
throw std::runtime_error("Unimplemented op '" + node->description() +
"' in MKLDNNPrimitiveBuildPass");
}
template <typename OP>
static void construct_primitive_build_string(
ngraph::runtime::cpu::MKLDNNEmitter& mkldnn_emitter,
......
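With the primitive-index map removed, the codegen pass keys everything off node_primitive_string_deps_index_map: per node it stores the emitted construction string, the dependency indices, and the reserved primitive slot. A hypothetical lookup on the consumer side (names as declared above):
const auto& entry = node_primitive_string_deps_index_map.at(node);
const std::string& construct_string = std::get<0>(entry);
const std::vector<size_t>& deps = std::get<1>(entry);
size_t primitive_index = std::get<2>(entry);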
......@@ -40,6 +40,7 @@
#include "ngraph/runtime/cpu/cpu_builder.hpp"
#include "ngraph/runtime/cpu/mkldnn_utils.hpp"
#include "ngraph/runtime/cpu/op/convert_layout.hpp"
#include "ngraph/runtime/cpu/op/max_pool_with_indices.hpp"
#include "ngraph/serializer.hpp"
#include "ngraph/util.hpp"
#include "util/all_close.hpp"
......@@ -1347,3 +1348,384 @@ TEST(cpu_test, gauss_error_function_erf_int32)
auto expected_values = expected_result_nd_array.get_vector();
ASSERT_EQ(result_values, expected_values);
}
TEST(cpu_test, max_pool_with_indices_2d_2channel_2image)
{
Shape shape_a{2, 2, 5, 5};
Shape window_shape{2, 3};
auto window_movement_strides = Strides{1, 1};
Shape padding_below{0, 0};
Shape padding_above{0, 0};
auto A = make_shared<op::Parameter>(element::f32, shape_a);
auto max_pool = make_shared<op::MaxPoolWithIndices>(
A, window_shape, window_movement_strides, padding_below, padding_above);
Shape shape_r{2, 2, 4, 3};
auto data = make_shared<op::Result>(make_shared<op::GetOutputElement>(max_pool, 0));
auto indices = make_shared<op::Result>(make_shared<op::GetOutputElement>(max_pool, 1));
auto f = make_shared<Function>(ResultVector{data, indices}, ParameterVector{A});
auto backend = runtime::Backend::create("CPU");
// Create some tensors for input/output
auto a = backend->create_tensor(element::f32, shape_a);
copy_data(a,
test::NDArray<float, 4>({{{{0, 1, 0, 2, 1}, // img 0 chan 0
{0, 3, 2, 0, 0},
{2, 0, 0, 0, 1},
{2, 0, 1, 1, 2},
{0, 2, 1, 0, 0}},
{{0, 0, 0, 2, 0}, // img 0 chan 1
{0, 2, 3, 0, 1},
{2, 0, 1, 0, 2},
{3, 1, 0, 0, 0},
{2, 0, 0, 0, 0}}},
{{{0, 2, 1, 1, 0}, // img 1 chan 0
{0, 0, 2, 0, 1},
{0, 0, 1, 2, 3},
{2, 0, 0, 3, 0},
{0, 0, 0, 0, 0}},
{{2, 1, 0, 0, 1}, // img 1 chan 1
{0, 2, 0, 0, 0},
{1, 1, 2, 0, 2},
{1, 1, 1, 0, 1},
{1, 0, 0, 0, 2}}}})
.get_vector());
auto result_data = backend->create_tensor(element::f32, shape_r);
auto result_indices = backend->create_tensor(element::i32, shape_r);
auto handle = backend->compile(f);
handle->call_with_validate({result_data, result_indices}, {a});
EXPECT_TRUE(test::all_close_f((test::NDArray<float, 4>({{{{3, 3, 2}, // img 0 chan 0
{3, 3, 2},
{2, 1, 2},
{2, 2, 2}},
{{3, 3, 3}, // img 0 chan 1
{3, 3, 3},
{3, 1, 2},
{3, 1, 0}}},
{{{2, 2, 2}, // img 1 chan 0
{2, 2, 3},
{2, 3, 3},
{2, 3, 3}},
{{2, 2, 1}, // img 1 chan 1
{2, 2, 2},
{2, 2, 2},
{1, 1, 2}}}})
.get_vector()),
read_vector<float>(result_data),
MIN_FLOAT_TOLERANCE_BITS));
EXPECT_TRUE(test::all_close((test::NDArray<int, 4>({{{{4, 3, 1}, // img 0 chan 0
{1, 0, 0},
{0, 4, 5},
{0, 3, 2}},
{{5, 4, 3}, // img 0 chan 1
{2, 1, 0},
{3, 1, 2},
{0, 0, 0}}},
{{{1, 0, 3}, // img 1 chan 0
{2, 1, 5},
{3, 5, 2},
{0, 2, 1}},
{{0, 3, 2}, // img 1 chan 1
{1, 0, 3},
{2, 1, 0},
{0, 0, 5}}}})
.get_vector()),
read_vector<int>(result_indices)));
}
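The indices tensor stores each maximum's offset within its pooling window in row-major order. Checking the first output of img 0 chan 0: the 2x3 window over rows 0-1, cols 0-2 holds {0, 1, 0; 0, 3, 2}, and the max 3 sits at window row 1, col 1:
// flat window offset = row * window_width + col
int window_width = 3;
int flat = 1 * window_width + 1; // == 4, matching indices[0][0][0][0] above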
TEST(cpu_test, max_pool_with_indices_bprop_2d_2channel_2image)
{
Shape shape_a{2, 2, 5, 5};
Shape window_shape{2, 3};
auto window_movement_strides = Strides{1, 1};
Shape padding_below{0, 0};
Shape padding_above{0, 0};
auto A = make_shared<op::Parameter>(element::f32, shape_a);
Shape shape_i{2, 2, 4, 3};
auto indices = make_shared<op::Parameter>(element::i32, shape_i);
auto delta = make_shared<op::Parameter>(element::f32, shape_i);
auto max_pool_bprop = make_shared<op::MaxPoolWithIndicesBackprop>(
A, delta, indices, window_shape, window_movement_strides, padding_below, padding_above);
auto f = make_shared<Function>(max_pool_bprop, ParameterVector{A, delta, indices});
auto backend = runtime::Backend::create("CPU");
// Create some tensors for input/output
auto a = backend->create_tensor(element::f32, shape_a);
copy_data(a,
test::NDArray<float, 4>({{{{0, 1, 0, 2, 1}, // img 0 chan 0
{0, 3, 2, 0, 0},
{2, 0, 0, 0, 1},
{2, 0, 1, 1, 2},
{0, 2, 1, 0, 0}},
{{0, 0, 0, 2, 0}, // img 0 chan 1
{0, 2, 3, 0, 1},
{2, 0, 1, 0, 2},
{3, 1, 0, 0, 0},
{2, 0, 0, 0, 0}}},
{{{0, 2, 1, 1, 0}, // img 1 chan 0
{0, 0, 2, 0, 1},
{0, 0, 1, 2, 3},
{2, 0, 0, 3, 0},
{0, 0, 0, 0, 0}},
{{2, 1, 0, 0, 1}, // img 1 chan 1
{0, 2, 0, 0, 0},
{1, 1, 2, 0, 2},
{1, 1, 1, 0, 1},
{1, 0, 0, 0, 2}}}})
.get_vector());
auto i = backend->create_tensor(element::i32, shape_i);
copy_data(i,
test::NDArray<int, 4>({{{{4, 3, 1}, // img 0 chan 0
{1, 0, 0},
{0, 4, 5},
{0, 3, 2}},
{{5, 4, 3}, // img 0 chan 1
{2, 1, 0},
{3, 1, 2},
{0, 0, 0}}},
{{{1, 0, 3}, // img 1 chan 0
{2, 1, 5},
{3, 5, 2},
{0, 2, 1}},
{{0, 3, 2}, // img 1 chan 1
{1, 0, 3},
{2, 1, 0},
{0, 0, 5}}}})
.get_vector());
auto d = backend->create_tensor(element::f32, shape_i);
copy_data(d,
test::NDArray<float, 4>({{{{0.3, 0.3, 0.2}, // img 0 chan 0
{0.3, 0.3, 0.2},
{0.2, 0.1, 0.2},
{0.2, 0.2, 0.2}},
{{0.3, 0.3, 0.3}, // img 0 chan 1
{0.3, 0.3, 0.3},
{0.3, 0.1, 0.2},
{0.3, 0.1, 0.4}}},
{{{0.2, 0.2, 0.2}, // img 1 chan 0
{0.2, 0.2, 0.3},
{0.2, 0.3, 0.3},
{0.2, 0.3, 0.3}},
{{0.2, 0.2, 0.1}, // img 1 chan 1
{0.2, 0.2, 0.2},
{0.2, 0.2, 0.2},
{0.1, 0.1, 0.2}}}})
.get_vector());
auto result = backend->create_tensor(element::f32, shape_a);
auto handle = backend->compile(f);
handle->call_with_validate({result}, {a, d, i});
EXPECT_TRUE(test::all_close_f((test::NDArray<float, 4>({{{{0, 0, 0, 0.2, 0}, // img 0 chan 0
{0, 1.2, 0.2, 0, 0},
{0.2, 0, 0, 0, 0},
{0.2, 0, 0.1, 0, 0.4},
{0, 0.2, 0, 0, 0}},
{{0, 0, 0, 0, 0}, // img 0 chan 1
{0, 0, 1.8, 0, 0},
{0, 0, 0.1, 0, 0.2},
{0.6, 0.1, 0.4, 0, 0},
{0, 0, 0, 0, 0}}},
{{{0, 0.4, 0, 0, 0}, // img 1 chan 0
{0, 0, 0.6, 0, 0},
{0, 0, 0, 0, 0.6},
{0.4, 0, 0, 0.9, 0},
{0, 0, 0, 0, 0}},
{{0.2, 0, 0, 0, 0.1}, // img 1 chan 1
{0, 0.6, 0, 0, 0},
{0, 0, 0.8, 0, 0},
{0.1, 0.1, 0, 0, 0},
{0, 0, 0, 0, 0.2}}}})
.get_vector()),
read_vector<float>(result),
MIN_FLOAT_TOLERANCE_BITS));
}
TEST(cpu_test, max_pool_bprop_2d_2channel_2image)
{
Shape shape_a{2, 2, 5, 5};
Shape window_shape{2, 3};
auto window_movement_strides = Strides{1, 1};
Shape padding_below{0, 0};
Shape padding_above{0, 0};
auto A = make_shared<op::Parameter>(element::f32, shape_a);
Shape shape_i{2, 2, 4, 3};
auto delta = make_shared<op::Parameter>(element::f32, shape_i);
auto max_pool_bprop = make_shared<op::MaxPoolBackprop>(
A, delta, window_shape, window_movement_strides, padding_below, padding_above);
auto f = make_shared<Function>(max_pool_bprop, ParameterVector{A, delta});
auto backend = runtime::Backend::create("CPU");
// Create some tensors for input/output
auto a = backend->create_tensor(element::f32, shape_a);
copy_data(a,
test::NDArray<float, 4>({{{{0, 1, 0, 2, 1}, // img 0 chan 0
{0, 3, 2, 0, 0},
{2, 0, 0, 0, 1},
{2, 0, 1, 1, 2},
{0, 2, 1, 0, 0}},
{{0, 0, 0, 2, 0}, // img 0 chan 1
{0, 2, 3, 0, 1},
{2, 0, 1, 0, 2},
{3, 1, 0, 0, 0},
{2, 0, 0, 0, 0}}},
{{{0, 2, 1, 1, 0}, // img 1 chan 0
{0, 0, 2, 0, 1},
{0, 0, 1, 2, 3},
{2, 0, 0, 3, 0},
{0, 0, 0, 0, 0}},
{{2, 1, 0, 0, 1}, // img 1 chan 1
{0, 2, 0, 0, 0},
{1, 1, 2, 0, 2},
{1, 1, 1, 0, 1},
{1, 0, 0, 0, 2}}}})
.get_vector());
auto d = backend->create_tensor(element::f32, shape_i);
copy_data(d,
test::NDArray<float, 4>({{{{0.3, 0.3, 0.2}, // img 0 chan 0
{0.3, 0.3, 0.2},
{0.2, 0.1, 0.2},
{0.2, 0.2, 0.2}},
{{0.3, 0.3, 0.3}, // img 0 chan 1
{0.3, 0.3, 0.3},
{0.3, 0.1, 0.2},
{0.3, 0.1, 0.4}}},
{{{0.2, 0.2, 0.2}, // img 1 chan 0
{0.2, 0.2, 0.3},
{0.2, 0.3, 0.3},
{0.2, 0.3, 0.3}},
{{0.2, 0.2, 0.1}, // img 1 chan 1
{0.2, 0.2, 0.2},
{0.2, 0.2, 0.2},
{0.1, 0.1, 0.2}}}})
.get_vector());
auto result = backend->create_tensor(element::f32, shape_a);
auto handle = backend->compile(f);
handle->call_with_validate({result}, {a, d});
EXPECT_TRUE(test::all_close_f((test::NDArray<float, 4>({{{{0, 0, 0, 0.2, 0}, // img 0 chan 0
{0, 1.2, 0.2, 0, 0},
{0.2, 0, 0, 0, 0},
{0.2, 0, 0.1, 0, 0.4},
{0, 0.2, 0, 0, 0}},
{{0, 0, 0, 0, 0}, // img 0 chan 1
{0, 0, 1.8, 0, 0},
{0, 0, 0.1, 0, 0.2},
{0.6, 0.1, 0.4, 0, 0},
{0, 0, 0, 0, 0}}},
{{{0, 0.4, 0, 0, 0}, // img 1 chan 0
{0, 0, 0.6, 0, 0},
{0, 0, 0, 0, 0.6},
{0.4, 0, 0, 0.9, 0},
{0, 0, 0, 0, 0}},
{{0.2, 0, 0, 0, 0.1}, // img 1 chan 1
{0, 0.6, 0, 0, 0},
{0, 0, 0.8, 0, 0},
{0.1, 0.1, 0, 0, 0},
{0, 0, 0, 0, 0.2}}}})
.get_vector()),
read_vector<float>(result),
MIN_FLOAT_TOLERANCE_BITS));
}
TEST(cpu_test, avg_pool_bprop_2d_2channel_2image)
{
Shape shape_a{2, 2, 3, 3};
Shape window_shape{2, 2};
auto window_movement_strides = Strides{1, 1};
Shape padding_below{0, 0};
Shape padding_above{0, 0};
Shape shape_d{2, 2, 2, 2};
auto delta = make_shared<op::Parameter>(element::f32, shape_d);
auto avg_pool_bprop = make_shared<op::AvgPoolBackprop>(
shape_a, delta, window_shape, window_movement_strides, padding_below, padding_above, false);
auto f = make_shared<Function>(avg_pool_bprop, ParameterVector{delta});
auto backend = runtime::Backend::create("CPU");
// Create some tensors for input/output
auto d = backend->create_tensor(element::f32, shape_d);
copy_data(d,
test::NDArray<float, 4>({{{{0.3, 0.3}, // img 0 chan 0
{0.3, 0.3}},
{{0.2, 0.2}, // img 0 chan 1
{0.2, 0.2}}},
{{{0.1, 0.1}, // img 1 chan 0
{0.1, 0.1}},
{{0.4, 0.4}, // img 1 chan 1
{0.4, 0.4}}}})
.get_vector());
auto result = backend->create_tensor(element::f32, shape_a);
float denom = 2 * 2;
auto handle = backend->compile(f);
handle->call_with_validate({result}, {d});
EXPECT_TRUE(test::all_close_f(
(test::NDArray<float, 4>({{{{0.3f / denom, 0.6f / denom, 0.3f / denom}, // img 0 chan 0
{0.6f / denom, 1.2f / denom, 0.6f / denom},
{0.3f / denom, 0.6f / denom, 0.3f / denom}},
{{0.2f / denom, 0.4f / denom, 0.2f / denom}, // img 0 chan 1
{0.4f / denom, 0.8f / denom, 0.4f / denom},
{0.2f / denom, 0.4f / denom, 0.2f / denom}}},
{{{0.1f / denom, 0.2f / denom, 0.1f / denom}, // img 1 chan 0
{0.2f / denom, 0.4f / denom, 0.2f / denom},
{0.1f / denom, 0.2f / denom, 0.1f / denom}},
{{0.4f / denom, 0.8f / denom, 0.4f / denom}, // img 1 chan 1
{0.8f / denom, 1.6f / denom, 0.8f / denom},
{0.4f / denom, 0.8f / denom, 0.4f / denom}}}})
.get_vector()),
read_vector<float>(result),
MIN_FLOAT_TOLERANCE_BITS));
}
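With include_padding set to false and a 2x2 window, every delta element is divided by denom = 4 and scattered to the four input cells its window covers, so an input cell accumulates one share per window containing it. Checking the center of img 0 chan 0, which all four windows cover:
// center cell: 4 windows * (0.3f / denom) == 1.2f / denom, as asserted above
float denom = 2 * 2;
float center = 4 * (0.3f / denom); // == 0.3f == 1.2f / denom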