Commit 9335e41c authored by Amy Zhuang's avatar Amy Zhuang Committed by Scott Cyphers

Create mkldnn primitives at first iteration for codegen - part2 (#2859)

* Create mkldnn primitives at first iteration for CODEGEN.

 OPs: add, lstm, and rnn.

*  OPs: batchnorm.

*  OPs: concat and lrn.

Remove dead code.

* Skip in place concat, relu, reshape, and slice when building node_primitive_string_deps_index map.

* Change NGRAPH_ASSERT to NGRAPH_CHECK.

* Address PR Feedback.

* Create mkldnn primitives at first iteration for CODEGEN.
 OPs: convertlayout, relu, leakyrelu, boundedrelu, sigmoid, softmax, slice.

* Fix bugs.

*  OPs: quantizedconcat.

Check if there are descriptors before emitting code to read desc_file.

*  OPs: convolution backward.

Use macro to write mkldnn memory dims to generated file.

*  OPs: MaxPoolWithIndices and MaxPoolWithIndicesBackprop.

Add unit tests for MaxPoolWithIndices, MaxPoolWithIndicesBackprop, and MaxPoolBackprop.

* Fix style error.

*  OPs: AvgPoolBackprop and MaxPoolBackprop.

Add unit test for AvgPoolBackprop.

*  OPs: DeconvolutionBias.

*  OPs: Quantize and Dequantize.

*  OPs: QuantizedDot and QuantizedDotBias.

* Use reference kernel for QuantizedConvolution for CODEGEN when mkldnn does not support the parameter types.
Get scales for quantization ops in cpu_emitter.

* Fix Windows build error: add CPU_BACKEND_API.

* Use template for quantization ops.

*  OPs: QuantizedMatmul.

Emit reference kernel for QuantizedDot in CODEGEN.

* Remove QuantizedDot from get_scale_index.

* Address PR feedback.
parent 30f3634e
......@@ -402,8 +402,8 @@ namespace ngraph
ngraph::op::ConvolutionBackpropData>(node);
auto fwd_desc = mkldnn_emitter->get_convolution_forward_desc_for_backward_op<
ngraph::op::ConvolutionBackpropData>(node);
// ConvolutionBackpropData needs 4 primitives: weights, delta, result,
// and convolution_backward.
// ConvolutionBackpropData needs 4 primitives: weights, diff_dst, diff_src,
// and convolution_backward_data.
auto conv_index = mkldnn_emitter->reserve_primitive_space(4);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
......@@ -502,7 +502,7 @@ namespace ngraph
ngraph::op::ConvolutionBackpropFilters>(node);
auto fwd_desc = mkldnn_emitter->get_convolution_forward_desc_for_backward_op<
ngraph::op::ConvolutionBackpropFilters>(node);
// ConvolutionBackpropFilter needs 4 primitives: input, delta, weights_delta,
// ConvolutionBackpropFilter needs 4 primitives: src, diff_dst, diff_weights,
// and convolution_backward_weights.
auto conv_index = mkldnn_emitter->reserve_primitive_space(4);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
......@@ -598,8 +598,8 @@ namespace ngraph
ngraph::op::ConvolutionBiasBackpropFiltersBias>(node);
auto fwd_desc = mkldnn_emitter->get_convolution_forward_desc_for_backward_op<
ngraph::op::ConvolutionBiasBackpropFiltersBias>(node);
// ConvolutionBiasBackpropFilter needs 5 primitives: input, delta, weights_delta,
// bias_delta, and convolution_backward_weights.
// ConvolutionBackpropFiltersBias needs 5 primitives: src, diff_dst, diff_weights,
// diff_bias, and convolution_backward_weights.
auto conv_index = mkldnn_emitter->reserve_primitive_space(5);
auto& deps = mkldnn_emitter->get_primitive_deps(conv_index);
......
......@@ -301,7 +301,7 @@ namespace ngraph
->get_max_pooling_backward_desc<ngraph::op::MaxPoolWithIndicesBackprop>(
node);
// MaxPoolWithIndicesBackprop needs 4 primitives: diff_dst, fprop_workspace,
// diff_dst, and pooling_backward.
// diff_src, and pooling_backward.
size_t max_pool_index = mkldnn_emitter->reserve_primitive_space(4);
auto& deps = mkldnn_emitter->get_primitive_deps(max_pool_index);
......
This diff is collapsed.
......@@ -480,10 +480,7 @@ void runtime::cpu::CPU_ExternalFunction::compile(ngraph::pass::PassConfig& pass_
// Build mkldnn primitives for codegen.
pass_manager.register_pass<runtime::cpu::pass::MKLDNNPrimitiveBuildPass>(
m_desc_filename,
*m_mkldnn_emitter,
m_node_primitive_idx_map,
m_node_primitive_string_deps_index_map);
m_desc_filename, *m_mkldnn_emitter, m_node_primitive_string_deps_index_map);
unordered_map<Node*, Node*> node_function_map;
string common_function_string;
......@@ -746,16 +743,20 @@ using namespace ngraph::runtime;
writer << "extern \"C\" void " << current_function->get_name() << func_params << "\n";
writer << "{\n";
writer.indent++;
writer << "std::ifstream desc_file (\"" << m_desc_filename << "\", std::ios::binary);\n";
//deserialize and build mkldnn primitives
writer << "if (ctx->first_iteration)\n";
writer.block_begin();
writer << "// read in memory descriptors and build mkldnn primitives\n";
writer << "deserialize_memory_descs_and_build_memory_primitives(" << m_desc_filename
<< ", cg_ctx, " << to_string(m_mkldnn_emitter->get_mkldnn_descriptors_size())
<< ");\n";
writer.block_end();
if (m_mkldnn_emitter->get_mkldnn_descriptors_size() > 0)
{
writer << "if (ctx->first_iteration)\n";
writer.block_begin();
writer << "// read in memory descriptors and build mkldnn primitives\n";
writer << "std::ifstream desc_file (\"" << m_desc_filename
<< "\", std::ios::binary);\n";
writer << "deserialize_memory_descs_and_build_memory_primitives(" << m_desc_filename
<< ", cg_ctx, " << to_string(m_mkldnn_emitter->get_mkldnn_descriptors_size())
<< ");\n";
writer.block_end();
}
// Execution tracing support
if (runtime::cpu::IsTracingEnabled() && current_function->get_name() == m_function_name)
......
......@@ -114,17 +114,6 @@ namespace ngraph
return m_mkldnn_emitter;
}
/// Returns the index of the mkldnn primitive previously created for \p node.
size_t get_primitive_index(const Node* node) const
{
auto it = m_node_primitive_idx_map.find(node);
NGRAPH_CHECK(it != m_node_primitive_idx_map.end(),
"Primitive not found for node ",
node->description());
return it->second;
}
// Return the tuple including the string to create mkldnn primitive, the deps and the index in CODEGEN
const std::tuple<std::string, std::vector<size_t>, size_t>&
get_primitive_build_tuple(const Node* node) const
......@@ -328,8 +317,6 @@ namespace ngraph
std::unordered_map<std::string, size_t> subgraph_param_indices;
#endif
/// Map each node with mkldnn implementation to its mkldnn primitive index.
std::unordered_map<const Node*, size_t> m_node_primitive_idx_map;
/// Map each node with mkldnn implementation to its mkldnn primitive creating string, deps, and mkldnn primitive index.
std::map<const Node*, std::tuple<std::string, std::vector<size_t>, size_t>>
m_node_primitive_string_deps_index_map;
......
This diff is collapsed.
This diff is collapsed.
......@@ -18,6 +18,7 @@
#include "ngraph/graph_util.hpp"
#include "ngraph/op/op.hpp"
#include "ngraph/runtime/cpu/cpu_backend_visibility.h"
namespace ngraph
{
......@@ -31,11 +32,11 @@ namespace ngraph
class MaxPoolWithIndices : public Op
{
public:
MaxPoolWithIndices(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
CPU_BACKEND_API MaxPoolWithIndices(const std::shared_ptr<Node>& arg,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
virtual std::shared_ptr<Node>
copy_with_new_args(const NodeVector& new_args) const override;
......@@ -64,13 +65,13 @@ namespace ngraph
class MaxPoolWithIndicesBackprop : public Op
{
public:
MaxPoolWithIndicesBackprop(const std::shared_ptr<Node>& arg_forward,
const std::shared_ptr<Node>& delta,
const std::shared_ptr<Node>& indices,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
CPU_BACKEND_API MaxPoolWithIndicesBackprop(const std::shared_ptr<Node>& arg_forward,
const std::shared_ptr<Node>& delta,
const std::shared_ptr<Node>& indices,
const Shape& window_shape,
const Strides& window_movement_strides,
const Shape& padding_below,
const Shape& padding_above);
virtual std::shared_ptr<Node>
copy_with_new_args(const NodeVector& new_args) const override;
......
This source diff could not be displayed because it is too large. You can view the blob instead.
......@@ -23,10 +23,6 @@
#include <typeindex>
#include <unordered_map>
#define BUILD_PRIMITIVE_DECL(op_name) \
build_primitive<op_name>(ngraph::runtime::cpu::MKLDNNEmitter & mkldnn_emitter, \
ngraph::Node * node)
#define CONSTRUCT_PRIMITIVE_BUILD_STRING_DECL(op_name) \
construct_primitive_build_string<op_name>(ngraph::runtime::cpu::MKLDNNEmitter & \
mkldnn_emitter, \
......@@ -53,11 +49,6 @@ namespace ngraph
namespace pass
{
using PrimitiveBuildFunction =
std::function<size_t(ngraph::runtime::cpu::MKLDNNEmitter&, ngraph::Node*)>;
using PrimitiveBuildOpMap =
std::unordered_map<std::type_index, PrimitiveBuildFunction>;
using PrimitiveBuildStringConstructFunction =
std::function<void(ngraph::runtime::cpu::MKLDNNEmitter&,
ngraph::Node*,
......@@ -77,10 +68,6 @@ namespace ngraph
ngraph::runtime::cpu::MKLDNNEmitter& m_mkldnn_emitter;
/// External map to store each node with mkldnn implementation and its mkldnn
/// associated primitive index.
std::unordered_map<const Node*, size_t>& m_node_primitive_idx_map;
/// External map to store each node with mkldnn implementation and its mkldnn
/// creation string, deps, and mkldnn primitive index.
std::map<const Node*, std::tuple<std::string, std::vector<size_t>, size_t>>&
......@@ -90,12 +77,10 @@ namespace ngraph
MKLDNNPrimitiveBuildPass(
std::string filename,
ngraph::runtime::cpu::MKLDNNEmitter& mkldnn_emitter,
std::unordered_map<const Node*, size_t>& node_primitive_idx_map,
std::map<const Node*, std::tuple<std::string, std::vector<size_t>, size_t>>&
node_primitive_string_deps_index_map)
: m_desc_filename(filename)
, m_mkldnn_emitter(mkldnn_emitter)
, m_node_primitive_idx_map(node_primitive_idx_map)
, m_node_primitive_string_deps_index_map(
node_primitive_string_deps_index_map)
{
......@@ -103,15 +88,6 @@ namespace ngraph
bool run_on_call_graph(const std::list<std::shared_ptr<Node>>& nodes) override;
template <typename OP>
static size_t
build_primitive(ngraph::runtime::cpu::MKLDNNEmitter& mkldnn_emitter,
ngraph::Node* node)
{
throw std::runtime_error("Unimplemented op '" + node->description() +
"' in MKLDNNPrimitiveBuildPass");
}
template <typename OP>
static void construct_primitive_build_string(
ngraph::runtime::cpu::MKLDNNEmitter& mkldnn_emitter,
......
This diff is collapsed.
Markdown is supported
0% or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or sign in to comment