Unverified commit 4a25881e authored by Scott Cyphers, committed by GitHub

TensorIterator (#3038)

* TensorIterator

* ssize_t is not on windows

* RNN building test

* simplify

* Simplify output

* typo

* typos

* remove arg

* Sequence version

* style

* Serialization for all but TensorIterator

* Add ops for igpu

* style

* typo, ngpu

* missing headers, output vector

* Fix const json issues

* TensorIterator serialization

* Serialization for TensorIterator
Switch Output<T> to use shared_ptr so nodes don't vanish
Switch Result to new node style
Add serialization/deserialization to test

* Switch Output to use a shared_ptr to prevent nodes from disappearing early.

* Eliminate wrapped enum
Switch allreduce to new op form

* Convert to new op form

* Disambiguate concat

* Add autobroadcast for SequencePush
Add validation for SequencePush

* compute shapes for SequenceRepeat

* Add explicit conversion from PartialShape to dimension vector
validate and infer types for SliceInput

* validate and infer types for SequenceOutput

* Add sequence attributes

* Move test to serializer so it doesn't fail when there is no serializer?

* const arg

* Beginning of TensorIterator validation

* Validation up to parameters

* Fix shape in test

* Remove mis-typed AxisSet

* Simplify, add doc

* Review comments

* Tweaks

* free/bound

* Try fused op

* Discussion

* more

* comments

* Start of LSTMCell test

* Add LSTMCell example

* Reorg

* Reorg

* Fused ops don't need handlers

* Serialization

* Use `as_type` and `is_type` for up-conversions of descriptions
Allocate output space for each output

* Clean up type checking

* Fix ser/deser issues

* Refactor, cleanup type info to make it safer to use for non-ops

* Implement validate_and_infer_types and modify unit tests.

* For ops in the loop body: revalidate and infer types.

Nested loop is not supported.

* Put body ops in a set and call revalidate and infer types on the set.

* Set slice[axis] to part_size.

Call set_partial_shape to set shape for body parameters.

Add more unit tests.

* Give tensor iterator body a lambda

* Update validate_and_infer_types and unit tests.

* Serialization of body

* Change static function to TensorIterator function.

* review comments
parent ccbba5e4
......@@ -83,6 +83,8 @@ set (SRC
function.cpp
function.hpp
graph_util.cpp
lambda.cpp
lambda.hpp
log.cpp
log.hpp
ngraph.cpp
......@@ -315,6 +317,8 @@ set (SRC
op/tan.hpp
op/tanh.cpp
op/tanh.hpp
op/tensor_iterator.cpp
op/tensor_iterator.hpp
op/topk.cpp
op/topk.hpp
op/xor.cpp
......
......@@ -26,13 +26,14 @@
using namespace std;
using namespace ngraph;
constexpr DiscreteTypeInfo Function::type_info;
atomic<size_t> Function::m_next_instance_id(0);
Function::Function(const ResultVector& results,
const ParameterVector& parameters,
const std::string& name)
: m_results(results)
, m_parameters(parameters)
: Lambda(results, parameters)
, m_temporary_pool_size(0)
, m_instance_id(m_next_instance_id.fetch_add(1))
, m_name(name)
......@@ -44,48 +45,24 @@ Function::Function(const ResultVector& results,
Function::Function(const OutputVector& results,
const ParameterVector& parameters,
const std::string& name)
: m_results(results.size())
, m_parameters(parameters)
: Lambda(results, parameters)
, m_temporary_pool_size(0)
, m_instance_id(m_next_instance_id.fetch_add(1))
, m_name(name)
, m_unique_name("Function_" + to_string(m_instance_id))
{
if (std::any_of(results.cbegin(), results.cend(), [](Output<Node> n) {
return as_type_ptr<op::Result>(n.get_node_shared_ptr());
}))
{
throw ngraph_error(
" Results already contain op::Results. Use a c-tor that takes a ResultVector");
}
std::transform(results.begin(), results.end(), m_results.begin(), [](Output<Node> n) {
return std::make_shared<op::Result>(n);
});
init();
}
Function::Function(const NodeVector& results,
const ParameterVector& parameters,
const std::string& name)
: m_results(results.size())
, m_parameters(parameters)
: Lambda(as_output_vector(results), parameters)
, m_temporary_pool_size(0)
, m_instance_id(m_next_instance_id.fetch_add(1))
, m_name(name)
, m_unique_name("Function_" + to_string(m_instance_id))
{
if (std::any_of(results.cbegin(), results.cend(), [](std::shared_ptr<Node> n) {
return as_type_ptr<op::Result>(n);
}))
{
throw ngraph_error(
" Results already contain op::Results. Use a c-tor that takes a ResultVector");
}
std::transform(results.begin(), results.end(), m_results.begin(), [](std::shared_ptr<Node> n) {
return std::make_shared<op::Result>(n);
});
init();
}
......
......@@ -23,6 +23,7 @@
#include <string>
#include <vector>
#include "ngraph/lambda.hpp"
#include "ngraph/node.hpp"
#include "ngraph/op/parameter.hpp"
#include "ngraph/op/result.hpp"
......@@ -30,9 +31,11 @@
namespace ngraph
{
/// A user-defined function.
class Function
class Function : public Lambda
{
public:
static constexpr DiscreteTypeInfo type_info{"Function", 0};
const DiscreteTypeInfo& get_type_info() const { return type_info; }
Function(const NodeVector& results,
const ParameterVector& parameters,
const std::string& name = "");
......@@ -70,10 +73,6 @@ namespace ngraph
/// Return the partial shape of element i
const PartialShape& get_output_partial_shape(size_t i) const;
/// Return the function parameters
const ParameterVector& get_parameters() const { return m_parameters; }
/// Return a list of function's outputs
const ResultVector& get_results() const { return m_results; }
/// Check that there is a single result and return it.
std::shared_ptr<Node> get_result() const;
......@@ -128,8 +127,6 @@ namespace ngraph
const std::shared_ptr<op::Parameter>& parameter);
protected:
ResultVector m_results;
ParameterVector m_parameters;
size_t m_temporary_pool_size;
private:
......
//*****************************************************************************
// Copyright 2017-2019 Intel Corporation
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//*****************************************************************************
#include "ngraph/lambda.hpp"
using namespace std;
using namespace ngraph;
constexpr DiscreteTypeInfo Lambda::type_info;
Lambda::Lambda(const OutputVector& results, const ParameterVector& parameters)
: Lambda(as_result_vector(results), parameters)
{
}
Lambda::Lambda(const ResultVector& results, const ParameterVector& parameters)
: m_results(results)
, m_parameters(parameters)
{
}
int64_t Lambda::get_parameter_index(const std::shared_ptr<op::Parameter>& parameter) const
{
int64_t pos = 0;
for (auto p : get_parameters())
{
if (p == parameter)
{
return pos;
}
pos++;
}
return -1;
}
int64_t Lambda::get_result_index(const Output<Node>& value) const
{
int64_t pos = 0;
if (is_type<op::Result>(value.get_node_shared_ptr()))
{
auto result = value.get_node_shared_ptr();
for (auto r : get_results())
{
if (r == result)
{
return pos;
}
pos++;
}
}
else
{
for (auto r : get_results())
{
if (r->input_value(0) == value)
{
return pos;
}
pos++;
}
}
return -1;
}
//*****************************************************************************
// Copyright 2017-2019 Intel Corporation
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//*****************************************************************************
#pragma once
#include "ngraph/node.hpp"
#include "ngraph/op/parameter.hpp"
#include "ngraph/op/result.hpp"
namespace ngraph
{
class Lambda
{
public:
static constexpr DiscreteTypeInfo type_info{"Lamdba", 0};
const DiscreteTypeInfo& get_type_info() const { return type_info; }
/// Return the function parameters
const ParameterVector& get_parameters() const { return m_parameters; };
/// Index for parameter, or -1
int64_t get_parameter_index(const std::shared_ptr<op::Parameter>& parameter) const;
/// Return a list of function's outputs
const ResultVector& get_results() const { return m_results; };
/// Index for value or result referencing it, or -1
int64_t get_result_index(const Output<Node>& value) const;
protected:
Lambda(const ResultVector& results, const ParameterVector& parameters);
Lambda(const OutputVector& results, const ParameterVector& parameters);
ResultVector m_results;
ParameterVector m_parameters;
};
}
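A minimal usage sketch of the index helpers Lambda now provides, exercised through Function (which derives from Lambda after this change). The parameters, shapes, and the Add node below are illustrative, not taken from the patch:
auto a = std::make_shared<op::Parameter>(element::f32, Shape{2, 2});
auto b = std::make_shared<op::Parameter>(element::f32, Shape{2, 2});
auto sum = std::make_shared<op::Add>(a, b);
auto f = std::make_shared<Function>(NodeVector{sum}, ParameterVector{a, b});
// Both helpers are inherited from Lambda.
int64_t a_pos = f->get_parameter_index(a); // 0
int64_t b_pos = f->get_parameter_index(b); // 1
// get_result_index accepts either a Result's output or the value the Result wraps.
int64_t y_pos = f->get_result_index(sum->output(0)); // 0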
......@@ -81,6 +81,7 @@ namespace ngraph
#include "ngraph/dimension.hpp"
#include "ngraph/except.hpp"
#include "ngraph/function.hpp"
#include "ngraph/lambda.hpp"
#include "ngraph/node.hpp"
#include "ngraph/op/abs.hpp"
#include "ngraph/op/acos.hpp"
......@@ -209,6 +210,7 @@ namespace ngraph
#include "ngraph/op/sum.hpp"
#include "ngraph/op/tan.hpp"
#include "ngraph/op/tanh.hpp"
#include "ngraph/op/tensor_iterator.hpp"
#include "ngraph/op/topk.hpp"
#include "ngraph/op/util/attr_types.hpp"
#include "ngraph/op/xor.hpp"
......
......@@ -817,6 +817,18 @@ NodeVector ngraph::as_node_vector(const OutputVector& values)
return node_vector;
}
ResultVector ngraph::as_result_vector(const OutputVector& values)
{
ResultVector result;
for (auto value : values)
{
shared_ptr<Node> node = value.get_node_shared_ptr();
result.push_back(is_type<op::Result>(node) ? as_type_ptr<op::Result>(node)
: make_shared<op::Result>(value));
}
return result;
}
std::tuple<element::Type, PartialShape>
Node::validate_and_infer_elementwise_args(const op::AutoBroadcastSpec& autob)
{
......
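A small sketch of what as_result_vector does with a mixed OutputVector (the nodes here are illustrative): values that are already op::Result nodes are passed through, and anything else is wrapped in a new op::Result.
auto p = std::make_shared<op::Parameter>(element::f32, Shape{2});
auto r = std::make_shared<op::Result>(p);
OutputVector values{p, r};
ResultVector results = as_result_vector(values);
// results[0] is a newly created op::Result fed by p;
// results[1] is the same node as r, so no double wrapping occurs.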
......@@ -61,8 +61,11 @@ namespace ngraph
{
struct AutoBroadcastSpec;
class Constant;
class Result;
} // namespace op
using ResultVector = std::vector<std::shared_ptr<op::Result>>;
namespace autodiff
{
class Adjoints;
......@@ -80,6 +83,8 @@ namespace ngraph
OutputVector as_output_vector(const NodeVector& args);
NodeVector as_node_vector(const OutputVector& values);
/// Returns a ResultVector referencing values.
ResultVector as_result_vector(const OutputVector& values);
/// Alias useful for cloning
using NodeMap = std::unordered_map<ngraph::Node*, std::shared_ptr<ngraph::Node>>;
......
......@@ -141,7 +141,7 @@ void op::LSTMCell::pre_validate_and_infer_types()
", ",
get_hidden_size(),
"). Actual shape is:",
w_shape,
r_shape,
".");
NODE_VALIDATION_CHECK(this,
(ht_shape == Shape{batch_size, get_hidden_size()}),
......@@ -150,7 +150,7 @@ void op::LSTMCell::pre_validate_and_infer_types()
", ",
get_hidden_size(),
"). Actual shape is:",
w_shape,
ht_shape,
".");
NODE_VALIDATION_CHECK(this,
(ct_shape == Shape{batch_size, get_hidden_size()}),
......@@ -159,7 +159,7 @@ void op::LSTMCell::pre_validate_and_infer_types()
", ",
get_hidden_size(),
"). Actual shape is:",
w_shape,
ct_shape,
".");
const auto& b_pshape = get_input_partial_shape(5);
......
......@@ -57,4 +57,5 @@ NGRAPH_OP(SquaredDifference, ngraph::op)
NGRAPH_OP(SoftmaxCrossEntropy, ngraph::op)
NGRAPH_OP(SoftmaxCrossEntropyBackprop, ngraph::op)
NGRAPH_OP(Squeeze, ngraph::op)
NGRAPH_OP(TensorIterator, ngraph::op)
NGRAPH_OP(Unsqueeze, ngraph::op)
//*****************************************************************************
// Copyright 2017-2019 Intel Corporation
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//*****************************************************************************
#include "ngraph/op/tensor_iterator.hpp"
#include "ngraph/graph_util.hpp"
using namespace std;
using namespace ngraph;
constexpr NodeTypeInfo op::TensorIterator::type_info;
constexpr DiscreteTypeInfo op::TensorIterator::SliceInputDescription::type_info;
constexpr DiscreteTypeInfo op::TensorIterator::MergedInputDescription::type_info;
constexpr DiscreteTypeInfo op::TensorIterator::InvariantInputDescription::type_info;
constexpr DiscreteTypeInfo op::TensorIterator::BodyOutputDescription::type_info;
constexpr DiscreteTypeInfo op::TensorIterator::ConcatOutputDescription::type_info;
constexpr DiscreteTypeInfo op::TensorIterator::BodyLambda::type_info;
op::TensorIterator::TensorIterator(const OutputVector& values)
: op::util::FusedOp(values)
{
}
op::TensorIterator::InputDescription::InputDescription(uint64_t input_index,
uint64_t body_parameter_index)
: m_input_index(input_index)
, m_body_parameter_index(body_parameter_index)
{
}
op::TensorIterator::SliceInputDescription::SliceInputDescription(uint64_t input_index,
uint64_t body_parameter_index,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis)
: InputDescription(input_index, body_parameter_index)
, m_start(start)
, m_stride(stride)
, m_part_size(part_size)
, m_end(end)
, m_axis(axis)
{
}
shared_ptr<op::TensorIterator::InputDescription>
op::TensorIterator::SliceInputDescription::copy() const
{
return make_shared<SliceInputDescription>(
m_input_index, m_body_parameter_index, m_start, m_stride, m_part_size, m_end, m_axis);
}
op::TensorIterator::MergedInputDescription::MergedInputDescription(uint64_t input_index,
uint64_t body_parameter_index,
uint64_t body_value_index)
: InputDescription(input_index, body_parameter_index)
, m_body_value_index(body_value_index)
{
}
shared_ptr<op::TensorIterator::InputDescription>
op::TensorIterator::MergedInputDescription::copy() const
{
return make_shared<MergedInputDescription>(
m_input_index, m_body_parameter_index, m_body_value_index);
}
op::TensorIterator::InvariantInputDescription::InvariantInputDescription(
uint64_t input_index, uint64_t body_parameter_index)
: InputDescription(input_index, body_parameter_index)
{
}
shared_ptr<op::TensorIterator::InputDescription>
op::TensorIterator::InvariantInputDescription::copy() const
{
return make_shared<InvariantInputDescription>(m_input_index, m_body_parameter_index);
}
op::TensorIterator::OutputDescription::OutputDescription(uint64_t body_value_index,
uint64_t output_index)
: m_body_value_index(body_value_index)
, m_output_index(output_index)
{
}
op::TensorIterator::ConcatOutputDescription::ConcatOutputDescription(uint64_t body_value_index,
uint64_t output_index,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis)
: OutputDescription(body_value_index, output_index)
, m_start(start)
, m_stride(stride)
, m_part_size(part_size)
, m_end(end)
, m_axis(axis)
{
}
shared_ptr<op::TensorIterator::OutputDescription>
op::TensorIterator::ConcatOutputDescription::copy() const
{
return make_shared<ConcatOutputDescription>(
m_body_value_index, m_output_index, m_start, m_stride, m_part_size, m_end, m_axis);
}
op::TensorIterator::BodyOutputDescription::BodyOutputDescription(uint64_t body_value_index,
uint64_t output_index,
int64_t iteration)
: OutputDescription(body_value_index, output_index)
, m_iteration(iteration)
{
}
shared_ptr<op::TensorIterator::OutputDescription>
op::TensorIterator::BodyOutputDescription::copy() const
{
return make_shared<BodyOutputDescription>(m_body_value_index, m_output_index, m_iteration);
}
Input<Node> op::TensorIterator::input_for_value(const Output<Node>& value)
{
for (auto input : inputs())
{
if (input.get_source_output() == value)
{
return input;
}
}
auto input_index = get_input_size();
set_argument(input_index, value);
return Input<Node>(this, input_index);
}
void op::TensorIterator::set_sliced_input(const std::shared_ptr<op::Parameter>& body_parameter,
const Output<Node>& value,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis)
{
m_input_descriptions.push_back(
make_shared<SliceInputDescription>(input_for_value(value).get_index(),
m_body->get_parameter_index(body_parameter),
start,
stride,
part_size,
end,
axis));
}
void op::TensorIterator::set_merged_input(const std::shared_ptr<Parameter>& body_parameter,
const Output<Node>& initial_value,
const Output<Node>& successive_value)
{
m_input_descriptions.push_back(
make_shared<MergedInputDescription>(input_for_value(initial_value).get_index(),
m_body->get_parameter_index(body_parameter),
m_body->get_result_index(successive_value)));
}
void op::TensorIterator::set_invariant_input(const std::shared_ptr<Parameter>& body_parameter,
const Output<Node>& value)
{
m_input_descriptions.push_back(make_shared<InvariantInputDescription>(
input_for_value(value).get_index(), m_body->get_parameter_index(body_parameter)));
}
Output<Node> op::TensorIterator::get_iter_value(const Output<Node>& body_value, int64_t iteration)
{
auto output_index = get_output_size();
m_output_descriptions.push_back(make_shared<BodyOutputDescription>(
m_body->get_result_index(body_value), output_index, iteration));
set_output_size(output_index + 1);
return Output<Node>(shared_from_this(), output_index);
}
Output<Node> op::TensorIterator::get_concatenated_slices(const Output<Node>& body_value,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis)
{
auto output_index = get_output_size();
m_output_descriptions.push_back(make_shared<ConcatOutputDescription>(
m_body->get_result_index(body_value), output_index, start, stride, part_size, end, axis));
set_output_size(output_index + 1);
return Output<Node>(shared_from_this(), output_index);
}
NodeVector op::TensorIterator::decompose_op() const
{
// Stub
return NodeVector{};
}
void op::TensorIterator::revalidate_and_infer_types_for_body_ops()
{
std::stack<std::shared_ptr<Node>, std::vector<std::shared_ptr<Node>>> nodes_to_do;
std::unordered_set<std::shared_ptr<Node>> nodes_done;
for (auto r : m_body->get_results())
{
nodes_to_do.push(r);
}
while (nodes_to_do.size() > 0)
{
auto node = nodes_to_do.top();
if (nodes_done.count(node) == 0)
{
NGRAPH_CHECK(as_type_ptr<op::TensorIterator>(node) == nullptr,
"No nested TensorIterator");
bool can_add = true;
size_t arg_count = node->get_input_size();
for (size_t i = 0; i < arg_count; ++i)
{
auto dep = node->input(arg_count - i - 1)
.get_source_output()
.get_node()
->shared_from_this();
if (nodes_done.count(dep) == 0)
{
can_add = false;
nodes_to_do.push(dep);
}
}
if (can_add)
{
nodes_done.insert(node);
node->revalidate_and_infer_types();
nodes_to_do.pop();
}
}
else
{
nodes_to_do.pop();
}
}
}
void op::TensorIterator::validate_and_infer_types()
{
NODE_VALIDATION_CHECK(this,
get_input_size() == m_input_descriptions.size(),
"Number of inputs must be the same as number of input descriptions");
NODE_VALIDATION_CHECK(this,
get_output_size() == m_output_descriptions.size(),
"Number of outputs must be the same as number of output descriptions");
std::vector<std::shared_ptr<Node>> ends;
// Input
uint64_t index_it = 0;
for (auto input_description : m_input_descriptions)
{
auto index = input_description->m_input_index;
NODE_VALIDATION_CHECK(this, index == index_it, "Input_index not in order");
index_it++;
if (auto slice_input_description = as_type_ptr<SliceInputDescription>(input_description))
{
auto body_parameter =
m_body->get_parameters().at(slice_input_description->m_body_parameter_index);
auto body_param_partial_shape = body_parameter->get_partial_shape();
auto input_partial_shape = inputs().at(index).get_source_output().get_partial_shape();
auto start = slice_input_description->m_start;
auto part_size = slice_input_description->m_part_size;
auto end = slice_input_description->m_end;
if (end != -1)
{
if (m_num_iterations == -1)
{
m_num_iterations = end - start;
}
else
{
NODE_VALIDATION_CHECK(
this, m_num_iterations == end - start, "Number of slices not the same");
}
}
if (input_partial_shape.is_static())
{
auto input_shape = input_partial_shape.to_shape();
auto axis = slice_input_description->m_axis;
if (end == -1)
{
// for simple RNN case where stride is the same as part_size
// when end is -1, we assume that we slice the input from "start" to the very
// end.
end = static_cast<size_t>(input_shape[axis]) / part_size + start;
if (m_num_iterations == -1)
{
m_num_iterations = end - start;
}
else
{
NODE_VALIDATION_CHECK(
this, m_num_iterations == end - start, "Number of slices not the same");
}
}
if (body_param_partial_shape.is_static())
{
// validate
auto body_param_shape = body_param_partial_shape.to_shape();
for (size_t i = 0; i < input_shape.size(); i++)
{
if (i != axis)
{
NODE_VALIDATION_CHECK(
this,
input_shape[i] == body_param_shape[i],
"Iterator input is not compatible with body param");
}
}
}
else
{
// infer type for m_body_parameter
Shape out_shape{input_shape};
out_shape[axis] = part_size;
body_parameter->set_partial_shape(out_shape);
}
}
}
else if (auto merged_input_description =
as_type_ptr<MergedInputDescription>(input_description))
{
auto body_value =
m_body->get_results().at(merged_input_description->m_body_value_index)->input(0);
ends.push_back(body_value.get_node()->shared_from_this());
auto body_value_partial_shape = body_value.get_partial_shape();
auto body_parameter =
m_body->get_parameters().at(merged_input_description->m_body_parameter_index);
auto body_param_partial_shape = body_parameter->get_partial_shape();
auto input_partial_shape = inputs().at(index).get_source_output().get_partial_shape();
NODE_VALIDATION_CHECK(this,
body_value_partial_shape.compatible(body_param_partial_shape),
"Iterator successive value is not compatible with body param");
NODE_VALIDATION_CHECK(this,
input_partial_shape.compatible(body_param_partial_shape),
"Iterator initial value is not compatible with body param");
if (input_partial_shape.is_static())
{
auto input_shape = input_partial_shape.to_shape();
// infer type for body_parameter
if (body_param_partial_shape.is_dynamic())
{
body_parameter->set_partial_shape(input_shape);
}
}
}
else if (auto invariant_input_description =
as_type_ptr<InvariantInputDescription>(input_description))
{
auto body_parameter =
m_body->get_parameters().at(invariant_input_description->m_body_parameter_index);
auto body_param_partial_shape = body_parameter->get_partial_shape();
auto input_partial_shape = inputs().at(index).get_source_output().get_partial_shape();
NODE_VALIDATION_CHECK(this,
input_partial_shape.compatible(body_param_partial_shape),
"Iterator initial value is not compatible with body param");
if (input_partial_shape.is_static())
{
auto input_shape = input_partial_shape.to_shape();
// infer type for m_body_parameter
if (body_param_partial_shape.is_dynamic())
{
body_parameter->set_partial_shape(input_shape);
}
}
}
}
// Body
revalidate_and_infer_types_for_body_ops();
// Output
index_it = 0;
for (auto output_description : m_output_descriptions)
{
auto index = output_description->m_output_index;
NODE_VALIDATION_CHECK(this, index == index_it, "Output_index not in order");
index_it++;
auto body_value =
m_body->get_results().at(output_description->m_body_value_index)->input_value(0);
if (auto concat_output_description =
as_type_ptr<ConcatOutputDescription>(output_description))
{
auto body_value_partial_shape = body_value.get_partial_shape();
if (body_value_partial_shape.is_static())
{
auto body_value_shape = body_value_partial_shape.to_shape();
auto start = concat_output_description->m_start;
auto part_size = concat_output_description->m_part_size;
auto end = concat_output_description->m_end;
auto axis = concat_output_description->m_axis;
Shape out_shape{body_value_shape};
if (end != -1)
{
if (m_num_iterations != -1)
{
NODE_VALIDATION_CHECK(
this, m_num_iterations == end - start, "Number of slices not the same");
}
else
{
m_num_iterations = end - start;
}
}
if (m_num_iterations != -1)
{
// for simple RNN case where stride is the same as part_size
out_shape[axis] = m_num_iterations * part_size;
set_output_type(index, body_value.get_element_type(), out_shape);
}
}
}
else if (auto body_output_description =
as_type_ptr<BodyOutputDescription>(output_description))
{
set_output_type(index, body_value.get_element_type(), body_value.get_partial_shape());
}
}
}
std::shared_ptr<Node> op::TensorIterator::copy_with_new_args(const NodeVector& new_args) const
{
auto op = make_shared<op::TensorIterator>(as_output_vector(new_args));
for (auto& input_description : m_input_descriptions)
{
op->m_input_descriptions.push_back(input_description->copy());
}
for (auto& output_description : m_output_descriptions)
{
op->m_output_descriptions.push_back(output_description->copy());
}
return move(op);
}
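As a concrete check of the slice bookkeeping in validate_and_infer_types (numbers taken from the part_size=2 unit test later in this diff): slicing an input of shape {32, 40, 10} along axis 1 with start=0, stride=2, part_size=2, end=-1 gives end = 40 / 2 + 0 = 20, so m_num_iterations = 20 and the body parameter shape is inferred as {32, 2, 10}; a concatenated output over the same slicing then gets out_shape[1] = 20 * 2 = 40, i.e. {32, 40, 10}.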
//*****************************************************************************
// Copyright 2017-2019 Intel Corporation
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//*****************************************************************************
#pragma once
#include <vector>
#include "ngraph/lambda.hpp"
#include "ngraph/op/parameter.hpp"
#include "ngraph/op/util/fused_op.hpp"
namespace ngraph
{
namespace op
{
/// \brief Iterate a body over tensors, accumulating into tensors.
class TensorIterator : public util::FusedOp
{
public:
NGRAPH_API
static constexpr NodeTypeInfo type_info{"TensorIterator", 0};
const NodeTypeInfo& get_type_info() const override { return type_info; }
// Forward declarations
class SliceInputDescription;
class MergedInputDescription;
class InvariantInputDescription;
TensorIterator() = default;
TensorIterator(const OutputVector& values);
class BodyLambda : public Lambda
{
public:
static constexpr DiscreteTypeInfo type_info{"BodyLamdba", 0};
const DiscreteTypeInfo& get_type_info() const { return type_info; }
BodyLambda(const OutputVector& outputs, const ParameterVector& parameters)
: Lambda(outputs, parameters)
{
}
BodyLambda(const ResultVector& results, const ParameterVector& parameters)
: Lambda(results, parameters)
{
}
};
/// \brief Describes a connection between a TensorIterator input and the body.
class InputDescription
{
protected:
/// \param input_index Position of the TensorIterator input
/// \param body_parameter_index Body parameter position to receive input
InputDescription(uint64_t input_index, uint64_t body_parameter_index);
public:
virtual ~InputDescription() {}
virtual std::shared_ptr<InputDescription> copy() const = 0;
virtual const DiscreteTypeInfo& get_type_info() const = 0;
uint64_t m_input_index;
uint64_t m_body_parameter_index;
};
/// \brief Describes a body input formed from slices of an input to TensorIterator.
class SliceInputDescription : public InputDescription
{
public:
static constexpr DiscreteTypeInfo type_info{"SliceInputDescription", 0};
const DiscreteTypeInfo& get_type_info() const override { return type_info; }
/// \param input_index Position of the TensorIterator input
/// \param body_parameter_index Body parameter position to receive input
/// \param start First index for slices
/// \param stride Step amount for slices
/// \param part_size Width of slices
/// \param end Last index for slices
/// \param axis Axis being sliced
SliceInputDescription(uint64_t input_index,
uint64_t body_parameter_index,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis);
std::shared_ptr<InputDescription> copy() const override;
int64_t m_start;
int64_t m_stride;
int64_t m_part_size;
int64_t m_end;
int64_t m_axis;
};
/// \brief Describes a body input initialized from a TensorIterator input on the first
/// iteration, and then a body output thereafter.
class MergedInputDescription : public InputDescription
{
public:
static constexpr DiscreteTypeInfo type_info{"MergedInputDescription", 0};
const DiscreteTypeInfo& get_type_info() const override { return type_info; }
/// \param input_index Position of the TensorIterator input supplying a value to
/// body_parameter
/// for the initial iteration.
/// \param body_parameter_index Body parameter position to receive input.
/// \param body_value_index Body value to supply body_parameter for successive
/// iterations.
MergedInputDescription(uint64_t input_index,
uint64_t body_parameter_index,
uint64_t body_value_index);
std::shared_ptr<InputDescription> copy() const override;
uint64_t m_body_value_index;
};
class InvariantInputDescription : public InputDescription
{
public:
static constexpr DiscreteTypeInfo type_info{"InvariantInputDescription", 0};
const DiscreteTypeInfo& get_type_info() const override { return type_info; }
InvariantInputDescription(uint64_t input_index, uint64_t body_parameter_index);
std::shared_ptr<InputDescription> copy() const override;
};
// Forward declarations
class ConcatOutputDescription;
class BodyOutputDescription;
/// \brief Describes how a TensorIterator output is produced from the body.
class OutputDescription
{
protected:
/// \param body_value_index A body value that produces the output
/// \param output_index The TensorIterator output index
OutputDescription(uint64_t body_value_index, uint64_t output_index);
public:
virtual ~OutputDescription() {}
virtual std::shared_ptr<OutputDescription> copy() const = 0;
virtual const DiscreteTypeInfo& get_type_info() const = 0;
uint64_t m_body_value_index;
uint64_t m_output_index;
};
/// \brief Produces an output by concatenating an output from each iteration
class ConcatOutputDescription : public OutputDescription
{
public:
static constexpr DiscreteTypeInfo type_info{"ConcatOutputDescription", 0};
const DiscreteTypeInfo& get_type_info() const override { return type_info; }
/// \param body_value_index A body value that produces the output
/// \param output_index The TensorIterator output index
/// \param start First index for slices
/// \param stride Step amount for slices
/// \param part_size Width of slices
/// \param end Last index for slices
/// \param axis Axis being sliced
ConcatOutputDescription(uint64_t body_value_index,
uint64_t output_index,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis);
virtual std::shared_ptr<OutputDescription> copy() const override;
int64_t m_start;
int64_t m_stride;
int64_t m_part_size;
int64_t m_end;
int64_t m_axis;
};
/// \brief Produces an output from a specific iteration
class BodyOutputDescription : public OutputDescription
{
public:
static constexpr DiscreteTypeInfo type_info{"BodyOutputDescription", 0};
const DiscreteTypeInfo& get_type_info() const override { return type_info; }
/// \param body_value_index A body value that produces the output
/// \param output_index The TensorIterator output index
/// \param iteration Which iteration (typically -1, the final one) will supply the value
BodyOutputDescription(uint64_t body_value_index,
uint64_t output_index,
int64_t iteration);
std::shared_ptr<OutputDescription> copy() const override;
int64_t m_iteration;
};
/// \brief Indicate that a body parameter comes from slices of a value
/// \param parameter The parameter to receive the slices
/// \param value The value to be sliced. This will be added as an input to
/// TensorIterator.
/// \param start First index on axis of the slicing
/// \param stride Stepping of the slice
/// \param part_size Size of the slice on axis
/// \param end The last index on axis of the slicing
/// \param axis The axis to slice along
void set_sliced_input(const std::shared_ptr<Parameter>& parameter,
const Output<Node>& value,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis);
/// \brief Indicates that a body parameter has an initial value in the first iteration
/// and computed value thereafter
/// \param initial_value Value for the parameter in first iteration. This will be added
/// as an input to TensorIterator.
/// \param successive_value Value for the parameter in successive iterations. The
/// value is what is active in the most recent completed iteration.
void set_merged_input(const std::shared_ptr<Parameter>& body_parameter,
const Output<Node>& initial_value,
const Output<Node>& successive_value);
/// \brief Indicates that a body parameter has an invariant value during iteration that
/// may depend on values computed outside of the iteration
/// \param body_parameter The body parameter
/// \param value The value supplied as an input to the block
void set_invariant_input(const std::shared_ptr<Parameter>& body_parameter,
const Output<Node>& value);
/// \brief Gets a value for a particular iteration point
/// \param body_value The value
/// \param iteration The iteration that supplies the value. Negative values are from the
/// last iteration.
Output<Node> get_iter_value(const Output<Node>& body_value, int64_t iteration);
/// \brief Concatenates slices from all iterations
/// \param value The value supplying slice values from each iteration.
/// \param start First index on axis of the slicing
/// \param stride Stepping of the slice
/// \param part_size Size of the slice on axis
/// \param end The last index on axis of the slicing
/// \param axis The axis to slice along
Output<Node> get_concatenated_slices(const Output<Node>& value,
int64_t start,
int64_t stride,
int64_t part_size,
int64_t end,
int64_t axis);
std::shared_ptr<Node> copy_with_new_args(const NodeVector& new_args) const override;
NodeVector decompose_op() const override;
/// \return the body of the iteration
std::shared_ptr<BodyLambda> get_body() const { return m_body; }
/// \param body set the body of the iteration
void set_body(const std::shared_ptr<BodyLambda>& body) { m_body = body; }
/// \return a reference to the input descriptions.
const std::vector<std::shared_ptr<InputDescription>>& get_input_descriptions() const
{
return m_input_descriptions;
}
/// \return a reference to the input descriptions. Can add input descriptions before
/// validation.
std::vector<std::shared_ptr<InputDescription>>& get_input_descriptions()
{
return m_input_descriptions;
}
/// \return a reference to the output descriptions.
const std::vector<std::shared_ptr<OutputDescription>>& get_output_descriptions() const
{
return m_output_descriptions;
}
/// \return a reference to the output descriptions. Can add output descriptions before
/// validation.
std::vector<std::shared_ptr<OutputDescription>>& get_output_descriptions()
{
return m_output_descriptions;
}
virtual void validate_and_infer_types() override;
void revalidate_and_infer_types_for_body_ops();
int64_t get_num_iterations() const { return m_num_iterations; }
private:
// Find an input corresponding to value, adding one if necessary.
Input<Node> input_for_value(const Output<Node>& value);
std::shared_ptr<BodyLambda> m_body;
std::vector<std::shared_ptr<InputDescription>> m_input_descriptions;
std::vector<std::shared_ptr<OutputDescription>> m_output_descriptions;
int64_t m_num_iterations = -1;
};
}
}
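A compact usage sketch of this API, trimmed from the element-wise tests later in this diff (the Relu body is illustrative; the slicing parameters follow the tests):
auto X = std::make_shared<op::Parameter>(element::f32, Shape{32, 40, 10});
auto Xi = std::make_shared<op::Parameter>(element::f32, Shape{32, 2, 10});
auto Zo = std::make_shared<op::Relu>(Xi);
auto body = std::make_shared<op::TensorIterator::BodyLambda>(OutputVector{Zo},
                                                             ParameterVector{Xi});
auto ti = std::make_shared<op::TensorIterator>();
ti->set_body(body);
// Feed Xi with slices of X: start=0, stride=2, part_size=2, end=20, axis=1.
ti->set_sliced_input(Xi, X, 0, 2, 2, 20, 1);
// Output 0 concatenates the per-iteration Zo values back along axis 1.
auto out = ti->get_concatenated_slices(Zo, 0, 2, 2, 20, 1);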
......@@ -181,6 +181,8 @@ namespace ngraph
/// \param i The index of the dimension being selected.
/// \return A reference to the `i`th Dimension of this shape.
Dimension& operator[](size_t i) { return m_dimensions[i]; }
/// \brief Returns a vector of the dimensions. This has no meaning if the rank is dynamic.
explicit operator std::vector<Dimension>() const { return m_dimensions; }
friend std::ostream& operator<<(std::ostream& str, const PartialShape& shape);
friend PartialShape operator+(const PartialShape& s1, const PartialShape& s2);
......
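A minimal sketch of the new explicit conversion (the shape below is illustrative); it is only meaningful when the rank is static:
PartialShape shape{2, Dimension::dynamic(), 4};
auto dims = static_cast<std::vector<Dimension>>(shape);
// dims.size() == 3; dims[1].is_dynamic() is true.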
......@@ -153,6 +153,7 @@
#include "ngraph/op/sum.hpp"
#include "ngraph/op/tan.hpp"
#include "ngraph/op/tanh.hpp"
#include "ngraph/op/tensor_iterator.hpp"
#include "ngraph/op/topk.hpp"
#include "ngraph/op/util/attr_types.hpp"
#include "ngraph/op/xor.hpp"
......@@ -245,6 +246,10 @@ public:
json serialize_node_reference(const Node& node);
json serialize_node(const Node& node);
json serialize_axis_set(const AxisSet& axis_set);
json serialize_tensor_iterator_input_description(
const std::shared_ptr<op::TensorIterator::InputDescription>&);
json serialize_tensor_iterator_output_description(
const std::shared_ptr<op::TensorIterator::OutputDescription>&);
protected:
size_t m_indent{0};
......@@ -270,6 +275,10 @@ public:
shared_ptr<Node> deserialize_node_reference(json j);
shared_ptr<Node> deserialize_node(json j);
AxisSet deserialize_axis_set(json j);
shared_ptr<op::TensorIterator::InputDescription>
deserialize_tensor_iterator_input_description(json j);
shared_ptr<op::TensorIterator::OutputDescription>
deserialize_tensor_iterator_output_description(json j);
protected:
unordered_map<string, shared_ptr<Node>> m_node_map;
......@@ -644,6 +653,144 @@ AxisSet JSONDeserializer::deserialize_axis_set(json j)
return result;
}
json JSONSerializer::serialize_tensor_iterator_input_description(
const std::shared_ptr<op::TensorIterator::InputDescription>& input_description)
{
json result;
if (auto slice = as_type_ptr<op::TensorIterator::SliceInputDescription>(input_description))
{
result["kind"] = "slice";
result["input_index"] = slice->m_input_index;
result["body_parameter_index"] = slice->m_body_parameter_index;
result["start"] = slice->m_start;
result["stride"] = slice->m_stride;
result["part_size"] = slice->m_part_size;
result["end"] = slice->m_end;
result["axis"] = slice->m_axis;
}
else if (auto merged =
as_type_ptr<op::TensorIterator::MergedInputDescription>(input_description))
{
result["kind"] = "merged";
result["input_index"] = merged->m_input_index;
result["body_parameter_index"] = merged->m_body_parameter_index;
result["body_value_index"] = merged->m_body_value_index;
}
else if (auto constant =
as_type_ptr<op::TensorIterator::InvariantInputDescription>(input_description))
{
result["kind"] = "constant";
result["input_index"] = constant->m_input_index;
result["body_parameter_index"] = constant->m_body_parameter_index;
}
else
{
NGRAPH_UNREACHABLE("Unknown input description type");
}
return result;
}
shared_ptr<op::TensorIterator::InputDescription>
JSONDeserializer::deserialize_tensor_iterator_input_description(json j)
{
string kind = j["kind"];
shared_ptr<op::TensorIterator::InputDescription> result;
if (kind == "slice")
{
uint64_t input_index = j["input_index"].get<uint64_t>();
uint64_t body_parameter_index = j["body_parameter_index"].get<uint64_t>();
int64_t start = j["start"].get<int64_t>();
int64_t stride = j["stride"].get<int64_t>();
int64_t part_size = j["part_size"].get<int64_t>();
int64_t end = j["end"].get<int64_t>();
int64_t axis = j["axis"].get<int64_t>();
result = make_shared<op::TensorIterator::SliceInputDescription>(
input_index, body_parameter_index, start, stride, part_size, end, axis);
}
else if (kind == "merged")
{
uint64_t input_index = j["input_index"].get<uint64_t>();
uint64_t body_parameter_index = j["body_parameter_index"].get<uint64_t>();
uint64_t body_value_index = j["body_value_index"].get<uint64_t>();
result = make_shared<op::TensorIterator::MergedInputDescription>(
input_index, body_parameter_index, body_value_index);
}
else if (kind == "constant")
{
uint64_t input_index = j["input_index"].get<uint64_t>();
uint64_t body_parameter_index = j["body_parameter_index"].get<uint64_t>();
result = make_shared<op::TensorIterator::InvariantInputDescription>(input_index,
body_parameter_index);
}
else
{
NGRAPH_UNREACHABLE("Unknown input description type: ", kind);
}
return result;
}
json JSONSerializer::serialize_tensor_iterator_output_description(
const std::shared_ptr<op::TensorIterator::OutputDescription>& output_description)
{
json result;
if (auto concat = as_type_ptr<op::TensorIterator::ConcatOutputDescription>(output_description))
{
result["kind"] = "concat";
result["body_value_index"] = concat->m_body_value_index;
result["output_index"] = concat->m_output_index;
result["start"] = concat->m_start;
result["stride"] = concat->m_stride;
result["part_size"] = concat->m_part_size;
result["end"] = concat->m_end;
result["axis"] = concat->m_axis;
}
else if (auto body_output =
as_type_ptr<op::TensorIterator::BodyOutputDescription>(output_description))
{
result["kind"] = "body_output";
result["body_value_index"] = body_output->m_body_value_index;
result["output_index"] = body_output->m_output_index;
result["iteration"] = body_output->m_iteration;
}
else
{
NGRAPH_UNREACHABLE("Unknown input description type");
}
return result;
}
std::shared_ptr<op::TensorIterator::OutputDescription>
JSONDeserializer::deserialize_tensor_iterator_output_description(json j)
{
string kind = j["kind"];
shared_ptr<op::TensorIterator::OutputDescription> result;
if (kind == "concat")
{
uint64_t body_value_index = j["body_value_index"].get<uint64_t>();
uint64_t output_index = j["output_index"].get<uint64_t>();
int64_t start = j["start"].get<int64_t>();
int64_t stride = j["stride"].get<int64_t>();
int64_t part_size = j["part_size"].get<int64_t>();
int64_t end = j["end"].get<int64_t>();
int64_t axis = j["axis"].get<int64_t>();
result = make_shared<op::TensorIterator::ConcatOutputDescription>(
body_value_index, output_index, start, stride, part_size, end, axis);
}
else if (kind == "body_output")
{
uint64_t body_value_index = j["body_value_index"].get<uint64_t>();
uint64_t output_index = j["output_index"].get<uint64_t>();
int64_t iteration = j["iteration"].get<int64_t>();
result = make_shared<op::TensorIterator::BodyOutputDescription>(
body_value_index, output_index, iteration);
}
else
{
NGRAPH_UNREACHABLE("Unknown input description type: ", kind);
}
return result;
}
ParameterVector JSONDeserializer::deserialize_parameter_vector(json json_parameters)
{
std::vector<std::shared_ptr<op::Parameter>> params;
......@@ -2446,6 +2593,50 @@ shared_ptr<Node> JSONDeserializer::deserialize_node(json node_js)
node = make_shared<op::Tanh>(args[0]);
break;
}
case OP_TYPEID::TensorIterator:
{
auto ti = make_shared<op::TensorIterator>(args);
json jbody = node_js["body"];
// Serializer assumes inputs are available before users, so we
// need to make sure the body nodes are all deserialized before
// referencing them.
json jbody_nodes = jbody["nodes"];
NodeVector body_nodes;
for (json jnode : jbody_nodes)
{
body_nodes.push_back(deserialize_node(jnode));
}
json jparams = jbody["parameters"];
ParameterVector parameters;
for (json jparam : jparams)
{
parameters.push_back(as_type_ptr<op::Parameter>(deserialize_node(jparam)));
}
json jresults = jbody["results"];
ResultVector results;
for (json jresult : jresults)
{
results.push_back(as_type_ptr<op::Result>(deserialize_node(jresult)));
}
ti->set_body(make_shared<op::TensorIterator::BodyLambda>(results, parameters));
json jins = node_js["input_descriptions"];
for (json jin : jins)
{
ti->get_input_descriptions().push_back(
deserialize_tensor_iterator_input_description(jin));
}
json jouts = node_js["output_descriptions"];
for (json jout : jouts)
{
ti->get_output_descriptions().push_back(
deserialize_tensor_iterator_output_description(jout));
}
ti->set_output_size(ti->get_output_descriptions().size());
node = ti;
break;
}
case OP_TYPEID::Tile:
{
node = make_shared<op::Tile>(args[0], args[1]);
......@@ -3871,6 +4062,48 @@ json JSONSerializer::serialize_node(const Node& n)
}
case OP_TYPEID::Tanh: { break;
}
case OP_TYPEID::TensorIterator:
{
auto tmp = static_cast<const op::TensorIterator*>(&n);
json body = json::object();
{
auto& body_results = tmp->get_body()->get_results();
// Serializer assumes node inputs are already serialized, so
// we need to capture the body-referenced nodes here.
json body_nodes = json::array();
for (auto n : topological_sort(body_results))
{
body_nodes.push_back(serialize_node(*n));
}
body["nodes"] = body_nodes;
json params = json::array();
for (auto param : tmp->get_body()->get_parameters())
{
params.push_back(serialize_node(*param));
}
body["parameters"] = params;
json results = json::array();
for (auto result : body_results)
{
results.push_back(serialize_node(*result));
}
body["results"] = results;
}
node["body"] = body;
json ins = json::array();
for (auto in : tmp->get_input_descriptions())
{
ins.push_back(serialize_tensor_iterator_input_description(in));
}
node["input_descriptions"] = ins;
json outs = json::array();
for (auto out : tmp->get_output_descriptions())
{
outs.push_back(serialize_tensor_iterator_output_description(out));
}
node["output_descriptions"] = outs;
break;
}
case OP_TYPEID::Tile: { break;
}
case OP_TYPEID::TopK:
......
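For reference, a sketch of the JSON these helpers emit for a single slice input description; the field names match the code above, and the values are illustrative:
{ "kind": "slice", "input_index": 0, "body_parameter_index": 0,
  "start": 0, "stride": 2, "part_size": 2, "end": 20, "axis": 1 }
Merged inputs use kind "merged" with an extra body_value_index, invariant inputs use kind "constant", and output descriptions use kind "concat" or "body_output".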
......@@ -441,6 +441,225 @@ TEST(serialize, opset1_pad)
EXPECT_EQ(dynamic_cast<const op::v1::Pad*>(g_pad.get())->get_pad_mode(), pad_mode);
}
TEST(serialize, tensor_iterator_raw)
{
// That which we iterate over
auto X = make_shared<op::Parameter>(element::f32, Shape{32, 40, 10});
// Common to all cells
auto WH = make_shared<op::Parameter>(element::f32, Shape{20, 20});
auto WX = make_shared<op::Parameter>(element::f32, Shape{10, 20});
auto bH = make_shared<op::Parameter>(element::f32, Shape{20});
auto WY = make_shared<op::Parameter>(element::f32, Shape{20, 5});
auto bY = make_shared<op::Parameter>(element::f32, Shape{5});
// Initial values
auto Hinit = make_shared<op::Parameter>(element::f32, Shape{32, 1, 20});
// Set up the cell body, a function from (Hi, Xi) -> (Ho, Yo)
// Cell parameters
auto Hi = make_shared<op::Parameter>(element::f32, Shape{32, 1, 20});
auto Xi = make_shared<op::Parameter>(element::f32, Shape{32, 1, 10});
auto WH_body = make_shared<op::Parameter>(element::f32, Shape{20, 20});
auto WX_body = make_shared<op::Parameter>(element::f32, Shape{10, 20});
auto bH_body = make_shared<op::Parameter>(element::f32, Shape{20});
auto WY_body = make_shared<op::Parameter>(element::f32, Shape{20, 5});
auto bY_body = make_shared<op::Parameter>(element::f32, Shape{5});
// Body
auto Ho = make_shared<op::Reshape>(
make_shared<op::Relu>(
make_shared<op::Dot>(make_shared<op::Reshape>(Xi, AxisVector{0, 1, 2}, Shape{32, 10}),
WX_body) +
make_shared<op::Dot>(make_shared<op::Reshape>(Hi, AxisVector{0, 1, 2}, Shape{32, 20}),
WH_body) +
make_shared<op::Broadcast>(bH_body, Shape{32, 20}, AxisSet{0})),
AxisVector{0, 1},
Shape{32, 1, 20});
auto Yo = make_shared<op::Relu>(
make_shared<op::Dot>(make_shared<op::Reshape>(Ho, AxisVector{0, 1, 2}, Shape{32, 20}),
WY_body) +
make_shared<op::Broadcast>(bY_body, Shape{32, 5}, AxisSet{0}));
auto body = make_shared<op::TensorIterator::BodyLambda>(
OutputVector{Yo, Ho}, ParameterVector{Xi, Hi, WH_body, WX_body, WY_body, bH_body, bY_body});
auto tensor_iterator = make_shared<op::TensorIterator>();
tensor_iterator->set_body(body);
// The Xi are the elements of Xseq
// start=0, stride=1, part_size=1, end=40, axis=1
tensor_iterator->set_sliced_input(Xi, X, 0, 1, 1, 40, 1);
// Hi is Hinit on the first iteration, Ho after that
tensor_iterator->set_merged_input(Hi, Hinit, Ho);
tensor_iterator->set_invariant_input(WH_body, WH);
tensor_iterator->set_invariant_input(WX_body, WX);
tensor_iterator->set_invariant_input(WY_body, WY);
tensor_iterator->set_invariant_input(bH_body, bH);
tensor_iterator->set_invariant_input(bY_body, bY);
// Output 0 is last Yo
auto out0 = tensor_iterator->get_iter_value(Yo, -1);
// Output 1 is concat of hidden states
// start=0, stride=1, part_size=1, end=40, axis=1
auto out1 = tensor_iterator->get_concatenated_slices(Ho, 0, 1, 1, 40, 1);
auto results = ResultVector{make_shared<op::Result>(out0), make_shared<op::Result>(out1)};
auto f = make_shared<Function>(results, ParameterVector{X, Hinit, WH, WX, bH, WY, bY});
string s = serialize(f);
shared_ptr<Function> g = deserialize(s);
}
TEST(serialize, tensor_iterator_lstm)
{
// That which we iterate over
const size_t N = 32; // Batch size
const size_t L = 10; // Sequence length
const size_t I = 8; // Input size
const size_t H = 32; // Hidden size
auto SENT = make_shared<op::Parameter>(element::f32, Shape{N, L, I});
auto H_init = make_shared<op::Parameter>(element::f32, Shape{N, 1, H});
auto C_init = make_shared<op::Parameter>(element::f32, Shape{N, 1, H});
auto W = make_shared<op::Parameter>(element::f32, Shape{4 * H, I});
auto R = make_shared<op::Parameter>(element::f32, Shape{4 * H, H});
auto H_t = make_shared<op::Parameter>(element::f32, Shape{N, 1, H});
auto C_t = make_shared<op::Parameter>(element::f32, Shape{N, 1, H});
// Body
auto X = make_shared<op::Parameter>(element::f32, Shape{N, 1, I});
auto W_body = make_shared<op::Parameter>(element::f32, Shape{4 * H, I});
auto R_body = make_shared<op::Parameter>(element::f32, Shape{4 * H, H});
auto LSTM_cell =
make_shared<op::LSTMCell>(make_shared<op::Reshape>(X, AxisVector{0, 1, 2}, Shape{N, I}),
W_body,
R_body,
make_shared<op::Reshape>(H_t, AxisVector{0, 1, 2}, Shape{N, H}),
make_shared<op::Reshape>(C_t, AxisVector{0, 1, 2}, Shape{N, H}),
H);
auto H_o = make_shared<op::Reshape>(LSTM_cell->output(0), AxisVector{0, 1}, Shape{N, 1, H});
auto C_o = make_shared<op::Reshape>(LSTM_cell->output(1), AxisVector{0, 1}, Shape{N, 1, H});
auto body = make_shared<op::TensorIterator::BodyLambda>(
OutputVector{H_o, C_o}, ParameterVector{X, H_t, C_t, W_body, R_body});
auto tensor_iterator = make_shared<op::TensorIterator>();
tensor_iterator->set_body(body);
// start=0, stride=1, part_size=1, end=-1, axis=1
tensor_iterator->set_sliced_input(X, SENT, 0, 1, 1, -1, 1);
// H_t is H_init on the first iteration, H_o after that
tensor_iterator->set_merged_input(H_t, H_init, H_o);
tensor_iterator->set_merged_input(C_t, C_init, C_o);
tensor_iterator->set_invariant_input(W_body, W);
tensor_iterator->set_invariant_input(R_body, R);
// Output 0 is last Ho, result 0 of body
auto out0 = tensor_iterator->get_iter_value(H_o, -1);
// Output 1 is last Co, result 1 of body
auto out1 = tensor_iterator->get_iter_value(C_o, -1);
auto results = ResultVector{make_shared<op::Result>(out0), make_shared<op::Result>(out1)};
auto f = make_shared<Function>(results, ParameterVector{SENT, H_init, C_init, W, R});
string s = serialize(f);
shared_ptr<Function> g = deserialize(s);
}
TEST(serialize, tensor_iterator_2_slice_inputs_part_size_2)
{
// That which we iterate over
auto X = make_shared<op::Parameter>(element::f32, Shape{32, 40, 10});
auto Y = make_shared<op::Parameter>(element::f32, Shape{32, 40, 10});
auto M = make_shared<op::Parameter>(element::f32, Shape{32, 2, 10});
// Set up the cell body, a function from (Xi, Yi) -> (Zo)
// Body parameters
auto Xi = make_shared<op::Parameter>(element::f32, Shape{32, 2, 10});
auto Yi = make_shared<op::Parameter>(element::f32, Shape{32, 2, 10});
auto M_body = make_shared<op::Parameter>(element::f32, Shape{32, 2, 10});
// Body
auto Zo = (Xi + Yi) * M_body;
auto body = make_shared<op::TensorIterator::BodyLambda>(OutputVector{Zo},
ParameterVector{Xi, Yi, M_body});
auto tensor_iterator = make_shared<op::TensorIterator>();
tensor_iterator->set_body(body);
// The Xi are the elements of Xseq
// start=0, stride=2, part_size=2, end=20, axis=1
tensor_iterator->set_sliced_input(Xi, X, 0, 2, 2, 20, 1);
// The Yi are the elements of Yseq
// start=0, stride=2, part_size=2, end=-1, axis=1
tensor_iterator->set_sliced_input(Yi, Y, 0, 2, 2, -1, 1);
tensor_iterator->set_invariant_input(M_body, M);
// Output 0 is last Zo
auto out0 = tensor_iterator->get_iter_value(Zo, -1);
// Output 1 is concat of Zos
// start=0, stride=2, part_size=2, end=20, axis=1
auto out1 = tensor_iterator->get_concatenated_slices(Zo, 0, 2, 2, 20, 1);
auto result0 = make_shared<op::Result>(out0);
auto result1 = make_shared<op::Result>(out1);
Shape out0_shape{32, 2, 10};
Shape out1_shape{32, 40, 10};
auto results = ResultVector{result0, result1};
auto f = make_shared<Function>(results, ParameterVector{X, Y, M});
EXPECT_EQ(result0->output(0).get_shape(), out0_shape);
EXPECT_EQ(result1->output(0).get_shape(), out1_shape);
string s = serialize(f);
shared_ptr<Function> g = deserialize(s);
}
TEST(serialize, tensor_iterator_2_slice_inputs_part_size_2_dynamic)
{
// That which we iterate over
auto X = make_shared<op::Parameter>(element::f32, Shape{32, 40, 10});
auto Y = make_shared<op::Parameter>(element::f32, Shape{32, 40, 10});
auto M = make_shared<op::Parameter>(element::f32, Shape{32, 2, 10});
// Set up the cell body, a function from (Xi, Yi) -> (Zo)
// Body parameters
auto Xi = make_shared<op::Parameter>(element::f32, PartialShape::dynamic());
auto Yi = make_shared<op::Parameter>(element::f32, PartialShape::dynamic());
auto M_body = make_shared<op::Parameter>(element::f32, PartialShape::dynamic());
// Body
auto Zo = (Xi + Yi) * M_body;
auto body = make_shared<op::TensorIterator::BodyLambda>(OutputVector{Zo},
ParameterVector{Xi, Yi, M_body});
auto tensor_iterator = make_shared<op::TensorIterator>();
tensor_iterator->set_body(body);
// The Xi are the elements of Xseq
// start=0, stride=2, part_size=2, end=20, axis=1
tensor_iterator->set_sliced_input(Xi, X, 0, 2, 2, 20, 1);
// The Yi are the elements of Yseq
// start=0, stride=2, part_size=2, end=-1, axis=1
tensor_iterator->set_sliced_input(Yi, Y, 0, 2, 2, -1, 1);
tensor_iterator->set_invariant_input(M_body, M);
// Output 0 is last Zo
auto out0 = tensor_iterator->get_iter_value(Zo, -1);
// Output 1 is concat of Zos
// start=0, stride=2, part_size=2, end=20, axis=1
auto out1 = tensor_iterator->get_concatenated_slices(Zo, 0, 2, 2, 20, 1);
auto result0 = make_shared<op::Result>(out0);
auto result1 = make_shared<op::Result>(out1);
Shape out0_shape{32, 2, 10};
Shape out1_shape{32, 40, 10};
auto results = ResultVector{result0, result1};
auto f = make_shared<Function>(results, ParameterVector{X, Y, M});
EXPECT_EQ(result0->output(0).get_shape(), out0_shape);
EXPECT_EQ(result1->output(0).get_shape(), out1_shape);
EXPECT_EQ(body->get_results()[0]->output(0).get_shape(), out0_shape);
string s = serialize(f);
shared_ptr<Function> g = deserialize(s);
}
TEST(serialize, opset1_strided_slice)
{
auto data = make_shared<op::Parameter>(element::f32, Shape{2, 4, 6, 8});
......