KinectFusion big update: OpenCL support, etc. (#1798)

* KinFu demo: idle mode added * undistort added * KinFu demo: depth recorder added * TSDFVolume gets voxelSize, voxelSizeInv, truncDist members; decorative fixes * TSDFVolumeGPU::integrate(): host code compiles * TSDFVolume: truncDist fixed * TSDFVolume::integrate(): initial OCL version is done * TSDFVolume::integrate(): OCL: minor fixes * kinfu: small fixes * TSDFVolume::raycast(): initial GPU version is done * USE_INTRINSICS directive for centralized enable/disable opt. code * TSDF Volume supports 3 different sizes/resolutions on each dimension * TSDFVolume: serviceMembers moved to parent class * TSDFVolumeGPU: mem order changed, gave perf boost 4x * Frame: fixed UMat as InputArray; TSDF: comments, TODOs, minor fixes * Frame::getPointsNormals(); minors * FrameGPU: initial version (not working) * minor * FrameGPU: several fixes * added OCL "fast" options * ICP OCL: initial slow version is done (host-side reduce) * ICP OCL: reduce is done, buggy * KinFu Frame: more args fixes * ICP OCL: small fixes to kernel * ICP OCL reduce works * OCL code refactored * ICP OCL: less allocations, better speed * ICP OCL: pose matrix made arg * Render OCL: small fix * Demo: using UMats everywhere * TSDF integrate OCL: put const arg into kernel arg * Platform parameter partially removed, implementation choice is done through OCL availability check * Frame and FrameGenerator removed (other code is in following commits) * CPU render: 4b instead of 3b * ICP: no Frame class use, one class for both CPU and GPU code * demo: fix for UMat chain * TSDF: no Frame or FrameGenerator use * KinFu: no Frame or FrameGenerator, parametrized for Mat or UMat based on OCL availability * KinFu::setParams() removed since it has no effect anyway * TSDF::raycast OCL: fixed normals rendering * ScopeTime -> CV_TRACE * 3-dims resolution and size passed to API * fixed unexpected fails of ICP OCL * voxels made cubic again * args fixed a little * fixed volSize calculation * Tests: inequal, OCL, unified scene test function * kinfu_frame: input types fixed * fixed for Intel HD Graphics * KinFu demo: setUseOptimized instead of setUseOpenCL * tsdf: data types fixed * TSDF OCL: fetch normals implemented * roundDownPow2 -> utils.hpp * TSDF OCL: fetch points+normals implemented * TSDF OCL: NoSize, other fixes for kernel args * Frame OCL: HAVE_OPENCL, NoSize, other kernel args fixed * ICP OCL: HAVE_OPENCL, NoSize and other kernel fixes * KinFu demo fixes: volume size and too long delay * whitespace fix * TSDF: allowed sizes not divisable by 32 * TSDF: fixed type traits; added optimization TODOs * KinFu made non-free * minor fixes: cast and whitespace * fixed FastICP test * warnings fixed: implicit type conversions * OCL kernels: local args made through KernelArg::Local(lsz) call * MSVC warnings fixed * a workaround for broken OCV's bilateral * KinFu tests made a bit faster * build fixed until 3.4 isn't merged to master * warnings fixed, test time shortened

KinectFusion big update: OpenCL support, etc. (#1798)
* KinFu demo: idle mode added * undistort added * KinFu demo: depth recorder added * TSDFVolume gets voxelSize, voxelSizeInv, truncDist members; decorative fixes * TSDFVolumeGPU::integrate(): host code compiles * TSDFVolume: truncDist fixed * TSDFVolume::integrate(): initial OCL version is done * TSDFVolume::integrate(): OCL: minor fixes * kinfu: small fixes * TSDFVolume::raycast(): initial GPU version is done * USE_INTRINSICS directive for centralized enable/disable opt. code * TSDF Volume supports 3 different sizes/resolutions on each dimension * TSDFVolume: serviceMembers moved to parent class * TSDFVolumeGPU: mem order changed, gave perf boost 4x * Frame: fixed UMat as InputArray; TSDF: comments, TODOs, minor fixes * Frame::getPointsNormals(); minors * FrameGPU: initial version (not working) * minor * FrameGPU: several fixes * added OCL "fast" options * ICP OCL: initial slow version is done (host-side reduce) * ICP OCL: reduce is done, buggy * KinFu Frame: more args fixes * ICP OCL: small fixes to kernel * ICP OCL reduce works * OCL code refactored * ICP OCL: less allocations, better speed * ICP OCL: pose matrix made arg * Render OCL: small fix * Demo: using UMats everywhere * TSDF integrate OCL: put const arg into kernel arg * Platform parameter partially removed, implementation choice is done through OCL availability check * Frame and FrameGenerator removed (other code is in following commits) * CPU render: 4b instead of 3b * ICP: no Frame class use, one class for both CPU and GPU code * demo: fix for UMat chain * TSDF: no Frame or FrameGenerator use * KinFu: no Frame or FrameGenerator, parametrized for Mat or UMat based on OCL availability * KinFu::setParams() removed since it has no effect anyway * TSDF::raycast OCL: fixed normals rendering * ScopeTime -> CV_TRACE * 3-dims resolution and size passed to API * fixed unexpected fails of ICP OCL * voxels made cubic again * args fixed a little * fixed volSize calculation * Tests: inequal, OCL, unified scene test function * kinfu_frame: input types fixed * fixed for Intel HD Graphics * KinFu demo: setUseOptimized instead of setUseOpenCL * tsdf: data types fixed * TSDF OCL: fetch normals implemented * roundDownPow2 -> utils.hpp * TSDF OCL: fetch points+normals implemented * TSDF OCL: NoSize, other fixes for kernel args * Frame OCL: HAVE_OPENCL, NoSize, other kernel args fixed * ICP OCL: HAVE_OPENCL, NoSize and other kernel fixes * KinFu demo fixes: volume size and too long delay * whitespace fix * TSDF: allowed sizes not divisable by 32 * TSDF: fixed type traits; added optimization TODOs * KinFu made non-free * minor fixes: cast and whitespace * fixed FastICP test * warnings fixed: implicit type conversions * OCL kernels: local args made through KernelArg::Local(lsz) call * MSVC warnings fixed * a workaround for broken OCV's bilateral * KinFu tests made a bit faster * build fixed until 3.4 isn't merged to master * warnings fixed, test time shortened
75bcd397 · Rostislav Vasilikhin · Vadim Pisarevsky · 2f014f6f · 75bcd397 · 75bcd397
Commit 75bcd397 authored Oct 09, 2018 by Rostislav Vasilikhin Committed by Vadim Pisarevsky Oct 09, 2018
20 changed files
--- a/modules/rgbd/README.md
+++ b/modules/rgbd/README.md
@@ -15,3 +15,5 @@ Note that the KinectFusion algorithm was patented and its use may be restricted
 * _US8401225B2_  Moving object segmentation using depth images
 Since OpenCV's license imposes different restrictions on usage please consult a legal before using this algorithm any way.
+That's why you need to set the OPENCV_ENABLE_NONFREE option in CMake to use KinectFusion.
--- a/modules/rgbd/include/opencv2/rgbd/kinfu.hpp
+++ b/modules/rgbd/include/opencv2/rgbd/kinfu.hpp
@@ -28,19 +28,6 @@ struct CV_EXPORTS_W Params
    */
    CV_WRAP static Ptr<Params> coarseParams();
-    enum PlatformType
-    {
-        PLATFORM_CPU, PLATFORM_GPU
-    };
-    /** @brief A platform on which to run the algorithm.
-     *
-    Currently KinFu supports only one platform which is a CPU.
-    GPU platform is to be implemented in the future.
-    */
-    PlatformType platform;
    /** @brief frame size in pixels */
    CV_PROP_RW Size frameSize;
@@ -51,6 +38,7 @@ struct CV_EXPORTS_W Params
    Typical values are:
         * 5000 per 1 meter for the 16-bit PNG files of TUM database
+         * 1000 per 1 meter for Kinect 2 device
         * 1 per 1 meter for the 32-bit float images in the ROS bag files
    */
    CV_PROP_RW float depthFactor;
@@ -65,13 +53,13 @@ struct CV_EXPORTS_W Params
    /** @brief Number of pyramid levels for ICP */
    CV_PROP_RW int pyramidLevels;
-    /** @brief Resolution of voxel cube
+    /** @brief Resolution of voxel space
-    Number of voxels in each cube edge.
+    Number of voxels in each dimension.
    */
-    CV_PROP_RW int volumeDims;
+    CV_PROP_RW Vec3i volumeDims;
-    /** @brief Size of voxel cube side in meters */
+    /** @brief Size of voxel in meters */
-    CV_PROP_RW float volumeSize;
+    CV_PROP_RW float voxelSize;
    /** @brief Minimal camera movement in meters
@@ -84,7 +72,7 @@ struct CV_EXPORTS_W Params
    /** @brief distance to truncate in meters
-    Distances that exceed this value will be truncated in voxel cube values.
+    Distances to surface that exceed this value will be truncated to 1.0.
    */
    CV_PROP_RW float tsdf_trunc_dist;
@@ -128,11 +116,19 @@ struct CV_EXPORTS_W Params
  The output can be obtained as a vector of points and their normals
  or can be Phong-rendered from given camera pose.
-  An internal representation of a model is a voxel cube that keeps TSDF values
+  An internal representation of a model is a voxel cuboid that keeps TSDF values
  which are a sort of distances to the surface (for details read the @cite kinectfusion article about TSDF).
  There is no interface to that representation yet.
-  This implementation is based on (kinfu-remake)[https://github.com/Nerei/kinfu_remake].
+  KinFu uses OpenCL acceleration automatically if available.
+  To enable or disable it explicitly use cv::setUseOptimized() or cv::ocl::setUseOpenCL().
+  This implementation is based on [kinfu-remake](https://github.com/Nerei/kinfu_remake).
+  Note that the KinectFusion algorithm was patented and its use may be restricted by
+  the list of patents mentioned in README.md file in this module directory.
+  That's why you need to set the OPENCV_ENABLE_NONFREE option in CMake to use KinectFusion.
 */
 class CV_EXPORTS_W KinFu
 {
@@ -142,11 +138,10 @@ public:
    /** @brief Get current parameters */
    virtual const Params& getParams() const = 0;
-    virtual void setParams(const Params&) = 0;
    /** @brief Renders a volume into an image
-      Renders a 0-surface of TSDF using Phong shading into a CV_8UC3 Mat.
+      Renders a 0-surface of TSDF using Phong shading into a CV_8UC4 Mat.
      Light pose is fixed in KinFu params.
        @param image resulting image
@@ -186,21 +181,18 @@ public:
    */
    CV_WRAP virtual void reset() = 0;
-    /** @brief Get current pose in voxel cube space */
+    /** @brief Get current pose in voxel space */
    virtual const Affine3f getPose() const = 0;
    /** @brief Process next depth frame
-      Integrates depth into voxel cube with respect to its ICP-calculated pose.
+      Integrates depth into voxel space with respect to its ICP-calculated pose.
      Input image is converted to CV_32F internally if has another type.
    @param depth one-channel image which size and depth scale is described in algorithm's parameters
    @return true if succeded to align new frame with current scene, false if opposite
    */
    CV_WRAP virtual bool update(InputArray depth) = 0;
-private:
-    class KinFuImpl;
 };
 //! @}

--- a/modules/rgbd/samples/kinfu_demo.cpp
+++ b/modules/rgbd/samples/kinfu_demo.cpp
--- a/modules/rgbd/src/fast_icp.cpp
+++ b/modules/rgbd/src/fast_icp.cpp
--- a/modules/rgbd/src/fast_icp.hpp
+++ b/modules/rgbd/src/fast_icp.hpp
@@ -18,7 +18,10 @@ class ICP
 public:
    ICP(const cv::kinfu::Intr _intrinsics, const std::vector<int> &_iterations, float _angleThreshold, float _distanceThreshold);
-    virtual bool estimateTransform(cv::Affine3f& transform, cv::Ptr<Frame> oldFrame, cv::Ptr<Frame> newFrame) const = 0;
+    virtual bool estimateTransform(cv::Affine3f& transform,
+                                   InputArray oldPoints, InputArray oldNormals,
+                                   InputArray newPoints, InputArray newNormals
+                                   ) const = 0;
    virtual ~ICP() { }
@@ -30,8 +33,7 @@ protected:
    cv::kinfu::Intr intrinsics;
 };
-cv::Ptr<ICP> makeICP(cv::kinfu::Params::PlatformType t,
+cv::Ptr<ICP> makeICP(const cv::kinfu::Intr _intrinsics, const std::vector<int> &_iterations,
-                     const cv::kinfu::Intr _intrinsics, const std::vector<int> &_iterations,
                     float _angleThreshold, float _distanceThreshold);
 } // namespace kinfu

--- a/modules/rgbd/src/kinfu.cpp
+++ b/modules/rgbd/src/kinfu.cpp
@@ -12,45 +12,10 @@
 namespace cv {
 namespace kinfu {
-class KinFu::KinFuImpl : public KinFu
-{
-public:
-    KinFuImpl(const Params& _params);
-    virtual ~KinFuImpl();
-    const Params& getParams() const CV_OVERRIDE;
-    void setParams(const Params&) CV_OVERRIDE;
-    void render(OutputArray image, const Matx44f& cameraPose) const CV_OVERRIDE;
-    void getCloud(OutputArray points, OutputArray normals) const CV_OVERRIDE;
-    void getPoints(OutputArray points) const CV_OVERRIDE;
-    void getNormals(InputArray points, OutputArray normals) const CV_OVERRIDE;
-    void reset() CV_OVERRIDE;
-    const Affine3f getPose() const CV_OVERRIDE;
-    bool update(InputArray depth) CV_OVERRIDE;
-private:
-    Params params;
-    cv::Ptr<FrameGenerator> frameGenerator;
-    cv::Ptr<ICP> icp;
-    cv::Ptr<TSDFVolume> volume;
-    int frameCounter;
-    Affine3f pose;
-    cv::Ptr<Frame> frame;
-};
 Ptr<Params> Params::defaultParams()
 {
    Params p;
-    p.platform = PLATFORM_CPU;
    p.frameSize = Size(640, 480);
    float fx, fy, cx, cy;
@@ -88,12 +53,13 @@ Ptr<Params> Params::defaultParams()
    p.tsdf_min_camera_movement = 0.f; //meters, disabled
-    p.volumeDims = 512;  //number of voxels
+    p.volumeDims = Vec3i::all(512); //number of voxels
-    p.volumeSize = 3.f;  //meters
+    float volSize = 3.f;
+    p.voxelSize = volSize/512.f; //meters
    // default pose of volume cube
-    p.volumePose = Affine3f().translate(Vec3f(-p.volumeSize/2, -p.volumeSize/2, 0.5f));
+    p.volumePose = Affine3f().translate(Vec3f(-volSize/2.f, -volSize/2.f, 0.5f));
    p.tsdf_trunc_dist = 0.04f; //meters;
    p.tsdf_max_weight = 64;   //frames
@@ -129,61 +95,138 @@ Ptr<Params> Params::coarseParams()
    }
    p->pyramidLevels = (int)p->icpIterations.size();
-    p->volumeDims = 128; //number of voxels
+    float volSize = 3.f;
+    p->volumeDims = Vec3i::all(128); //number of voxels
+    p->voxelSize  = volSize/128.f;
    p->raycast_step_factor = 0.75f;  //in voxel sizes
    return p;
 }
-KinFu::KinFuImpl::KinFuImpl(const Params &_params) :
+// T should be Mat or UMat
+template< typename T >
+class KinFuImpl : public KinFu
+{
+public:
+    KinFuImpl(const Params& _params);
+    virtual ~KinFuImpl();
+    const Params& getParams() const CV_OVERRIDE;
+    void render(OutputArray image, const Matx44f& cameraPose) const CV_OVERRIDE;
+    void getCloud(OutputArray points, OutputArray normals) const CV_OVERRIDE;
+    void getPoints(OutputArray points) const CV_OVERRIDE;
+    void getNormals(InputArray points, OutputArray normals) const CV_OVERRIDE;
+    void reset() CV_OVERRIDE;
+    const Affine3f getPose() const CV_OVERRIDE;
+    bool update(InputArray depth) CV_OVERRIDE;
+    bool updateT(const T& depth);
+private:
+    Params params;
+    cv::Ptr<ICP> icp;
+    cv::Ptr<TSDFVolume> volume;
+    int frameCounter;
+    Affine3f pose;
+    std::vector<T> pyrPoints;
+    std::vector<T> pyrNormals;
+};
+template< typename T >
+KinFuImpl<T>::KinFuImpl(const Params &_params) :
    params(_params),
-    frameGenerator(makeFrameGenerator(params.platform)),
+    icp(makeICP(params.intr, params.icpIterations, params.icpAngleThresh, params.icpDistThresh)),
-    icp(makeICP(params.platform, params.intr, params.icpIterations, params.icpAngleThresh, params.icpDistThresh)),
+    volume(makeTSDFVolume(params.volumeDims, params.voxelSize, params.volumePose,
-    volume(makeTSDFVolume(params.platform, params.volumeDims, params.volumeSize, params.volumePose,
                          params.tsdf_trunc_dist, params.tsdf_max_weight,
                          params.raycast_step_factor)),
-    frame()
+    pyrPoints(), pyrNormals()
 {
    reset();
 }
-void KinFu::KinFuImpl::reset()
+template< typename T >
+void KinFuImpl<T>::reset()
 {
    frameCounter = 0;
    pose = Affine3f::Identity();
    volume->reset();
 }
-KinFu::KinFuImpl::~KinFuImpl()
+template< typename T >
-{
+KinFuImpl<T>::~KinFuImpl()
+{ }
+template< typename T >
+const Params& KinFuImpl<T>::getParams() const
+{
+    return params;
 }
-const Params& KinFu::KinFuImpl::getParams() const
+template< typename T >
+const Affine3f KinFuImpl<T>::getPose() const
 {
-    return params;
+    return pose;
 }
-void KinFu::KinFuImpl::setParams(const Params& p)
+template<>
+bool KinFuImpl<Mat>::update(InputArray _depth)
 {
-    params = p;
+    CV_Assert(!_depth.empty() && _depth.size() == params.frameSize);
+    Mat depth;
+    if(_depth.isUMat())
+    {
+        _depth.copyTo(depth);
+        return updateT(depth);
+    }
+    else
+    {
+        return updateT(_depth.getMat());
+    }
 }
-const Affine3f KinFu::KinFuImpl::getPose() const
+template<>
+bool KinFuImpl<UMat>::update(InputArray _depth)
 {
-    return pose;
+    CV_Assert(!_depth.empty() && _depth.size() == params.frameSize);
+    UMat depth;
+    if(!_depth.isUMat())
+    {
+        _depth.copyTo(depth);
+        return updateT(depth);
+    }
+    else
+    {
+        return updateT(_depth.getUMat());
+    }
 }
-bool KinFu::KinFuImpl::update(InputArray _depth)
+template< typename T >
+bool KinFuImpl<T>::updateT(const T& _depth)
 {
-    ScopeTime st("kinfu update");
+    CV_TRACE_FUNCTION();
-    CV_Assert(!_depth.empty() && _depth.size() == params.frameSize);
+    T depth;
+    if(_depth.type() != DEPTH_TYPE)
+        _depth.convertTo(depth, DEPTH_TYPE);
+    else
+        depth = _depth;
-    cv::Ptr<Frame> newFrame = (*frameGenerator)();
+    std::vector<T> newPoints, newNormals;
-    (*frameGenerator)(newFrame, _depth,
+    makeFrameFromDepth(depth, newPoints, newNormals, params.intr,
-                      params.intr,
                       params.pyramidLevels,
                       params.depthFactor,
                       params.bilateral_sigma_depth,
@@ -193,14 +236,15 @@ bool KinFu::KinFuImpl::update(InputArray _depth)
    if(frameCounter == 0)
    {
        // use depth instead of distance
-        volume->integrate(newFrame, params.depthFactor, pose, params.intr);
+        volume->integrate(depth, params.depthFactor, pose, params.intr);
-        frame = newFrame;
+        pyrPoints  = newPoints;
+        pyrNormals = newNormals;
    }
    else
    {
        Affine3f affine;
-        bool success = icp->estimateTransform(affine, frame, newFrame);
+        bool success = icp->estimateTransform(affine, pyrPoints, pyrNormals, newPoints, newNormals);
        if(!success)
            return false;
@@ -212,12 +256,15 @@ bool KinFu::KinFuImpl::update(InputArray _depth)
        if((rnorm + tnorm)/2 >= params.tsdf_min_camera_movement)
        {
            // use depth instead of distance
-            volume->integrate(newFrame, params.depthFactor, pose, params.intr);
+            volume->integrate(depth, params.depthFactor, pose, params.intr);
        }
-        // raycast and build a pyramid of points and normals
+        T& points  = pyrPoints [0];
-        volume->raycast(pose, params.intr, params.frameSize,
+        T& normals = pyrNormals[0];
-                        params.pyramidLevels, frameGenerator, frame);
+        volume->raycast(pose, params.intr, params.frameSize, points, normals);
+        // build a pyramid of points and normals
+        buildPyramidPointsNormals(points, normals, pyrPoints, pyrNormals,
+                                  params.pyramidLevels);
    }
    frameCounter++;
@@ -225,9 +272,10 @@ bool KinFu::KinFuImpl::update(InputArray _depth)
 }
-void KinFu::KinFuImpl::render(OutputArray image, const Matx44f& _cameraPose) const
+template< typename T >
+void KinFuImpl<T>::render(OutputArray image, const Matx44f& _cameraPose) const
 {
-    ScopeTime st("kinfu render");
+    CV_TRACE_FUNCTION();
    Affine3f cameraPose(_cameraPose);
@@ -235,40 +283,58 @@ void KinFu::KinFuImpl::render(OutputArray image, const Matx44f& _cameraPose) con
    if((cameraPose.rotation() == pose.rotation() && cameraPose.translation() == pose.translation()) ||
       (cameraPose.rotation() == id.rotation()   && cameraPose.translation() == id.translation()))
    {
-        frame->render(image, 0, params.lightPose);
+        renderPointsNormals(pyrPoints[0], pyrNormals[0], image, params.lightPose);
    }
    else
    {
-        // raycast and build a pyramid of points and normals
+        T points, normals;
-        cv::Ptr<Frame> f = (*frameGenerator)();
+        volume->raycast(cameraPose, params.intr, params.frameSize, points, normals);
-        volume->raycast(cameraPose, params.intr, params.frameSize,
+        renderPointsNormals(points, normals, image, params.lightPose);
-                        params.pyramidLevels, frameGenerator, f);
-        f->render(image, 0, params.lightPose);
    }
 }
-void KinFu::KinFuImpl::getCloud(OutputArray p, OutputArray n) const
+template< typename T >
+void KinFuImpl<T>::getCloud(OutputArray p, OutputArray n) const
 {
    volume->fetchPointsNormals(p, n);
 }
-void KinFu::KinFuImpl::getPoints(OutputArray points) const
+template< typename T >
+void KinFuImpl<T>::getPoints(OutputArray points) const
 {
    volume->fetchPointsNormals(points, noArray());
 }
-void KinFu::KinFuImpl::getNormals(InputArray points, OutputArray normals) const
+template< typename T >
+void KinFuImpl<T>::getNormals(InputArray points, OutputArray normals) const
 {
    volume->fetchNormals(points, normals);
 }
 // importing class
-Ptr<KinFu> KinFu::create(const Ptr<Params>& _params)
+#ifdef OPENCV_ENABLE_NONFREE
+Ptr<KinFu> KinFu::create(const Ptr<Params>& params)
+{
+#ifdef HAVE_OPENCL
+    if(cv::ocl::isOpenCLActivated())
+        return makePtr< KinFuImpl<UMat> >(*params);
+#endif
+    return makePtr< KinFuImpl<Mat> >(*params);
+}
+#else
+Ptr<KinFu> KinFu::create(const Ptr<Params>& /*params*/)
 {
-    return makePtr<KinFu::KinFuImpl>(*_params);
+    CV_Error(Error::StsNotImplemented,
+             "This algorithm is patented and is excluded in this configuration; "
+             "Set OPENCV_ENABLE_NONFREE CMake option and rebuild the library");
 }
+#endif
 KinFu::~KinFu() {}

--- a/modules/rgbd/src/kinfu_frame.cpp
+++ b/modules/rgbd/src/kinfu_frame.cpp
--- a/modules/rgbd/src/kinfu_frame.hpp
+++ b/modules/rgbd/src/kinfu_frame.hpp
@@ -70,52 +70,24 @@ inline ptype toPtype(const cv::Vec3f& x)
    return ptype(x[0], x[1], x[2], 0);
 }
-typedef cv::Mat_< ptype > Points;
+enum
-typedef Points Normals;
-typedef cv::Mat_< depthType > Depth;
-struct Frame
 {
-public:
+    DEPTH_TYPE = DataType<depthType>::type,
-    virtual void render(cv::OutputArray image, int level, cv::Affine3f lightPose) const = 0;
+    POINT_TYPE = DataType<ptype    >::type
-    virtual void getDepth(cv::OutputArray depth) const = 0;
-    virtual ~Frame() { }
 };
-struct FrameCPU : Frame
+typedef cv::Mat_< ptype > Points;
-{
+typedef Points Normals;
-public:
-    FrameCPU() : points(), normals() { }
-    virtual ~FrameCPU() { }
-    virtual void render(cv::OutputArray image, int level, cv::Affine3f lightPose) const override;
-    virtual void getDepth(cv::OutputArray depth) const override;
-    std::vector<Points> points;
-    std::vector<Normals> normals;
-    Depth depthData;
-};
-struct FrameGPU : Frame
-{
-public:
-    virtual void render(cv::OutputArray image, int level, cv::Affine3f lightPose) const override;
-    virtual void getDepth(cv::OutputArray depth) const override;
-    virtual ~FrameGPU() { }
-};
-struct FrameGenerator
+typedef cv::Mat_< depthType > Depth;
-{
-public:
-    virtual cv::Ptr<Frame> operator ()() const = 0;
-    virtual void operator() (cv::Ptr<Frame> frame, cv::InputArray depth, const cv::kinfu::Intr, int levels, float depthFactor,
-                                       float sigmaDepth, float sigmaSpatial, int kernelSize) const = 0;
-    virtual void operator() (cv::Ptr<Frame> frame, cv::InputArray points, cv::InputArray normals, int levels) const = 0;
-    virtual ~FrameGenerator() {}
-};
-cv::Ptr<FrameGenerator> makeFrameGenerator(cv::kinfu::Params::PlatformType t);
+void renderPointsNormals(InputArray _points, InputArray _normals, OutputArray image, cv::Affine3f lightPose);
+void makeFrameFromDepth(InputArray depth, OutputArray pyrPoints, OutputArray pyrNormals,
+                        const Intr intr, int levels, float depthFactor,
+                        float sigmaDepth, float sigmaSpatial, int kernelSize);
+void buildPyramidPointsNormals(InputArray _points, InputArray _normals,
+                               OutputArrayOfArrays pyrPoints, OutputArrayOfArrays pyrNormals,
+                               int levels);
 } // namespace kinfu
 } // namespace cv

--- a/modules/rgbd/src/odometry.cpp
+++ b/modules/rgbd/src/odometry.cpp
@@ -1490,21 +1490,10 @@ Size FastICPOdometry::prepareFrameCache(Ptr<OdometryFrame>& frame, int cacheType
    checkDepth(frame->depth, frame->depth.size());
    // mask isn't used by FastICP
-    Ptr<FrameGenerator> fg = makeFrameGenerator(Params::PlatformType::PLATFORM_CPU);
-    Ptr<FrameCPU> f = (*fg)().dynamicCast<FrameCPU>();
    Intr intr(cameraMatrix);
    float depthFactor = 1.f; // user should rescale depth manually
-    (*fg)(f, frame->depth, intr, (int)iterCounts.total(), depthFactor,
+    makeFrameFromDepth(frame->depth, frame->pyramidCloud, frame->pyramidNormals, intr, (int)iterCounts.total(),
-          sigmaDepth, sigmaSpatial, kernelSize);
+                       depthFactor, sigmaDepth, sigmaSpatial, kernelSize);
-    frame->pyramidCloud.clear();
-    frame->pyramidNormals.clear();
-    for(size_t i = 0; i < f->points.size(); i++)
-    {
-        frame->pyramidCloud.push_back(f->points[i]);
-        frame->pyramidNormals.push_back(f->normals[i]);
-    }
    return frame->depth.size();
 }
@@ -1526,22 +1515,16 @@ bool FastICPOdometry::computeImpl(const Ptr<OdometryFrame>& srcFrame,
 {
    kinfu::Intr intr(cameraMatrix);
    std::vector<int> iterations = iterCounts;
-    Ptr<kinfu::ICP> icp = kinfu::makeICP(kinfu::Params::PlatformType::PLATFORM_CPU,
+    Ptr<kinfu::ICP> icp = kinfu::makeICP(intr,
-                                         intr,
                                         iterations,
                                         angleThreshold,
                                         maxDistDiff);
+    // KinFu's ICP calculates transformation from new frame to old one (src to dst)
    Affine3f transform;
-    Ptr<FrameCPU> srcF = makePtr<FrameCPU>(), dstF = makePtr<FrameCPU>();
+    bool result = icp->estimateTransform(transform,
-    for(size_t i = 0; i < srcFrame->pyramidCloud.size(); i++)
+                                         dstFrame->pyramidCloud, dstFrame->pyramidNormals,
-    {
+                                         srcFrame->pyramidCloud, srcFrame->pyramidNormals);
-        srcF-> points.push_back(srcFrame->pyramidCloud  [i]);
-        srcF->normals.push_back(srcFrame->pyramidNormals[i]);
-        dstF-> points.push_back(dstFrame->pyramidCloud  [i]);
-        dstF->normals.push_back(dstFrame->pyramidNormals[i]);
-    }
-    bool result = icp->estimateTransform(transform, dstF, srcF);
    Rt.create(Size(4, 4), CV_64FC1);
    Mat(Matx44d(transform.matrix)).copyTo(Rt.getMat());

--- a/modules/rgbd/src/opencl/icp.cl
+++ b/modules/rgbd/src/opencl/icp.cl
+// This file is part of OpenCV project.
+// It is subject to the license terms in the LICENSE file found in the top-level directory
+// of this distribution and at http://opencv.org/license.html
+// This code is also subject to the license terms in the LICENSE_KinectFusion.md file found in this module's directory
+#define UTSIZE 27
+typedef float4 ptype;
+/*
+    Calculate an upper triangle of Ab matrix then reduce it across workgroup
+*/
+inline void calcAb7(__global const char * oldPointsptr,
+                    int oldPoints_step, int oldPoints_offset,
+                    __global const char * oldNormalsptr,
+                    int oldNormals_step, int oldNormals_offset,
+                    const int2 oldSize,
+                    __global const char * newPointsptr,
+                    int newPoints_step, int newPoints_offset,
+                    __global const char * newNormalsptr,
+                    int newNormals_step, int newNormals_offset,
+                    const int2 newSize,
+                    const float16 poseMatrix,
+                    const float2 fxy,
+                    const float2 cxy,
+                    const float sqDistanceThresh,
+                    const float minCos,
+                    float* ab7
+                    )
+{
+    const int x = get_global_id(0);
+    const int y = get_global_id(1);
+    if(x >= newSize.x || y >= newSize.y)
+        return;
+    // coord-independent constants
+    const float3 poseRot0 = poseMatrix.s012;
+    const float3 poseRot1 = poseMatrix.s456;
+    const float3 poseRot2 = poseMatrix.s89a;
+    const float3 poseTrans = poseMatrix.s37b;
+    const float2 oldEdge = (float2)(oldSize.x - 1, oldSize.y - 1);
+    __global const ptype* newPtsRow = (__global const ptype*)(newPointsptr +
+                                                              newPoints_offset +
+                                                              y*newPoints_step);
+    __global const ptype* newNrmRow = (__global const ptype*)(newNormalsptr +
+                                                              newNormals_offset +
+                                                              y*newNormals_step);
+    float3 newP = newPtsRow[x].xyz;
+    float3 newN = newNrmRow[x].xyz;
+    if(any(isnan(newP)) || any(isnan(newN)))
+        return;
+    //transform to old coord system
+    newP = (float3)(dot(newP, poseRot0),
+                    dot(newP, poseRot1),
+                    dot(newP, poseRot2)) + poseTrans;
+    newN = (float3)(dot(newN, poseRot0),
+                    dot(newN, poseRot1),
+                    dot(newN, poseRot2));
+    //find correspondence by projecting the point
+    float2 oldCoords = (newP.xy/newP.z)*fxy+cxy;
+    if(!(all(oldCoords >= 0.f) && all(oldCoords < oldEdge)))
+        return;
+    // bilinearly interpolate oldPts and oldNrm under oldCoords point
+    float3 oldP, oldN;
+    float2 ip = floor(oldCoords);
+    float2 t = oldCoords - ip;
+    int xi = ip.x, yi = ip.y;
+    __global const ptype* prow0 = (__global const ptype*)(oldPointsptr +
+                                                          oldPoints_offset +
+                                                          (yi+0)*oldPoints_step);
+    __global const ptype* prow1 = (__global const ptype*)(oldPointsptr +
+                                                          oldPoints_offset +
+                                                          (yi+1)*oldPoints_step);
+    float3 p00 = prow0[xi+0].xyz;
+    float3 p01 = prow0[xi+1].xyz;
+    float3 p10 = prow1[xi+0].xyz;
+    float3 p11 = prow1[xi+1].xyz;
+    // NaN check is done later
+    __global const ptype* nrow0 = (__global const ptype*)(oldNormalsptr +
+                                                          oldNormals_offset +
+                                                          (yi+0)*oldNormals_step);
+    __global const ptype* nrow1 = (__global const ptype*)(oldNormalsptr +
+                                                          oldNormals_offset +
+                                                          (yi+1)*oldNormals_step);
+    float3 n00 = nrow0[xi+0].xyz;
+    float3 n01 = nrow0[xi+1].xyz;
+    float3 n10 = nrow1[xi+0].xyz;
+    float3 n11 = nrow1[xi+1].xyz;
+    // NaN check is done later
+    float3 p0 = mix(p00, p01, t.x);
+    float3 p1 = mix(p10, p11, t.x);
+    oldP = mix(p0, p1, t.y);
+    float3 n0 = mix(n00, n01, t.x);
+    float3 n1 = mix(n10, n11, t.x);
+    oldN = mix(n0, n1, t.y);
+    if(any(isnan(oldP)) || any(isnan(oldN)))
+        return;
+    //filter by distance
+    float3 diff = newP - oldP;
+    if(dot(diff, diff) > sqDistanceThresh)
+        return;
+    //filter by angle
+    if(fabs(dot(newN, oldN)) < minCos)
+        return;
+    // build point-wise vector ab = [ A | b ]
+    float3 VxN = cross(newP, oldN);
+    float ab[7] = {VxN.x, VxN.y, VxN.z, oldN.x, oldN.y, oldN.z, -dot(oldN, diff)};
+    for(int i = 0; i < 7; i++)
+        ab7[i] = ab[i];
+}
+__kernel void getAb(__global const char * oldPointsptr,
+                    int oldPoints_step, int oldPoints_offset,
+                    __global const char * oldNormalsptr,
+                    int oldNormals_step, int oldNormals_offset,
+                    const int2 oldSize,
+                    __global const char * newPointsptr,
+                    int newPoints_step, int newPoints_offset,
+                    __global const char * newNormalsptr,
+                    int newNormals_step, int newNormals_offset,
+                    const int2 newSize,
+                    const float16 poseMatrix,
+                    const float2 fxy,
+                    const float2 cxy,
+                    const float sqDistanceThresh,
+                    const float minCos,
+                    __local float * reducebuf,
+                    __global char* groupedSumptr,
+                    int groupedSum_step, int groupedSum_offset
+)
+{
+    const int x = get_global_id(0);
+    const int y = get_global_id(1);
+    if(x >= newSize.x || y >= newSize.y)
+        return;
+    const int gx = get_group_id(0);
+    const int gy = get_group_id(1);
+    const int gw = get_num_groups(0);
+    const int gh = get_num_groups(1);
+    const int lx = get_local_id(0);
+    const int ly = get_local_id(1);
+    const int lw = get_local_size(0);
+    const int lh = get_local_size(1);
+    const int lsz = lw*lh;
+    const int lid = lx + ly*lw;
+    float ab[7];
+    for(int i = 0; i < 7; i++)
+        ab[i] = 0;
+    calcAb7(oldPointsptr,
+            oldPoints_step, oldPoints_offset,
+            oldNormalsptr,
+            oldNormals_step, oldNormals_offset,
+            oldSize,
+            newPointsptr,
+            newPoints_step, newPoints_offset,
+            newNormalsptr,
+            newNormals_step, newNormals_offset,
+            newSize,
+            poseMatrix,
+            fxy, cxy,
+            sqDistanceThresh,
+            minCos,
+            ab);
+    // build point-wise upper-triangle matrix [ab^T * ab] w/o last row
+    // which is [A^T*A | A^T*b]
+    // and gather sum
+    __local float* upperTriangle = reducebuf + lid*UTSIZE;
+    int pos = 0;
+    for(int i = 0; i < 6; i++)
+    {
+        for(int j = i; j < 7; j++)
+        {
+            upperTriangle[pos++] = ab[i]*ab[j];
+        }
+    }
+    // reduce upperTriangle to local mem
+    // maxStep = ctz(lsz), ctz isn't supported on CUDA devices
+    const int c = clz(lsz & -lsz);
+    const int maxStep = c ? 31 - c : c;
+    for(int nstep = 1; nstep <= maxStep; nstep++)
+    {
+        if(lid % (1 << nstep) == 0)
+        {
+            __local float* rto   = reducebuf + UTSIZE*lid;
+            __local float* rfrom = reducebuf + UTSIZE*(lid+(1 << (nstep-1)));
+            for(int i = 0; i < UTSIZE; i++)
+                rto[i] += rfrom[i];
+        }
+        barrier(CLK_LOCAL_MEM_FENCE);
+    }
+    // here group sum should be in reducebuf[0...UTSIZE]
+    if(lid == 0)
+    {
+        __global float* groupedRow = (__global float*)(groupedSumptr +
+                                                       groupedSum_offset +
+                                                       gy*groupedSum_step);
+        for(int i = 0; i < UTSIZE; i++)
+            groupedRow[gx*UTSIZE + i] = reducebuf[i];
+    }
+}
--- a/modules/rgbd/src/opencl/kinfu_frame.cl
+++ b/modules/rgbd/src/opencl/kinfu_frame.cl
--- a/modules/rgbd/src/opencl/tsdf.cl
+++ b/modules/rgbd/src/opencl/tsdf.cl
--- a/modules/rgbd/src/precomp.hpp
+++ b/modules/rgbd/src/precomp.hpp
@@ -13,6 +13,8 @@
 #include "opencv2/core/utility.hpp"
 #include "opencv2/core/private.hpp"
 #include "opencv2/core/hal/intrin.hpp"
+#include "opencv2/core/ocl.hpp"
+#include "opencl_kernels_rgbd.hpp"
 #include "opencv2/imgproc.hpp"
 #include "opencv2/calib3d.hpp"
 #include "opencv2/rgbd.hpp"

--- a/modules/rgbd/src/tsdf.cpp
+++ b/modules/rgbd/src/tsdf.cpp
--- a/modules/rgbd/src/tsdf.hpp
+++ b/modules/rgbd/src/tsdf.hpp
@@ -18,12 +18,12 @@ class TSDFVolume
 {
 public:
    // dimension in voxels, size in meters
-    TSDFVolume(int _res, float _size, cv::Affine3f _pose, float _truncDist, int _maxWeight,
+    TSDFVolume(Point3i _res, float _voxelSize, cv::Affine3f _pose, float _truncDist, int _maxWeight,
-               float _raycastStepFactor);
+               float _raycastStepFactor, bool zFirstMemOrder = true);
-    virtual void integrate(cv::Ptr<Frame> depth, float depthFactor, cv::Affine3f cameraPose, cv::kinfu::Intr intrinsics) = 0;
+    virtual void integrate(InputArray _depth, float depthFactor, cv::Affine3f cameraPose, cv::kinfu::Intr intrinsics) = 0;
-    virtual void raycast(cv::Affine3f cameraPose, cv::kinfu::Intr intrinsics, cv::Size frameSize, int pyramidLevels,
+    virtual void raycast(cv::Affine3f cameraPose, cv::kinfu::Intr intrinsics, cv::Size frameSize,
-                         cv::Ptr<FrameGenerator> frameGenerator, cv::Ptr<Frame> frame) const = 0;
+                         cv::OutputArray points, cv::OutputArray normals) const = 0;
    virtual void fetchPointsNormals(cv::OutputArray points, cv::OutputArray normals) const = 0;
    virtual void fetchNormals(cv::InputArray points, cv::OutputArray _normals) const = 0;
@@ -32,14 +32,20 @@ public:
    virtual ~TSDFVolume() { }
-    float edgeSize;
+    float voxelSize;
-    int edgeResolution;
+    float voxelSizeInv;
+    Point3i volResolution;
    int maxWeight;
    cv::Affine3f pose;
+    float raycastStepFactor;
+    Point3f volSize;
+    float truncDist;
+    Vec4i volDims;
+    Vec8i neighbourCoords;
 };
-cv::Ptr<TSDFVolume> makeTSDFVolume(cv::kinfu::Params::PlatformType t,
+cv::Ptr<TSDFVolume> makeTSDFVolume(Point3i _res,  float _voxelSize, cv::Affine3f _pose, float _truncDist, int _maxWeight,
-                                   int _res, float _size, cv::Affine3f _pose, float _truncDist, int _maxWeight,
                                   float _raycastStepFactor);
 } // namespace kinfu

--- a/modules/rgbd/src/utils.cpp
+++ b/modules/rgbd/src/utils.cpp
@@ -50,38 +50,6 @@ namespace rgbd
 namespace kinfu {
-#if PRINT_TIME
-ScopeTime::ScopeTime(std::string name_, bool _enablePrint) :
-    name(name_), enablePrint(_enablePrint)
-{
-    start = (double)cv::getTickCount();
-    nested++;
-}
-ScopeTime::~ScopeTime()
-{
-    double time_ms =  ((double)cv::getTickCount() - start)*1000.0/cv::getTickFrequency();
-    if(enablePrint)
-    {
-        std::string spaces(nested, '-');
-        std::cout << spaces << "Time(" << name << ") = " << time_ms << " ms" << std::endl;
-    }
-    nested--;
-}
-int ScopeTime::nested  = 0;
-#else
-ScopeTime::ScopeTime(std::string /*name_*/, bool /*_enablePrint = true*/)
-{ }
-ScopeTime::~ScopeTime()
-{ }
-#endif
 } // namespace kinfu
 } // namespace cv
--- a/modules/rgbd/src/utils.hpp
+++ b/modules/rgbd/src/utils.hpp
@@ -47,14 +47,14 @@ rescaleDepthTemplated<double>(const Mat& in, Mat& out)
 namespace kinfu {
-// Print execution times of each block marked with ScopeTime
+// One place to turn intrinsics on and off
-#define PRINT_TIME 0
+#define USE_INTRINSICS CV_SIMD128
 typedef float depthType;
 const float qnan = std::numeric_limits<float>::quiet_NaN();
 const cv::Vec3f nan3(qnan, qnan, qnan);
-#if CV_SIMD128
+#if USE_INTRINSICS
 const cv::v_float32x4 nanv(qnan, qnan, qnan, qnan);
 #endif
@@ -63,26 +63,13 @@ inline bool isNaN(cv::Point3f p)
    return (cvIsNaN(p.x) || cvIsNaN(p.y) || cvIsNaN(p.z));
 }
-#if CV_SIMD128
+#if USE_INTRINSICS
 static inline bool isNaN(const cv::v_float32x4& p)
 {
    return cv::v_check_any(p != p);
 }
 #endif
-struct ScopeTime
-{
-    ScopeTime(std::string name_, bool _enablePrint = true);
-    ~ScopeTime();
-#if PRINT_TIME
-    static int nested;
-    const std::string name;
-    const bool enablePrint;
-    double start;
-#endif
-};
 /** @brief Camera intrinsics */
 struct Intr
 {
@@ -143,6 +130,16 @@ struct Intr
    float fx, fy, cx, cy;
 };
+inline size_t roundDownPow2(size_t x)
+{
+    size_t shift = 0;
+    while(x != 0)
+    {
+        shift++; x >>= 1;
+    }
+    return (size_t)(1ULL << (shift-1));
+}
 } // namespace kinfu
 } // namespace cv

--- a/modules/rgbd/test/test_kinfu.cpp
+++ b/modules/rgbd/test/test_kinfu.cpp
@@ -90,10 +90,18 @@ struct RenderInvoker : ParallelLoopBody
    float depthFactor;
 };
-struct CubeSpheresScene
+struct Scene
+{
+    virtual ~Scene() {}
+    static Ptr<Scene> create(int nScene, Size sz, Matx33f _intr, float _depthFactor);
+    virtual Mat depth(Affine3f pose) = 0;
+    virtual std::vector<Affine3f> getPoses() = 0;
+};
+struct CubeSpheresScene : Scene
 {
    const int framesPerCycle = 32;
-    const int nCycles = 1;
+    const float nCycles = 0.25f;
    const Affine3f startPose = Affine3f(Vec3f(-0.5f, 0.f, 0.f), Vec3f(2.1f, 1.4f, -2.1f));
    CubeSpheresScene(Size sz, Matx33f _intr, float _depthFactor) :
@@ -125,7 +133,7 @@ struct CubeSpheresScene
        return res;
    }
-    Mat depth(Affine3f pose)
+    Mat depth(Affine3f pose) override
    {
        Mat_<float> frame(frameSize);
        Reprojector reproj(intr);
@@ -136,10 +144,10 @@ struct CubeSpheresScene
        return frame;
    }
-    std::vector<Affine3f> getPoses()
+    std::vector<Affine3f> getPoses() override
    {
        std::vector<Affine3f> poses;
-        for(int i = 0; i < framesPerCycle*nCycles; i++)
+        for(int i = 0; i < (int)(framesPerCycle*nCycles); i++)
        {
            float angle = (float)(CV_2PI*i/framesPerCycle);
            Affine3f pose;
@@ -160,10 +168,10 @@ struct CubeSpheresScene
 };
-struct RotatingScene
+struct RotatingScene : Scene
 {
-    const int framesPerCycle = 64;
+    const int framesPerCycle = 32;
-    const int nCycles = 1;
+    const float nCycles = 0.5f;
    const Affine3f startPose = Affine3f(Vec3f(-1.f, 0.f, 0.f), Vec3f(1.5f, 2.f, -1.5f));
    RotatingScene(Size sz, Matx33f _intr, float _depthFactor) :
@@ -221,7 +229,7 @@ struct RotatingScene
        return res;
    }
-    Mat depth(Affine3f pose)
+    Mat depth(Affine3f pose) override
    {
        Mat_<float> frame(frameSize);
        Reprojector reproj(intr);
@@ -232,7 +240,7 @@ struct RotatingScene
        return frame;
    }
-    std::vector<Affine3f> getPoses()
+    std::vector<Affine3f> getPoses() override
    {
        std::vector<Affine3f> poses;
        for(int i = 0; i < framesPerCycle*nCycles; i++)
@@ -258,24 +266,42 @@ struct RotatingScene
 Mat_<float> RotatingScene::randTexture(256, 256);
+Ptr<Scene> Scene::create(int nScene, Size sz, Matx33f _intr, float _depthFactor)
+{
+    if(nScene == 0)
+        return makePtr<RotatingScene>(sz, _intr, _depthFactor);
+    else
+        return makePtr<CubeSpheresScene>(sz, _intr, _depthFactor);
+}
 static const bool display = false;
-TEST( KinectFusion, lowDense )
+void flyTest(bool hiDense, bool inequal)
 {
-    Ptr<kinfu::Params> params = kinfu::Params::coarseParams();
+    Ptr<kinfu::Params> params;
+    if(hiDense)
+        params = kinfu::Params::defaultParams();
+    else
+        params = kinfu::Params::coarseParams();
+    if(inequal)
+    {
+        params->volumeDims[0] += 32;
+        params->volumeDims[1] -= 32;
+    }
-    RotatingScene scene(params->frameSize, params->intr, params->depthFactor);
+    Ptr<Scene> scene = Scene::create(hiDense, params->frameSize, params->intr, params->depthFactor);
    Ptr<kinfu::KinFu> kf = kinfu::KinFu::create(params);
-    std::vector<Affine3f> poses = scene.getPoses();
+    std::vector<Affine3f> poses = scene->getPoses();
    Affine3f startPoseGT = poses[0], startPoseKF;
    Affine3f pose, kfPose;
    for(size_t i = 0; i < poses.size(); i++)
    {
        pose = poses[i];
-        Mat depth = scene.depth(pose);
+        Mat depth = scene->depth(pose);
        ASSERT_TRUE(kf->update(depth));
@@ -295,46 +321,35 @@ TEST( KinectFusion, lowDense )
        }
    }
-    ASSERT_LT(cv::norm(kfPose.rvec() - pose.rvec()), 0.01);
+    double rvecThreshold = hiDense ? 0.01 : 0.02;
-    ASSERT_LT(cv::norm(kfPose.translation() - pose.translation()), 0.1);
+    ASSERT_LT(cv::norm(kfPose.rvec() - pose.rvec()), rvecThreshold);
+    double poseThreshold = hiDense ? 0.03 : 0.1;
+    ASSERT_LT(cv::norm(kfPose.translation() - pose.translation()), poseThreshold);
 }
-TEST( KinectFusion, highDense )
+TEST( KinectFusion, lowDense )
 {
-    Ptr<kinfu::Params> params = kinfu::Params::defaultParams();
+    flyTest(false, false);
-    CubeSpheresScene scene(params->frameSize, params->intr, params->depthFactor);
+}
-    Ptr<kinfu::KinFu> kf = kinfu::KinFu::create(params);
-    std::vector<Affine3f> poses = scene.getPoses();
-    Affine3f startPoseGT = poses[0], startPoseKF;
-    Affine3f pose, kfPose;
-    for(size_t i = 0; i < poses.size(); i++)
-    {
-        pose = poses[i];
-        Mat depth = scene.depth(pose);
-        ASSERT_TRUE(kf->update(depth));
-        kfPose = kf->getPose();
-        if(i == 0)
-            startPoseKF = kfPose;
-        pose = (  startPoseGT.inv() * pose  )*startPoseKF;
+TEST( KinectFusion, highDense )
+{
+    flyTest(true, false);
+}
-        if(display)
+TEST( KinectFusion, inequal )
-        {
+{
-            imshow("depth", depth*(1.f/params->depthFactor/4.f));
+    flyTest(false, true);
-            Mat rendered;
+}
-            kf->render(rendered);
-            imshow("render", rendered);
-            waitKey(10);
-        }
-    }
-    ASSERT_LT(cv::norm(kfPose.rvec() - pose.rvec()), 0.01);
+#ifdef HAVE_OPENCL
-    ASSERT_LT(cv::norm(kfPose.translation() - pose.translation()), 0.03);
+TEST( KinectFusion, OCL )
+{
+    cv::ocl::setUseOpenCL(false);
+    flyTest(false, false);
+    cv::ocl::setUseOpenCL(true);
+    flyTest(false, false);
 }
+#endif
 }} // namespace
--- a/modules/rgbd/test/test_odometry.cpp
+++ b/modules/rgbd/test/test_odometry.cpp
@@ -235,8 +235,8 @@ void CV_OdometryTest::run(int)
    // On each iteration an input frame is warped using generated transformation.
    // Odometry is run on the following pair: the original frame and the warped one.
    // Comparing a computed transformation with an applied one we compute 2 errors:
-    // better_1time_count - count of poses which error is less than ground thrush pose,
+    // better_1time_count - count of poses which error is less than ground truth pose,
-    // better_5times_count - count of poses which error is 5 times less than ground thrush pose.
+    // better_5times_count - count of poses which error is 5 times less than ground truth pose.
    int iterCount = 100;
    int better_1time_count = 0;
    int better_5times_count = 0;
@@ -267,7 +267,6 @@ void CV_OdometryTest::run(int)
        waitKey();
 #endif
        // compare rotation
        double rdiffnorm = cv::norm(rvec - calcRvec),
               rnorm = cv::norm(rvec);

--- a/modules/rgbd/test/test_precomp.hpp
+++ b/modules/rgbd/test/test_precomp.hpp
@@ -13,6 +13,10 @@
 #include <opencv2/rgbd.hpp>
 #include <opencv2/calib3d.hpp>
+#ifdef HAVE_OPENCL
+#include <opencv2/core/ocl.hpp>
+#endif
 namespace opencv_test {
 using namespace cv::rgbd;
 }