Commit b18e3579 to opencv_contrib, authored by Maksim Shabunin on Jun 16, 2017 and committed by Vadim Pisarevsky on Jun 16, 2017.
dnn: fixed GEMM1T AVX2 implementation (#1231)
Parent: 81283e9d
Showing 2 changed files with 3 additions and 3 deletions: modules/dnn/src/layers/fully_connected_layer.cpp (+1 -1) and modules/dnn/src/layers/layers_common.avx2.cpp (+2 -2).
modules/dnn/src/layers/fully_connected_layer.cpp

```diff
@@ -169,7 +169,7 @@ public:
             for( k = 0; k < vecsize; k += 4 )
             {
-                vfloat32x4 v = v_load_aligned(sptr + k);
+                vfloat32x4 v = v_load(sptr + k);
                 vs0 += v*v_load_aligned(wptr + k);
                 vs1 += v*v_load_aligned(wptr + wstep + k);
                 vs2 += v*v_load_aligned(wptr + wstep*2 + k);
```
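This hunk replaces `v_load_aligned` with `v_load` for the input row `sptr`, while the weight rows keep `v_load_aligned`. In OpenCV's universal intrinsics, `v_load_aligned` expects a 16-byte-aligned address for 128-bit vectors and `v_load` does not. The commit does not say how `sptr` ends up misaligned; one common way, shown in this C++ sketch, is a row stride that is not a multiple of four floats, so only some row starts fall on a 16-byte boundary even when the buffer itself is aligned. The helpers `is_aligned` and `first_misaligned_row` are hypothetical and not part of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p is a multiple of `alignment` bytes; alignment must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: index of the first row whose start address is not
// `alignment`-byte aligned, or -1 if all `rows` row starts are aligned.
// Consecutive rows begin `row_stride` floats apart (e.g. a row-major matrix).
inline long first_misaligned_row(const float* base, std::size_t rows,
                                 std::size_t row_stride, std::size_t alignment) {
    for (std::size_t r = 0; r < rows; ++r) {
        if (!is_aligned(base + r * row_stride, alignment))
            return static_cast<long>(r);
    }
    return -1;
}
```

For example, with a 32-byte-aligned buffer and a stride of 5 floats, row 1 starts 20 bytes in, so `first_misaligned_row(buf, 4, 5, 16)` returns 1; with a stride of 8 floats every row start is aligned and the function returns -1.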
modules/dnn/src/layers/layers_common.avx2.cpp

```diff
@@ -204,7 +204,7 @@ void fastGEMM1T_avx2( const float* vec, const float* weights,
         for (int k = 0; k < vecsize; k += 8, wptr += 8)
         {
-            __m256 v = _mm256_load_ps(vec + k);
+            __m256 v = _mm256_loadu_ps(vec + k);
             vs0 = _mm256_fmadd_ps(_mm256_load_ps(wptr), v, vs0);
             vs1 = _mm256_fmadd_ps(_mm256_load_ps(wptr + wstep), v, vs1);
```
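In the AVX2 kernel the same fix swaps `_mm256_load_ps`, which requires a 32-byte-aligned address and may raise a general-protection fault otherwise, for `_mm256_loadu_ps`, which accepts any address. The sketch below is a standalone single-row dot product in the same style, not OpenCV's `fastGEMM1T_avx2`; the names `dot_avx2_unaligned` and `dot` are hypothetical. Unlike the hunk above, it also loads the weights with `_mm256_loadu_ps` so that it works on arbitrary buffers. It assumes GCC or Clang on x86 (`__attribute__((target))`, `__builtin_cpu_supports`) and falls back to a scalar loop elsewhere.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

// Dot product of vec[0..n) and w[0..n) using AVX2 + FMA; n must be a multiple of 8.
// Both operands use _mm256_loadu_ps, so neither pointer needs 32-byte alignment.
__attribute__((target("avx2,fma")))
static float dot_avx2_unaligned(const float* vec, const float* w, int n) {
    __m256 acc = _mm256_setzero_ps();
    for (int k = 0; k < n; k += 8) {
        __m256 v = _mm256_loadu_ps(vec + k);                    // unaligned input load
        acc = _mm256_fmadd_ps(_mm256_loadu_ps(w + k), v, acc);  // acc += w[k..k+8) * v
    }
    // Horizontal sum of the 8 lanes: fold 256 -> 128 bits, then two pairwise adds.
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc), _mm256_extractf128_ps(acc, 1));
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    return _mm_cvtss_f32(s);
}
#define HAVE_AVX2_SKETCH 1
#endif

// Entry point: uses the AVX2 kernel when the CPU reports AVX2 and FMA, else a scalar loop.
float dot(const float* vec, const float* w, int n) {
    assert(n % 8 == 0);
#ifdef HAVE_AVX2_SKETCH
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2_unaligned(vec, w, n);
#endif
    float s = 0.f;
    for (int k = 0; k < n; ++k)
        s += vec[k] * w[k];
    return s;
}
```

Calling `dot(xs + 1, ws, 16)` on a plain `float` array is safe on either path, since `xs + 1` is only guaranteed 4-byte alignment and no aligned load is issued. On recent x86 cores an unaligned load whose address happens to be aligned generally costs the same as the aligned form, which makes `loadu` a low-risk default for inputs whose alignment the kernel does not control.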
```diff
@@ -237,7 +237,7 @@ void fastGEMM1T_avx2( const float* vec, const float* weights,
         for (int k = 0; k < vecsize; k += 8, wptr += 8)
         {
-            __m256 v = _mm256_load_ps(vec + k);
+            __m256 v = _mm256_loadu_ps(vec + k);
             vs0 = _mm256_fmadd_ps(_mm256_load_ps(wptr), v, vs0);
         }
```
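The hunk at line 237 belongs to a loop that accumulates into `vs0` alone, which suggests it handles leftover rows one at a time (an inference from the excerpt). A scalar reference is a convenient oracle for testing either AVX2 loop on unaligned inputs. This hypothetical `gemm1t_ref` computes `dst[i]` as the dot product of `vec` with weight row `i`, with rows `wstep` floats apart as in the `wptr + wstep` addressing of the AVX2 hunks; it omits any bias term or other parameters the real `fastGEMM1T_avx2` may take.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scalar reference: dst[i] = sum_k weights[i*wstep + k] * vec[k]
// for i in [0, nrows). Rows are wstep floats apart (wstep >= vecsize), so any
// padding at the end of a row is never read. Makes no alignment assumptions.
void gemm1t_ref(const float* vec, const float* weights, std::size_t wstep,
                float* dst, int nrows, int vecsize) {
    assert(wstep >= static_cast<std::size_t>(vecsize));
    for (int i = 0; i < nrows; ++i) {
        const float* wptr = weights + static_cast<std::size_t>(i) * wstep;
        float s = 0.f;
        for (int k = 0; k < vecsize; ++k)
            s += wptr[k] * vec[k];
        dst[i] = s;
    }
}
```

For instance, with `wstep = 5` and `vecsize = 4`, the fifth float of each weight row is padding and does not affect `dst`.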