

@lxwlaq (Collaborator) commented Dec 28, 2021

3×3 direct convolution, stride 2 (s2)

| Input (N×C×H×W) | Weight (OC×IC×KH×KW) | fp32 | fp16 | Rate |
|---|---|---|---|---|
| 1×4×64×64 | 24×4×3×3 | 0.0727 | 0.04976 | 31.6% |
| 1×4×128×128 | 40×4×3×3 | 0.40227 | 0.232 | 42.2% |
| 1×4×256×256 | 40×4×3×3 | 1.57216 | 0.8753 | 44.5% |
| 1×8×320×320 | 40×8×3×3 | 4.58387 | 2.4304 | 46.9% |

| Input (N×C×H×W) | Weight (OC×IC×KH×KW) | fp32 | fp16 | Rate |
|---|---|---|---|---|
| 1×3×224×224 | 24×3×3×3 | 0.76278 | 0.33987 | 55.5% |
| 1×3×480×480 | 24×3×3×3 | 3.605 | 1.50867 | 58.3% |
| 1×3×256×256 | 24×3×3×3 | 0.99461 | 0.43739 | 56.5% |
| 1×3×320×320 | 24×3×3×3 | 1.57998 | 0.67347 | 57.3% |

5×5 depthwise convolution

| Input (N×C×H×W) | Stride | fp32 | fp16 | Rate |
|---|---|---|---|---|
| 1×96×28×28 | 1 | 0.22275 | 0.19064 | 14.4% |
| 1×96×56×56 | 1 | 0.85113 | 0.78724 | 8.2% |
| 1×72×56×56 | 1 | 0.63895 | 0.59116 | 7.8% |
| 1×240×28×28 | 1 | 0.50349 | 0.47392 | 6.0% |

| Input (N×C×H×W) | Stride | fp32 | fp16 | Rate |
|---|---|---|---|---|
| 1×96×28×28 | 2 | 0.17035 | 0.07006 | 58.8% |
| 1×96×56×56 | 2 | 0.75981 | 0.27707 | 63.5% |
| 1×72×56×56 | 2 | 0.57125 | 0.20434 | 64.9% |
| 1×240×28×28 | 2 | 0.42145 | 0.18061 | 57.1% |

@paddle-bot-old commented:

Thanks for your contribution!

```c
#define INIT_FIRST \
"2:\n" \
"vld1.16 {d10-d13}, [%[wc0]]! @ load w0, w1\n" \
"vld1.16 {d14-d15}, [%[wc0]]! @ load w2\n" \
```
Review comment (Collaborator):

A few other instructions could be inserted between these two loads to reduce conflicts.

```c
#define INIT \
"2:\n" \
"vld1.16 {d10-d13}, [%[wc0]]! @ load w0, w1\n" \
"vld1.16 {d14-d15}, [%[wc0]]! @ load w2\n" \
```
Review comment (Collaborator):

Same as above.

```c
"vld1.16 {d10-d13}, [%[wc0]]! @ load w0, w1\n" \
"vld1.16 {d14-d15}, [%[wc0]]! @ load w2\n" \
"vld1.16 {d16-d19}, [%[ptr_out0]]! @ load outr0\n" \
"vld1.16 {d20-d23}, [%[ptr_out0]] @ load outr0\n" \
```
Review comment (Collaborator):

Same as above.

```c
"vmla.f16 q14, q7, d4[2] @ w0 * inr24\n" \
"vmla.f16 q15, q7, d5[0] @ w0 * inr26\n" \
"vld1.16 {d10-d13}, [%[wc0]]! @ load w5, to q7\n" /* mul r1, with*/ \
"vld1.16 {d14-d15}, [%[wc0]]! @ load w5, to q7\n" /* mul r1, with*/ \
```
Review comment (Collaborator):

Same as above.

```c
#define COMPUTE \
"vld1.16 {d24-d25}, [%[bias]] \n" /* load bias to out00 */ \
"vld1.16 {d0-d3}, [%[wc0]]! \n" /* load w0-w1 */ \
```
Review comment (Collaborator):

Same as above.

@chenjiaoAngel (Collaborator) left a comment:

LGTM

@chenjiaoAngel chenjiaoAngel merged commit dffa311 into PaddlePaddle:develop Jan 4, 2022
WeiLi233 pushed a commit to WeiLi233/Paddle-Lite that referenced this pull request Mar 29, 2022