AtomAlpaca
Contributor

Add RVV optimization for BNLL

@github-actions github-actions bot added the riscv label Apr 18, 2025

The binary size change of libncnn.so (bytes)

| architecture | base size | pr size | difference |
|---|---|---|---|
| x86_64 | 16465128 | 16465128 | 0 😘 |
| armhf | 7335260 | 7335260 | 0 😘 |
| aarch64 | 10704800 | 10704800 | 0 😘 |

@codecov-commenter

codecov-commenter commented Apr 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.17%. Comparing base (80da741) to head (287f9ce).
Report is 3 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6005       +/-   ##
===========================================
+ Coverage   95.40%   96.17%    +0.77%     
===========================================
  Files         821      589      -232     
  Lines      268820   142476   -126344     
===========================================
- Hits       256465   137026   -119439     
+ Misses      12355     5450     -6905     

@nihui
Member

nihui commented Apr 19, 2025

After updating the compiler and doing some research, I found the following restrictions under the xtheadvector extension:

__riscv_vfsgnjn_vv_f32m8_mu generates an illegal instruction (to be reported).

With __riscv_vfsgnjn_vv_f32m8_mu and __riscv_vfadd_vv_f32m8_mu, the mask-undisturbed (_mu) policy is not honored under xtheadvector: masked-off elements are cleared to 0 instead of keeping the source values. See XUANTIE-RV/xuantie-gnu-toolchain#26

  • Workaround 1: compute neg(abs(x)) with unmasked instructions
#if __riscv_xtheadvector
            vfloat32m8_t _comm = __riscv_vfsgnjx_vv_f32m8(_p, _p, vl);
            _comm = __riscv_vfsgnjn_vv_f32m8(_comm, _comm, vl);
#else
            vfloat32m8_t _comm = __riscv_vfsgnjn_vv_f32m8_mu(_mask, _p, _p, _p, vl);
#endif
  • Workaround 2: use an unmasked add followed by vmerge
#if __riscv_xtheadvector
            vfloat32m8_t _res = __riscv_vfadd_vv_f32m8(_comm, _p, vl);
            _res = __riscv_vmerge_vvm_f32m8(_comm, _res, _mask, vl);
#else
            vfloat32m8_t _res = __riscv_vfadd_vv_f32m8_mu(_mask, _comm, _comm, _p, vl);
#endif
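
For readers without RVV hardware, the equivalence of the two paths can be checked with a per-lane scalar model. This is only a sketch under assumptions not stated above: `_mask` is taken to mark lanes where x > 0, and the step between the two snippets is taken to be log(1 + exp(_comm)). The function names (`bnll_masked`, `bnll_workaround`, `check_lanes`) are hypothetical and are not part of ncnn.

```cpp
#include <cassert>
#include <cmath>

// Per-lane model of the masked path (the #else branches).
// Assumption: the mask is set where x > 0.
static float bnll_masked(float x)
{
    const bool mask = x > 0.f;
    // vfsgnjn_mu: negate where the mask is set, keep x elsewhere
    float comm = mask ? -x : x;
    // assumed intermediate step: log(1 + exp(comm))
    comm = std::log1p(std::exp(comm));
    // vfadd_mu: add x where the mask is set, keep comm elsewhere
    return mask ? comm + x : comm;
}

// Per-lane model of the xtheadvector workarounds (the #if branches).
static float bnll_workaround(float x)
{
    const bool mask = x > 0.f;
    // workaround 1: neg(abs(x)), no mask needed
    float comm = -std::fabs(x);
    comm = std::log1p(std::exp(comm));
    // workaround 2: unmasked add, then select per lane like vmerge
    const float res = comm + x;
    return mask ? res : comm;
}

// Assert that both models agree on every lane of x[0..n).
static void check_lanes(const float* x, int n)
{
    for (int i = 0; i < n; i++)
    {
        assert(std::fabs(bnll_masked(x[i]) - bnll_workaround(x[i])) <= 1e-6f);
    }
}
```

Both models reduce to max(x, 0) + log(1 + exp(-|x|)), which is why replacing the masked sign injection with neg(abs(x)) does not change the result.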

@nihui nihui merged commit a4ca440 into Tencent:master Apr 19, 2025
27 of 30 checks passed
@nihui
Member

nihui commented Apr 19, 2025

Thanks for your contribution!
