You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository was archived by the owner on Nov 17, 2023. It is now read-only.
MXNet imperative operator invocation overhead is as large as 30-60us, which is significant compared to the official NumPy operators with ~600ns overhead. This has negatively impacted the performance of applying MXNet to the models where many operators' kernel runtime duration is short, especially in the area of classic machine learning. We plan to address the problem in two steps:
Short term: Use pybind11 to replace Python op API and ctypes/c api. Preliminary experiments show that the pure Python-C++ turnaround time by using Pybind is between 400-600ns, while the current Python op API using ctypes/c api costs more than 10us. We believe with the correct implementation, we can reduce the op invocation overhead to 2us including the time on FFI and engine.
Long term: Adopt Python's C extension interface. NumPy did this by developing its own C API. This provides considerably less overhead compared to other solutions. However, it would cost much more engineering efforts by integrating this with our existing operator workflow in C++.