This repository was archived by the owner on Nov 17, 2023. It is now read-only.
[RFC][mxnet 2.0][item 10.1] MXNet Imperative Op Invocation Overhead #17097

@reminisce

MXNet's imperative operator invocation overhead is as large as 30-60us per call, which is significant compared with the roughly 600ns overhead of the official NumPy operators. This hurts performance on models where many operators have short kernel runtimes, especially in classic machine learning. We plan to address the problem in two steps:

  1. Short term: Use pybind11 to replace the Python op API and the ctypes/C API layer. Preliminary experiments show that the pure Python-to-C++ round-trip time with pybind11 is 400-600ns, while the current Python op API using ctypes/C API costs more than 10us. We believe that, with the correct implementation, we can reduce the op invocation overhead to 2us, including the time spent in the FFI and the engine.

  2. Long term: Adopt Python's C extension interface. NumPy did this by developing its own C API. This has considerably less overhead than the other solutions, but integrating it with our existing C++ operator workflow would take much more engineering effort.

@hzfan @hgt312
