[RFC] Introducing NumPy-compatible coding experience into MXNet #14253
Motivation
Today, deep learning scientists spend the majority of their time on data processing, debugging tensor algorithms, and tuning model parameters, rather than architecting models from scratch, thanks to the abundance of pre-trained models in deep learning model zoos. This has made the usability of tensor APIs a key factor in whether a framework is widely adopted.
MXNet was initially designed with a focus on memory efficiency, computation throughput, and scalability. Usability problems have begun to surface as more and more models exhibit dynamic behavior, e.g. tensor shapes unknown before runtime, control flow depending on runtime results, etc. Here we highlight the most frequent usability complaints from users.
- Scalar tensors (a.k.a. zero-dim tensors) are not supported. For example, given `a = [0, 1, 2]`, `a[1]` will generate an `NDArray` of shape `(1,)`, instead of `()` as in NumPy.
- Zero-size tensors are not supported. For example, a tensor of shape `(0, 16, 256)` cannot be passed to an operator, because our system currently treats 0, the first dimension size, as unknown rather than as a concrete number.
- Many operators' signatures and functionality are not NumPy compatible, e.g. `nd.dot` vs. `np.dot`, `nd.concatenate` vs. `np.concatenate`, etc.
- Many NumPy operators are missing. See the reference link to GitHub issues.
- Operators whose output shapes can only be determined at runtime are not supported, e.g. `data[data < 0]` cannot run.
- The programming experience is diverged due to the separation of imperative and symbolic operators registered under `mxnet.ndarray` and `mxnet.symbol`.
- Control flow operators are hard to use. Users have to understand their complicated signatures, instead of writing native Python code using `for`, `while`, `if/else`, etc.
For example, we have learned (the hard way) that it does not make much sense to ask users to write code like the following to perform a cumulative sum.
```python
import mxnet.ndarray as F
data = F.arange(5)

def sum(state, i):
    s = state + data[i]
    return s, [s, i + 1]

def sum_cond(state, i):
    return i < 4

out, state = F.contrib.while_loop(sum_cond, sum, [F.zeros((1,)), F.zeros((1,))],
                                  max_iterations=5)
```
Instead, users should be able to write native Python code like the following and, if required, let the framework serialize it into a computation graph for optimization and deployment.
```python
import numpy as np

data = np.arange(5)
out = 0
i = 0
while i < 5:
    out = out + data[i]
    i = i + 1
```
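To make the first pain point above equally concrete, the scalar-tensor discrepancy can be reproduced in a few lines. This is a minimal sketch assuming `mxnet` and `numpy` are importable; the printed shapes reflect MXNet's behavior at the time of writing.

```python
import mxnet as mx
import numpy as np

a = mx.nd.array([0, 1, 2])
print(a[1].shape)   # (1,) -- MXNet returns a one-element tensor

b = np.array([0, 1, 2])
print(b[1].shape)   # ()   -- NumPy returns a true zero-dim scalar
```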
It is not hard to see that all of the above pain points stem from the lack of a NumPy-compatible coding experience in MXNet. Properly supporting control flow and consolidating the imperative and symbolic coding styles require fundamental changes to the codebase, such as a new graph IR and executor; this is extremely non-trivial and should be executed with a long-term plan. In the meantime, however, we can improve usability by fixing the handling of zero-dim/zero-size tensors and by implementing NumPy operators in MXNet. The following sections discuss how to achieve these short-term goals.
Support of zero-dim and zero-size tensors
What's the problem?
Zero-dim and zero-size tensors are valid tensors in NumPy. The former, whose shape is `()`, represents a scalar in `numpy.ndarray` format. The latter, which has one or more dimensions of size zero in its shape, is useful as a placeholder in many `ndarray` operations, such as concatenating a zero-size `ndarray` with another `ndarray`. MXNet does not support them because the empty shape `()` and shapes containing zero dimension sizes are reserved to indicate unknown shape information, which must be filled in during the shape inference stage before moving forward to tensor computations.
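For reference, NumPy treats both kinds of tensors as first-class values. A minimal demonstration using plain NumPy only:

```python
import numpy as np

s = np.array(3.14)          # zero-dim tensor: shape is ()
print(s.shape, s.ndim)      # () 0

z = np.zeros((0, 16, 256))  # zero-size tensor: first dim size is 0
out = np.concatenate([z, np.ones((2, 16, 256))], axis=0)
print(out.shape)            # (2, 16, 256) -- z acts as a placeholder
```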
How to resolve the problem?
We can first change the current semantics to comply with the NumPy definition:
- Change the definition of unknown shapes from `ndim = 0` to `ndim = -1` in the `TShape` class.
- Change the definition of unknown dimension sizes from `dim_size = 0` to `dim_size = -1` in the `TShape` class.
After this, we need to scan the entire codebase and modify the code accordingly wherever `shape.ndim() == 0` or `shape.Size() == 0` is used to check for unknown shapes.
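In Python-flavored pseudocode, the semantic shift looks roughly as follows. The helper names are illustrative only; they mirror the C++ `TShape` checks described above rather than any existing API.

```python
# Current semantics: 0 is overloaded to mean "unknown".
def shape_is_known_old(shape):
    return len(shape) != 0 and all(d != 0 for d in shape)

# Proposed semantics: -1 means "unknown"; () and 0 become valid values.
def shape_is_known_new(ndim, shape):
    return ndim != -1 and all(d != -1 for d in shape)

assert not shape_is_known_old(())           # () was "unknown" before
assert shape_is_known_new(0, ())            # now a valid scalar shape
assert shape_is_known_new(3, (0, 16, 256))  # now a valid zero-size shape
```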
Please note that although MXNet's shape is a type inheriting from `nnvm::Tuple`, which is often used to represent a list-like object such as `axis=(1, 2, 3)`, we will not change the meaning of an empty tuple. This separation of definitions keeps the roles of empty shapes and empty tuples clearly decoupled.
We propose to break down the effort into the following steps.
1. Copy `tuple.h` from NNVM to MXNet and rename `nnvm::TShape` to `mxnet::TShape`.
2. Replace all the places in MXNet where `nnvm::Tuple` and `nnvm::TShape` are used with `mxnet::Tuple` and `mxnet::TShape`, respectively.
3. Change the definition of `TShape` in `tuple.h` to use `ndim = -1` to indicate unknown shapes and `dim_size = -1` to indicate unknown dimension sizes.
4. Modify all the existing shape inference and utility functions where `ndim == 0` and `dim_size == 0` are used to accommodate the above changes.
5. Modify the NNVM passes `InferShape`, `PlanMemory`, and `Gradient`, where `nnvm::TShape` is used, to accommodate the above changes.
6. Add sufficient unit tests.
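As an illustration of what the last step could cover, a test might assert that zero-dim and zero-size shapes flow through basic operators once the new semantics are in place. This is a hypothetical sketch; the assertions only hold after the changes above land.

```python
import mxnet as mx

def test_zero_dim_and_zero_size():
    # Zero-dim: indexing should yield shape () as in NumPy (post-change).
    a = mx.nd.array([0, 1, 2])
    assert a[1].shape == ()

    # Zero-size: a 0-length dimension should pass through operators.
    z = mx.nd.zeros((0, 16, 256))
    out = mx.nd.concat(z, mx.nd.ones((2, 16, 256)), dim=0)
    assert out.shape == (2, 16, 256)
```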
How is backward compatibility guaranteed?
By default, we do not change the original definition of output shapes in shape inference functions; we only change `ndim == 0` to `ndim == -1` for unknown-shape verification. No backward compatibility issues are expected for all but one case: `NDArray` indexing. To elaborate, the current behavior guarantees that `x[i]` always returns a tensor with `ndim >= 1`. We can keep this behavior unchanged by default and implement a global switch that users can turn on to get NumPy-compatible results.
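One possible shape of such a switch is sketched below; the function name and scoping are hypothetical, not a committed API.

```python
import mxnet as mx

a = mx.nd.array([0, 1, 2])
print(a[1].shape)         # (1,) -- legacy behavior, on by default

mx.set_np_compat(True)    # hypothetical switch enabling NumPy semantics
print(a[1].shape)         # ()   -- NumPy-compatible zero-dim result
mx.set_np_compat(False)   # restore legacy behavior
```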
Previous discussion of this topic can be seen here.
Implementation of NumPy operators
What to do?
To address the problem of operator incompatibility with NumPy, and to alleviate the pain of the diverged programming experience caused by the operator namespace separation (`mxnet.ndarray` and `mxnet.symbol`), we propose creating a new namespace `mxnet.numpy`, adopting operator APIs from NumPy, and implementing those operators under the new namespace. `mxnet.numpy` should provide the same imperative programming experience as NumPy and will gradually replace all the non-neural-network operators in the current codebase. While implementing NumPy operators in MXNet, we may be able to leverage TVM to generate high-performance kernels (ref.).
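Under this proposal, user code could look essentially like NumPy code. The snippet below is illustrative only, since `mxnet.numpy` does not exist yet at the time of writing.

```python
from mxnet import numpy as mnp  # proposed namespace (hypothetical)

x = mnp.arange(6).reshape(2, 3)
y = mnp.ones((3, 2))
z = mnp.dot(x, y)   # NumPy-compatible signature, executed by MXNet
print(z.shape)      # (2, 2)
```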
Can `mxnet.numpy` operators be used in Gluon for hybridization?
The newly implemented NumPy operators can still be accessed through the module (`ndarray`/`symbol`) delegate `F` in Gluon, e.g. `F.numpy.dot`. This works because the new operators are still registered under `mxnet.ndarray` and `mxnet.symbol` behind the scenes. Users are simply encouraged to access NumPy operator APIs through `mxnet.numpy` when writing pure imperative code, and through the Gluon APIs for a hybrid coding experience.
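For example, a `HybridBlock` could call the proposed operators through `F` as sketched below (illustrative until the namespace is merged):

```python
import mxnet as mx
from mxnet.gluon import HybridBlock

class Dot(HybridBlock):
    def hybrid_forward(self, F, x, y):
        # F resolves to mxnet.ndarray (imperative) or mxnet.symbol
        # (hybridized); F.numpy.dot is the proposed NumPy-style operator.
        return F.numpy.dot(x, y)

block = Dot()
block.hybridize()  # the same code now builds a symbolic graph
```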
Where to contribute code?
A dev branch has been opened for this proposal.
https://github.com/apache/incubator-mxnet/tree/numpy