
Conversation


@awni awni commented Dec 3, 2024

Adds export_function and import_function so that we can save and load functions from a file. This makes it possible to use functions written in one language from another language (e.g. Python -> C++).

Basically works like so:

In Python:

# Note: the model parameters are captured and saved as part of the export
# An alternative is to make them inputs to forward
def forward(x):
    return model(x)

example_x = mx.zeros(shape=(batch_size, input_dim))

# Export to file using example input
mx.export_function("model.mlxfn", forward, example_x)

Then in C++, for example:

  auto example_x = random::uniform({batch_size, input_dim});

  // Import the function
  auto forward = import_function("model.mlxfn");

  // Call the imported function
  auto out = forward({example_x})[0];

Some notes on the implementation:

  • Reuses a lot of the compile infrastructure which simplifies things dramatically
  • The serialization of everything is mostly decoupled from the rest of the code and kept in export.cpp
  • Serializing primitives that have member variables requires some way of accessing them. The API is not opinionated about this (the primitive interface didn't change at all), but the convention I'm using is to have a state() method which returns the data to save
  • We can likely use templates / the preprocessor to reduce more boilerplate in some of the serialization code in export.cpp, but I didn't want to obfuscate / over-engineer it until getting some input

@angeloskath
Member

This is massively cool. I'll get to reviewing ASAP!

// - constants, which can be used directly
// - a load primitive which has no inputs and will become a constant
// after the first eval
if (!a.has_primitive() || is_load(a.primitive())) {
Member Author:

This change is worth commenting on:

  • Previously if you loaded arrays from a file inside a compiled function then every call to the function would reload from the file.
  • Now only the first call to the function loads and after that the loaded arrays become constants in the tape

This seems better to me, though perhaps that's debatable. It is also used by import_function, which creates Load primitives for constants, so we get lazy loading with import_function as well, which is pretty nice.

Lmk your thoughts. I can switch it so compile doesn't force the load (more flexible but more dangerous).

Member:

I was a bit torn but the more I think about it the more I like it. If someone wants to control the load, they can always pass the state as inputs.

Managing memory is also as easy as before. All you have to do is delete the function after it is called but before you eval (there is the small overhead of deserializing the function when loading again).

@awni (Member Author), Dec 24, 2024:

It got me thinking about an optimization we could do in compile in general: prune/eval branches of the graph whose leaves have no inputs. It would be a trade-off between memory and compute, so one would need to be careful there, but there are some legitimate cases where it's come up for me.

For example, in some of our RoPE implementations we precompute self._freqs, but those aren't part of model.parameters(). Naively compiling would recompute them at each call of the function, so you have to make sure you eval them before running the compiled function.

Member:

Hmm, interesting. It would be fairly easy to write as part of simplify or a similar operation, but how would we provide that functionality? Always doing it doesn't seem like the best option, as it might require a lot of memory which the user explicitly doesn't want to keep around. The same goes for Load, indeed.

Member Author:

One way of deciding is to do it based on user expectations. I'm not sure it's technically feasible, but there are two cases that we currently treat the same that I think people have quite different expectations about. Using load as an example:

def fun1(x):
  return x + mx.load("y.safetensors")["y"]

y = mx.load("y.safetensors")["y"]

def fun2(x):
  return x + y

In fun1 I expect the load to happen every time I call the function (as it does in eager mode). In fun2 I expect the load to happen only once.

For compile the behavior used to be like fun1 and I switched it to fun2 just because it's really unusual to write something like fun1.

But in theory we could try to distinguish the two cases. I think the expectations are the same for any computation:

def fun1(x):
  return x + complex_fun()

y = complex_fun()

def fun2(x):
  return x + y

Member Author:

Getting that behavior in Python is probably doable because we can figure out which inputs are closed over; getting it in C++ is probably a lot harder / maybe not possible.

mlx/export.h Outdated
Comment on lines 21 to 22
std::function<std::vector<array>(const std::vector<array>&)> import_function(
std::string path);
Member Author:

Another question: should import_function return metadata? I can see how it would be useful to get, say, the shapes and/or dtypes of the inputs, maybe the MLX version, etc., in a dict of metadata. We can also wait and see, and provide an overload / a return_metadata flag in the future.

Member:

I think it's fine. I am not too sure why we want to put everything in a single file tbh (weights, metadata and computation graph) but either way, as you say, we can always add a return_metadata optional argument.

Contributor:

#2410 requests adding metadata.

Is there a plan to support it?

@@ -2098,21 +2147,6 @@ class Tanh : public UnaryPrimitive {
void eval(const std::vector<array>& inputs, array& out);
};

class Uniform : public UnaryPrimitive {
Member Author:

Unused 🤷‍♂️ ..

awni force-pushed the export_import branch 3 times, most recently from fd6520c to 454f44c, on December 12, 2024
awni force-pushed the export_import branch 5 times, most recently from 65b15ad to 28291dc, on December 23, 2024
@awni

awni commented Dec 23, 2024

I updated the exporting API to allow C++ functions that take a vector of positional arguments and/or a map of keyword arguments:

So you can do things like:

export_function(file_path, fun, args);
export_function(file_path, fun, kwargs);
export_function(file_path, fun, args, kwargs);

And similarly:

auto fun = import_function(file_path);
fun(args);
fun(args, kwargs);
fun(kwargs);

args is a vector of arrays and kwargs is a map of string keys with array values.

This makes it a lot safer to export functions that take keyword arguments from Python.

@awni

awni commented Dec 23, 2024

I also updated the API to allow exporting multiple traces of the same function with varying inputs. This is really nice, e.g. for LLM inference, where prompt prefill takes a mask and sometimes a cache but generation does not take a mask. It is also pretty nice for exporting varying shapes (when you aren't doing shapeless exports). Common constants are serialized only once.

For example:

constant = mx.zeros((16, 2048))
mx.eval(constant)

def fun(*args):
    return constant + sum(args)

with mx.exporter(path, fun) as exporter:
    for i in range(5):
        exporter(*[mx.array(1)] * i)

The above exports 5 different graphs (one per call) but only a single copy of constant.

@awni awni marked this pull request as ready for review December 23, 2024 16:14
@awni

awni commented Dec 23, 2024

I think it's safe to review this, and I probably shouldn't keep growing this diff because it's getting a bit large.

I still want to clean-up some of the implementation but overall I think it gets the job done reasonably well.

@awni awni changed the title [WIP] Export / import functions to / from a file Export / import functions to / from a file Dec 23, 2024
@awni

awni commented Dec 23, 2024

Here is a little gist that exports Llama 3.1 generation from Python and imports and runs it in C++: https://gist.github.com/awni/ebd1c9faa0e33c5d924561695c15ac7e

@angeloskath (Member) left a comment:

It looks really, really good. There is nothing I could find to comment on, really 🤷‍♂️. There is some amount of domain-specific logic in FunctionExporter::export_function, but I think this aligns perfectly with the rest of the code. It is more efficient and relatively self-contained, similar to eval, vjp, etc.

Passing nullptr for the fallback of fast primitives is a bit of a hairy situation. One option would be to export the fallback tape on the provided inputs but a) it is a bit complicated b) unlikely to be useful.

@awni

awni commented Dec 24, 2024

Passing nullptr for the fallback of fast primitives is a bit of a hairy situation. One option would be to export the fallback tape on the provided inputs but a) it is a bit complicated b) unlikely to be useful.

I agree that is probably the "correct" option, and I think it probably should be done. But, like you say, it's a bit involved and I didn't want to keep growing this diff. The nice thing about doing it that way is that we would be able to transform fast primitives even after export -> import, which I think is pretty neat (though maybe not so useful in practice 😅).

@awni

awni commented Dec 24, 2024

Thanks for reading the diff @angeloskath! I know it's a big one!! I'm going to mark the new APIs as experimental in the docstrings (just to give fair warning that they are still quite new and subject to change), and then I think we should merge and refine, with some license to change what we need based on usage. I'll also follow up this PR with a usage guide for the docs, which we can put either in the developer docs or in our usage section.

@angeloskath

Sounds perfect to me!

@awni awni merged commit 4ba0c24 into main Dec 24, 2024
5 checks passed
@awni awni deleted the export_import branch December 24, 2024 19:19