Create in-memory large models without serializing large initializers through protobuf #5685

Merged: 38 commits, Nov 9, 2023

Conversation

@xadupre (Contributor) commented Oct 18, 2023

Description

This PR proposes a way to build a large model (> 2 GB) in memory.

Motivation and Context

The API for external data requires creating the whole model and then calling a function to move the big initializers to external files. This PR adds the class LargeModelContainer, which holds a ModelProto with no big initializers; the big initializers live in a separate dictionary outside any protobuf structure, so the model as a whole can exceed 2 GB.

As an example, the creation of a large model:

```python
import numpy as np
from onnx import TensorProto
from onnx.helper import (
    make_graph,
    make_model,
    make_node,
    make_tensor_value_info,
)
from onnx.numpy_helper import from_array

# make_large_model and make_large_tensor_proto are introduced by this PR;
# depending on the onnx version they may be exposed from onnx.helper or
# onnx.model_container.
from onnx.helper import make_large_model, make_large_tensor_proto

X = make_tensor_value_info("X", TensorProto.FLOAT, [None, None])
Y = make_tensor_value_info("Y", TensorProto.FLOAT, [None])
graph = make_graph(
    [
        make_node("MatMul", ["X", "A"], ["XA"]),
        make_node("MatMul", ["XA", "B"], ["XB"]),
        make_node("MatMul", ["XB", "C"], ["Y"]),
    ],
    "mm",
    [X],
    [Y],
    [
        # first large tensor, only the type and shape are used;
        # the location must start with '#' and be unique.
        make_large_tensor_proto("#loc0", "A", TensorProto.FLOAT, (3, 3)),
        from_array(np.arange(9).astype(np.float32).reshape((-1, 3)), name="B"),
        # second large tensor, only the type and shape are used
        make_large_tensor_proto("#loc1", "C", TensorProto.FLOAT, (3, 3)),
    ],
)
onnx_model = make_model(graph)

# The second parameter is a dictionary mapping the locations (the unique names
# declared above) to numpy arrays; it could easily be extended to support
# torch tensors.
large_model = make_large_model(
    onnx_model.graph,
    {
        "#loc0": (np.arange(9) * 100).astype(np.float32).reshape((-1, 3)),
        "#loc1": (np.arange(9) + 10).astype(np.float32).reshape((-1, 3)),
    },
)
large_model.check_model()
```
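The idea behind the container can be sketched in a few lines. The class below is illustrative only (the merged class is named `LargeModelContainer`, but its fields and methods here are assumptions, not the actual API): a ModelProto keeps only small initializers plus '#'-prefixed placeholders, while the large weights live in a plain Python dict, outside protobuf's 2 GB limit.

```python
import numpy as np


class InMemoryLargeModel:
    """Illustrative stand-in for the PR's LargeModelContainer (not the real API)."""

    def __init__(self, model_proto, large_initializers):
        # ModelProto holding only small initializers and '#'-prefixed placeholders.
        self.model_proto = model_proto
        # Mapping: '#'-prefixed location -> numpy array (no protobuf size limit).
        self.large_initializers = large_initializers

    def check_model(self):
        # Every declared location must follow the '#' naming convention.
        for loc in self.large_initializers:
            if not loc.startswith("#"):
                raise ValueError(f"location {loc!r} must start with '#'")


weights = {"#loc0": np.zeros((3, 3), dtype=np.float32)}
container = InMemoryLargeModel(model_proto=None, large_initializers=weights)
container.check_model()  # passes: the location respects the convention
```

Because the arrays never pass through protobuf, only the (small) graph structure is bound by the 2 GB serialization limit.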

Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
@codecov bot commented Oct 18, 2023

Codecov Report

Attention: 39 lines in your changes are missing coverage. Please review.

| File | Coverage | Δ |
| --- | --- | --- |
| onnx/helper.py | 64.32% <ø> | (ø) |
| onnx/test/model_container_test.py | 87.50% <87.50%> | (ø) |
| onnx/model_container.py | 81.01% <81.01%> | (ø) |
@github-advanced-security bot left a comment:
lintrunner found more than 10 potential problems in the proposed changes. Check the Files changed tab for more details.

@xadupre xadupre changed the title [WIP] Proposal to store large models into a single file Create in-memory large models without serializing large initializers through protobuf Oct 25, 2023
@gramalingam (Contributor) commented:

> Out of curiosity, does the ONNX spec allow a ModelProto/GraphProto to reference a non-proto file as external weight data? Would endianness cause any compatibility issue if the model is, say, serialized on a big-endian system and loaded on a little-endian platform?

Yes and no. The ONNX spec fixes the raw data format for external data (little-endian), so there is no endianness ambiguity in the serialized form.
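The point can be illustrated with plain numpy (this is a sketch of the fixed-endianness convention, not the onnx serializer itself): writing with an explicit little-endian byte order makes the bytes host-independent, and any reader that uses the same explicit dtype recovers the values.

```python
import numpy as np

arr = np.arange(4, dtype=np.float32)

# '<f4' forces little-endian float32, the layout ONNX fixes for raw tensor data.
little_endian = arr.astype("<f4").tobytes()
big_endian = arr.astype(">f4").tobytes()
assert little_endian != big_endian  # byte order matters for multi-byte types

# Any consumer reading with '<f4' recovers the values, regardless of host endianness.
roundtrip = np.frombuffer(little_endian, dtype="<f4")
assert (roundtrip == arr).all()
```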

@xadupre xadupre enabled auto-merge November 9, 2023 11:30
@xadupre xadupre added this pull request to the merge queue Nov 9, 2023
Merged via the queue into onnx:main with commit 6fa70c5 Nov 9, 2023
@xadupre xadupre deleted the lonnx branch November 9, 2023 17:05
github-merge-queue bot pushed a commit that referenced this pull request Nov 16, 2023
### Description
ReferenceEvaluator can take any proto as an input. This PR extends the
support to ModelContainer introduced in PR #5685.

### Motivation and Context
This makes it easier to test.

---------

Signed-off-by: Xavier Dupre <xadupre@microsoft.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
@thiagocrepaldi left a comment:
Are the model initializers renamed?

```python
continue

info = ext_data.ExternalDataInfo(tensor)
file_location = ext_data._sanitize_path(info.location)
```

Is sanitization needed? I assumed the location was sanitized before saving, so loading should be fine. Or does the sanitization remove the leading '#' from the location for filesystem loading purposes?
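Based on the convention described in this PR ('#'-prefixed locations denote in-memory tensors, anything else a relative file path), a loader could distinguish the two cases before doing any path sanitization. The helper below is hypothetical, not code from the PR:

```python
def is_in_memory_location(location: str) -> bool:
    # '#'-prefixed locations name in-memory large tensors (this PR's convention);
    # anything else is treated as a relative file path to on-disk external data.
    return location.startswith("#")


assert is_in_memory_location("#loc0")
assert not is_in_memory_location("weights_part0.bin")
```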

linshokaku pushed a commit to linshokaku/onnx that referenced this pull request Oct 2, 2024
…through protobuf (onnx#5685)

(commit message duplicates the PR description above)
linshokaku pushed a commit to linshokaku/onnx that referenced this pull request Oct 2, 2024
(commit message duplicates the ReferenceEvaluator follow-up above)