
Conversation

xadupre
Contributor

@xadupre xadupre commented Sep 13, 2022

Signed-off-by: xadupre <xadupre@microsoft.com>

Description

An array of uint8 can store its content in the int32_data attribute, which uses more space than necessary; that storage form should not be needed. With this PR, any type other than string, float, double, int32, or int64 is always stored as raw data. There is then no mismatch between the element type and the attribute used to hold the data, and the tensor does not use more memory than it should.
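The rule the PR applies could be sketched as follows (illustrative pure Python, not the actual onnx.helper code; the field names follow TensorProto, but the function name is hypothetical):

```python
# Types with a dedicated TensorProto repeated field keep using it;
# everything else (uint8, int16, float16, bool, ...) goes to raw_data.
TYPED_FIELDS = {
    "string": "string_data",
    "float": "float_data",
    "double": "double_data",
    "int32": "int32_data",
    "int64": "int64_data",
}

def storage_field(elem_type: str) -> str:
    """Return the TensorProto field used to hold values of elem_type."""
    return TYPED_FIELDS.get(elem_type, "raw_data")

print(storage_field("uint8"))  # raw_data
print(storage_field("int32"))  # int32_data
```

With this rule a uint8 element occupies one byte of raw_data instead of being widened into an int32_data entry.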

Motivation and Context

Better code consistency.

Signed-off-by: xadupre <xadupre@microsoft.com>
@xadupre xadupre requested a review from a team as a code owner September 13, 2022 17:40
Signed-off-by: xadupre <xadupre@microsoft.com>
Signed-off-by: xadupre <xadupre@microsoft.com>
Signed-off-by: xadupre <xadupre@microsoft.com>
Signed-off-by: xadupre <xadupre@microsoft.com>
@xadupre xadupre changed the title [WIP] Improve consistency of tensors produced by make_tensor Improve consistency of tensors produced by make_tensor Sep 14, 2022
Member

@jcwchen jcwchen left a comment


So after this PR, there is no way to store INT16, INT8, UINT16, UINT8, BOOL, or FLOAT16 tensors in int32_data? Will it potentially break something? For instance, certain backends do not support loading ONNX tensors from raw data. To prevent this, perhaps we can still preserve a way to do that but warn users that it is not recommended.

I don't have the historical context for why a small data type is stored in a larger data type (int32_data), which wastes memory. If this PR is necessary, we need to update the proto file as well:

// INT32, INT16, INT8, UINT16, UINT8, BOOL, or FLOAT16

@xadupre
Contributor Author

xadupre commented Sep 20, 2022

This PR is not necessary, but I think it is better this way and more consistent. I'm curious to know which runtime cannot read the raw data. It should be able to cast the raw_data pointer into the proper type to read it. If a runtime cannot read the int16 format, then the converting library should modify the type of the initializer. I don't think we should keep this kind of weird behavior just to accommodate a runtime.

@gramalingam
Contributor

I suggest something in-between: we generalize the helper function make_tensor to generate the more efficient encoding, but the inefficient encoding should still be supported.

Specifically, I think we need to support backward-compatibility for pre-existing models, so we should not make proto changes that break backward-compatibility.

As long as there is an efficient encoding, and we help users create the efficient encoding, that should serve the main purpose. If someone wants to explicitly create the less efficient encoding, we should not prevent it.

@gramalingam
Contributor

Note that protobuf uses variable-length encoding for integers. So a boolean value encoded as int32 takes a single byte in the proto format. In fact, I believe values 1 to 127 are encoded using 1 byte (one bit is used as the continuation bit). Similarly, values encodable in 14 bits take 2 bytes, and values encodable in 21 bits take 3 bytes. Or, at least, that's my understanding based on a quick look at this description
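The byte counts above can be checked with a small pure-Python sketch of protobuf's unsigned-varint length (illustrative, not the protobuf library itself):

```python
def varint_size(value: int) -> int:
    """Number of bytes protobuf needs for an unsigned varint:
    7 payload bits per byte, the high bit is the continuation flag."""
    size = 1
    while value > 0x7F:
        value >>= 7
        size += 1
    return size

print(varint_size(1))      # 1
print(varint_size(127))    # 1  (largest 1-byte value)
print(varint_size(16383))  # 2  (largest 14-bit value)
print(varint_size(16384))  # 3
```

So a True stored in int32_data costs one byte on the wire, matching the comment above.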

@gramalingam
Contributor

In contrast, the raw format uses a fixed-length encoding.

@lgtm-com

lgtm-com bot commented Sep 21, 2022

This pull request introduces 1 alert when merging c78c905 into 895593a - view on LGTM.com

new alerts:

  • 1 for Unused local variable

@xadupre
Contributor Author

xadupre commented Sep 21, 2022

I updated the code to keep the former make_tensor behavior. I read the documentation in onnx.proto, and onnx implements what it says it should do, so maybe this PR is not needed. However, it still looks counterintuitive to me. We should probably run a benchmark comparing the loading/writing time when users call make_tensor(..., raw=True) versus raw=False. That would help make the right decision.
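A tiny pure-Python sketch of the space side of such a comparison (illustrative only; this packs plain lists rather than going through the actual onnx serialization path, and the function names are made up):

```python
import struct

values = list(range(256)) * 64  # 16384 uint8 values

def pack_raw() -> bytes:
    # raw_data style: fixed-width, 1 byte per uint8 element
    return bytes(values)

def pack_int32() -> bytes:
    # int32_data style with a fixed 4-byte int32 per element
    # (protobuf varints would shrink small values, per the comment above)
    return struct.pack(f"<{len(values)}i", *values)

print(len(pack_raw()), len(pack_int32()))  # 16384 65536
```

Wrapping both calls in timeit would give the loading/writing numbers the comment asks for.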

@xadupre xadupre closed this Oct 4, 2022