Improve consistency of tensors produced by make_tensor #4510
Conversation
Signed-off-by: xadupre <xadupre@microsoft.com>
So after this PR, there is no way to store INT16, INT8, UINT16, UINT8, BOOL, or FLOAT16 tensors in int32_data? Could that potentially break something? For instance, certain backends do not support loading ONNX tensors from raw data. To avoid breaking them, perhaps we can still preserve a way to do that but warn users that it is not recommended.
I don't have the historical context for why small data types were stored in a larger type (int32_data), which wastes memory. If this PR is necessary, we need to update the proto file as well:
Line 545 in d15ba96
// INT32, INT16, INT8, UINT16, UINT8, BOOL, or FLOAT16
This PR is not strictly necessary, but I think it is better this way and more consistent. I'm curious to know which runtime cannot read the raw data. It should be able to cast the raw_data pointer to the proper type to read it. If a runtime cannot read the int16 format, then the converting library should change the type of the initializer. I don't think we should keep this kind of weird behavior just to accommodate a runtime.
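Reading raw_data back amounts to reinterpreting the bytes with the declared element type. A stdlib-only sketch of the idea (the `decode_raw_int16` helper is illustrative, not an onnx API):

```python
import struct

def decode_raw_int16(raw: bytes) -> list[int]:
    """Reinterpret a TensorProto.raw_data payload as little-endian int16 values."""
    count = len(raw) // 2  # int16 is 2 bytes, fixed-length
    return list(struct.unpack(f"<{count}h", raw))

# Example: four int16 values serialized as raw bytes, then decoded back.
raw = struct.pack("<4h", -1, 0, 256, 32767)
decoded = decode_raw_int16(raw)  # [-1, 0, 256, 32767]
```

In C or C++, the same reinterpretation is a pointer cast over the raw buffer, which is what the comment above suggests a runtime should do.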
I suggest something in between: we generalize the helper function. Specifically, I think we need to support backward compatibility for pre-existing models, so we should not make proto changes that break backward compatibility. As long as there is an efficient encoding, and we help users create the efficient encoding, that should serve the main purpose. If someone wants to explicitly create the less efficient encoding, we should not prevent it.
Note that protobuf uses a variable-length encoding for integers. So a boolean value encoded as int32 takes a single byte in the proto format. In fact, I believe values 1 to 127 are encoded in 1 byte (one bit is used as the continuation bit). Similarly, values encodable in 14 bits take 2 bytes; otherwise, it would be 3 bytes. Or at least, that's my understanding based on a quick look at this description.
In contrast, the raw format uses a fixed-length encoding.
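The size trade-off between the two encodings can be checked with a small stdlib-only sketch (the `varint_size` helper is illustrative, not part of protobuf's API):

```python
import struct

def varint_size(value: int) -> int:
    """Bytes needed by protobuf's base-128 varint for a non-negative int."""
    size = 1
    while value >= 0x80:  # 7 payload bits per byte, high bit is the continuation bit
        value >>= 7
        size += 1
    return size

# Packed int32_data (varint) size vs fixed-length raw size for uint8 values.
values = [1, 127, 128, 255]
varint_bytes = sum(varint_size(v) for v in values)         # 1 + 1 + 2 + 2 = 6
raw_bytes = len(struct.pack(f"<{len(values)}B", *values))  # 4: one byte per uint8
```

For uint8 values of 128 and above the varint encoding needs 2 bytes each, so raw_data can be strictly smaller, which supports the point about the raw format being the more predictable encoding.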
Signed-off-by: xadupre <xadupre@microsoft.com>
This pull request introduces 1 alert when merging c78c905 into 895593a - view on LGTM.com. New alerts:
I updated the code to keep the former make_tensor. I read the documentation in onnx.proto, and onnx implements what it says it should. Maybe this PR is not needed. However, it still looks counterintuitive to me. We should probably run a benchmark to compare the loading/writing time when users use
Signed-off-by: xadupre xadupre@microsoft.com
Description
An array of uint8 may store its content in the attribute int32_data. That takes more space than necessary; the storage field should not differ from the element type. With this PR, any type other than string, float, double, int32, or int64 is always stored as raw data. There is no mismatch between the element type and the attribute used to hold the data, and the tensor does not use more memory than it should.
Motivation and Context
Better code consistency.
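The consistency rule described above can be summarized as a type-to-field mapping. A hypothetical sketch (the `STORAGE_FIELD` dict and `storage_field_for` helper are illustrative, not the actual onnx.helper implementation):

```python
# Illustrative mapping from ONNX element type name to the TensorProto field
# that would hold the data after this change. Types without a matching field
# of their own go to raw_data instead of being widened into int32_data.
STORAGE_FIELD = {
    "FLOAT": "float_data",
    "DOUBLE": "double_data",
    "INT32": "int32_data",
    "INT64": "int64_data",
    "STRING": "string_data",
}

def storage_field_for(elem_type: str) -> str:
    """Return the TensorProto field used to store values of elem_type."""
    return STORAGE_FIELD.get(elem_type, "raw_data")

storage_field_for("FLOAT")  # "float_data"
storage_field_for("UINT8")  # "raw_data": no more widening into int32_data
```

The point of the PR is exactly this invariant: the field holding the data is determined by the element type, with no alternative wide encoding for the small integer types.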