Skip to content

Clarify that FLOAT4E2M1 can be in int32_data #6640

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 22, 2025
Merged

Conversation

justinchuby
Copy link
Member

Description

Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage.
Updated text in the spec for int32_data for better readability and accuracy.

Motivation and Context

Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version.

@justinchuby justinchuby requested a review from a team as a code owner January 13, 2025 20:16
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
@justinchuby
Copy link
Member Author

@yuanyao-nv Please review and verify that the changes are desired and accurate. Thanks!

@justinchuby justinchuby added this to the 1.18 milestone Jan 13, 2025
@justinchuby justinchuby added the topic: spec clarification Clarification of the ONNX spec needed label Jan 13, 2025
@justinchuby justinchuby temporarily deployed to testpypi_onnxweekly January 13, 2025 20:18 — with GitHub Actions Inactive
Copy link

codecov bot commented Jan 13, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.46%. Comparing base (5ab1e42) to head (00668ee).
Report is 6 commits behind head on main.

✅ All tests successful. No failed tests found.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6640      +/-   ##
==========================================
+ Coverage   57.44%   57.46%   +0.02%     
==========================================
  Files         507      507              
  Lines       31612    31584      -28     
  Branches     3058     3046      -12     
==========================================
- Hits        18158    18151       -7     
+ Misses      12610    12607       -3     
+ Partials      844      826      -18     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@justinchuby justinchuby added the auto update doc Generate md/proto files automatically using the CI pipeline label Jan 13, 2025
@justinchuby justinchuby temporarily deployed to testpypi_onnxweekly January 13, 2025 20:25 — with GitHub Actions Inactive
@justinchuby justinchuby reopened this Jan 13, 2025
@justinchuby justinchuby removed the auto update doc Generate md/proto files automatically using the CI pipeline label Jan 13, 2025
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
@justinchuby justinchuby requested a review from xadupre January 14, 2025 01:04
@gramalingam
Copy link
Contributor

Thanks for the clarification.

The int4 packing description clarifies packing 2 int4 into 1 byte, but it doesn't clarify whether such bytes are then packed into int32 (4 each except for the last one) or each byte encoded as a single int32 or whether both are allowed?

I guess the same question applies to float8 as well, but the previous description seems to suggest that a single float8 is represented as a single int32 ...

@justinchuby
Copy link
Member Author

justinchuby commented Jan 14, 2025

I understand it as each byte encoded as a single int32. Please feel free to suggest wording. @yuanyao-nv do you have comments or further clarifications?

@yuanyao-nv
Copy link
Contributor

Thanks for the clarification.

The int4 packing description clarifies packing 2 int4 into 1 byte, but it doesn't clarify whether such bytes are then packed into int32 (4 each except for the last one) or each byte encoded as a single int32 or whether both are allowed?

I guess the same question applies to float8 as well, but the previous description seems to suggest that a single float8 is represented as a single int32 ...

I agree it's not clear from the proto docstring whether bytes are then packed into int32. But looking at the helper function implementations (eg numpy_helper.py::_to_array()) each int32_data only holds one byte's worth of data.

@gramalingam
Copy link
Contributor

gramalingam commented Jan 15, 2025

Thanks for the clarification.
The int4 packing description clarifies packing 2 int4 into 1 byte, but it doesn't clarify whether such bytes are then packed into int32 (4 each except for the last one) or each byte encoded as a single int32 or whether both are allowed?
I guess the same question applies to float8 as well, but the previous description seems to suggest that a single float8 is represented as a single int32 ...

I agree it's not clear from the proto docstring whether bytes are then packed into int32. But looking at the helper function implementations (eg numpy_helper.py::_to_array()) each int32_data only holds one byte's worth of data.

Wondering about its efficiency. Looking at this description of protobuf encoding, I assume that this ends up getting represented as a LEN encoding of varints, is that right? This means that a byte-value <= 127 is encoded using 1 byte, but a value >= 128 will use 2 bytes. Assuming both are equally likely, we could end up using 1.5 bytes for each byte (2 INT4s) on average. But if we pack 4 bytes (each with two INT4s) into an INT32, then that int32 value will likely need 5 bytes (most of the time) or 4 bytes in the varint form.

Ok, so there is a difference (may be 5 bytes vs 6 bytes to represent 8 INT4s). Not huge, but 15% difference could be significant for GB-size tensors. So, maybe we should allow both?

(What would likely be even better is packing 7 INT4s into an int32, which can always be represented using 4 bytes ... but that would impact the packing/unpacking logic more significantly than the above.)

But, looking into it more, I guess raw_data would be the best anyway for representing the packed INT4? If so, maybe we don't have to worry about this, but remind users that raw_data preferable for this case

@justinchuby
Copy link
Member Author

From the protobuf docs, https://protobuf.dev/programming-guides/encoding/#packed yes it seems reasonable to say that raw_data is most reasonable, because that gives us 4bytes/8 int4s right? So that should be the most efficient?

@yuanyao-nv
Copy link
Contributor

I agree with Rama's assessment as well, that raw should be more efficient since no extra byte is needed.
And it seems like the only way ONNX exposes control over the use of int32_data vs raw_data is by setting the raw=True flag in make_tensor()?

@justinchuby
Copy link
Member Author

justinchuby commented Jan 15, 2025

I agree with Rama's assessment as well, that raw should be more efficient since no extra byte is needed. And it seems like the only way ONNX exposes control over the use of int32_data vs raw_data is by setting the raw=True flag in make_tensor()?

That's reasonable. There's only one toggle in the helper apis. I think simplicity helps here.

@justinchuby
Copy link
Member Author

Please feel free to suggest wording. I tried to rephrase but wasn't happy with what I got. I was trying to convey that the all values should be reinterpreted as int32 without further packing.

@justinchuby
Copy link
Member Author

I used chat gpt's help in rephrasing it

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
@justinchuby
Copy link
Member Author

@yuanyao-nv take a final look? Thanks!

@yuanyao-nv
Copy link
Contributor

LGTM. Thanks!

@justinchuby justinchuby enabled auto-merge January 22, 2025 21:23
@justinchuby justinchuby removed the request for review from xadupre January 22, 2025 21:23
@justinchuby justinchuby added this pull request to the merge queue Jan 22, 2025
@justinchuby justinchuby added the release notes Important changes to call out in release notes label Jan 22, 2025
@justinchuby justinchuby temporarily deployed to testpypi_onnxweekly January 22, 2025 21:28 — with GitHub Actions Inactive
Merged via the queue into main with commit e292b4a Jan 22, 2025
43 checks passed
@justinchuby justinchuby deleted the justinchu/int32-data branch January 22, 2025 21:44
titaiwangms pushed a commit to titaiwangms/onnx that referenced this pull request Jan 22, 2025
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
titaiwangms pushed a commit to titaiwangms/onnx that referenced this pull request Jan 23, 2025
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
titaiwangms pushed a commit to titaiwangms/onnx that referenced this pull request Jan 23, 2025
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
titaiwangms pushed a commit to titaiwangms/onnx that referenced this pull request Jan 23, 2025
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
titaiwangms pushed a commit to titaiwangms/onnx that referenced this pull request Jan 24, 2025
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
andife pushed a commit that referenced this pull request Jan 25, 2025
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
seungwoo-ji-03 pushed a commit to seungwoo-ji-03/onnx that referenced this pull request Feb 17, 2025
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: seungwoo-ji <seungwoo.ji@nuvilab.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes Important changes to call out in release notes topic: spec clarification Clarification of the ONNX spec needed
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

3 participants