-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Clarify that FLOAT4E2M1 can be in int32_data #6640
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
2275006
to
bf8266c
Compare
@yuanyao-nv Please review and verify that the changes are desired and accurate. Thanks! |
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found. Additional details and impacted files@@ Coverage Diff @@
## main #6640 +/- ##
==========================================
+ Coverage 57.44% 57.46% +0.02%
==========================================
Files 507 507
Lines 31612 31584 -28
Branches 3058 3046 -12
==========================================
- Hits 18158 18151 -7
+ Misses 12610 12607 -3
+ Partials 844 826 -18 ☔ View full report in Codecov by Sentry. |
Thanks for the clarification. The int4 packing description clarifies packing 2 int4 into 1 byte, but it doesn't clarify whether such bytes are then packed into int32 (4 each except for the last one) or each byte encoded as a single int32 or whether both are allowed? I guess the same question applies to float8 as well, but the previous description seems to suggest that a single float8 is represented as a single int32 ... |
I understand it as each byte encoded as a single int32. Please feel free to suggest wording. @yuanyao-nv do you have comments or further clarifications? |
I agree it's not clear from the proto docstring whether bytes are then packed into int32. But looking at the helper function implementations (eg |
Wondering about its efficiency. Looking at this description of protobuf encoding, I assume that this ends up getting represented as a Ok, so there is a difference (may be 5 bytes vs 6 bytes to represent 8 INT4s). Not huge, but 15% difference could be significant for GB-size tensors. So, maybe we should allow both? (What would likely be even better is packing 7 INT4s into an int32, which can always be represented using 4 bytes ... but that would impact the packing/unpacking logic more significantly than the above.) But, looking into it more, I guess raw_data would be the best anyway for representing the packed INT4? If so, maybe we don't have to worry about this, but remind users that raw_data preferable for this case |
From the protobuf docs, https://protobuf.dev/programming-guides/encoding/#packed yes it seems reasonable to say that raw_data is most reasonable, because that gives us 4bytes/8 int4s right? So that should be the most efficient? |
I agree with Rama's assessment as well, that raw should be more efficient since no extra byte is needed. |
That's reasonable. There's only one toggle in the helper apis. I think simplicity helps here. |
Please feel free to suggest wording. I tried to rephrase but wasn't happy with what I got. I was trying to convey that the all values should be reinterpreted as int32 without further packing. |
I used chat gpt's help in rephrasing it |
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
587231e
to
00668ee
Compare
@yuanyao-nv take a final look? Thanks! |
LGTM. Thanks! |
### Description Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage. Updated text in the spec for int32_data for better readability and accuracy. ### Motivation and Context Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version. --------- Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: titaiwangms <titaiwang@microsoft.com>
### Description Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage. Updated text in the spec for int32_data for better readability and accuracy. ### Motivation and Context Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version. --------- Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: titaiwangms <titaiwang@microsoft.com>
### Description Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage. Updated text in the spec for int32_data for better readability and accuracy. ### Motivation and Context Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version. --------- Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: titaiwangms <titaiwang@microsoft.com>
### Description Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage. Updated text in the spec for int32_data for better readability and accuracy. ### Motivation and Context Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version. --------- Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: titaiwangms <titaiwang@microsoft.com>
### Description Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage. Updated text in the spec for int32_data for better readability and accuracy. ### Motivation and Context Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version. --------- Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: titaiwangms <titaiwang@microsoft.com>
### Description Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage. Updated text in the spec for int32_data for better readability and accuracy. ### Motivation and Context Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version. --------- Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
### Description Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage. Updated text in the spec for int32_data for better readability and accuracy. ### Motivation and Context Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version. --------- Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com> Signed-off-by: seungwoo-ji <seungwoo.ji@nuvilab.com>
Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to test usage.
Updated text in the spec for int32_data for better readability and accuracy.
Motivation and Context
Previously the spec was incomplete according to the added tests. FLOAT4E2M1 was not yet released so the change should not require a new IR version.