Skip to content

Fix pooling pads issues #6650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 24, 2025

Conversation

titaiwangms
Copy link
Contributor

@titaiwangms titaiwangms commented Jan 22, 2025

Description

Pooling in reference currently has two bugs:

(1) it has "pads required" to make sure sliding window does not go out-of-bound, but it does not exclude pads required from pooling caculations. This causes microsoft/onnxruntime#16203

(2) In ceil_mode, the reference pooling pads the input image, but does not check if sliding window starts on pads. When we reach the end of the image, the window should stop working. This causes pytorch/pytorch#131272. Not directly, but related fix on MaxPooling: #5741. Detail: pytorch/pytorch#116420 (comment)

Motivation and Context

This PR fixes the two bugs and update their tests accordingly. I also drafted a PR to onnxruntime to have this update in CPU provider: microsoft/onnxruntime#16752

@titaiwangms titaiwangms force-pushed the titaiwang/fix_avgpool_output_shape branch from 2b69b98 to 71d6065 Compare January 22, 2025 23:56
@titaiwangms titaiwangms marked this pull request as ready for review January 23, 2025 17:17
@titaiwangms titaiwangms requested a review from a team as a code owner January 23, 2025 17:17
@titaiwangms titaiwangms force-pushed the titaiwang/fix_avgpool_output_shape branch from ed32535 to 280cd3c Compare January 23, 2025 17:19
@titaiwangms titaiwangms requested a review from a team as a code owner January 23, 2025 17:19
@justinchuby
Copy link
Member

Is there any wording to need to update in the spec? Just checking

@titaiwangms
Copy link
Contributor Author

Is there any wording to need to update in the spec? Just checking

There was a spec ask: #1922

Copy link

codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 14.70588% with 29 lines in your changes missing coverage. Please review.

Project coverage is 57.49%. Comparing base (9683661) to head (4a3a166).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
onnx/backend/test/case/node/averagepool.py 0.00% 13 Missing ⚠️
onnx/reference/ops/op_pool_common.py 41.66% 5 Missing and 2 partials ⚠️
onnx/backend/test/case/node/lppool.py 0.00% 4 Missing ⚠️
onnx/backend/test/case/node/maxpool.py 0.00% 4 Missing ⚠️
onnx/test/test_backend_onnxruntime.py 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6650      +/-   ##
==========================================
- Coverage   57.50%   57.49%   -0.01%     
==========================================
  Files         507      507              
  Lines       31624    31632       +8     
  Branches     3046     3048       +2     
==========================================
+ Hits        18185    18187       +2     
- Misses      12613    12618       +5     
- Partials      826      827       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@titaiwangms
Copy link
Contributor Author

titaiwangms commented Jan 23, 2025

@liqunfu @gramalingam @xadupre @justinchuby PTAL

I am not sure if I will have to update the spec or something?

@titaiwangms titaiwangms force-pushed the titaiwang/fix_avgpool_output_shape branch from e47ddaa to 231f353 Compare January 23, 2025 23:03
@titaiwangms
Copy link
Contributor Author

My newly added test is failing with onnx/test/test_backend_onnxruntime.py, which is expected because ORT also needs an update: microsoft/onnxruntime#16752. So should I disable the test for the time being?

@justinchuby
Copy link
Member

Yes. Please feel free to disable

titaiwangms and others added 12 commits January 24, 2025 01:33
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Deprecate all type casting functions because they are mere utilities or
reference implementations that should not be directly exposed to users.
Users should leverage libraries like `ml_dtypes` for type conversion.

- Additionally turned convert_endian and split_complex_to_pairs into
private. This is seen as low risk.
- Added typing_extensions as a dependency. This is a common dependency
used by python packages as a polyfill for the typing module.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
### Description
Clarify in spec proto that FLOAT4E2M1 can be in int32_data, according to
test usage.
Updated text in the spec for int32_data for better readability and
accuracy.

### Motivation and Context

Previously the spec was incomplete according to the added tests.
FLOAT4E2M1 was not yet released so the change should not require a new
IR version.

---------

Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
### Description
Unify names of github environments

I already updated the github environments.

Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: titaiwangms <titaiwang@microsoft.com>
@titaiwangms titaiwangms force-pushed the titaiwang/fix_avgpool_output_shape branch from f9765fc to 1ba218d Compare January 24, 2025 01:33
titaiwangms added a commit to microsoft/onnxruntime that referenced this pull request Jan 24, 2025
Fix #16203

Previous to this PR, if `ceil_mode` is on, the calculation of a value
would divide the kernel size, even if remaining pixels is less than the
kernel size, which causes the difference in this operator between ORT
and torch.

However, this fix only applies to the change in #15597, which only
supports AvgPool since 19. The older opset version is remain the same,
as it's using mlas files.

Also, the PR fixes the shape mismatch caused by sliding window starting
from padding. More detail: onnx/onnx#6650 (And
this PR is also validated with the tests added in
onnx/onnx#6650)
Copy link
Member

@justinchuby justinchuby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need another eye on the logic

@xadupre xadupre added this pull request to the merge queue Jan 24, 2025
Merged via the queue into onnx:main with commit dcd14ba Jan 24, 2025
34 checks passed
andife added a commit that referenced this pull request Jan 25, 2025
### Description
Pooling in reference currently has two bugs:

(1) it has ["pads
required"](https://github.com/onnx/onnx/blob/e292b4ae6d016c3231a801bfeb26f802ba95d82a/onnx/reference/ops/op_pool_common.py#L53)
to make sure sliding window does not go out-of-bound, but it does not
exclude pads required from pooling caculations. This causes
microsoft/onnxruntime#16203

(2) In `ceil_mode`, the reference pooling pads the input image, but does
not check if sliding window starts on pads. When we reach the end of the
image, the window should stop working. This causes
pytorch/pytorch#131272. Not directly, but
related fix on MaxPooling: #5741.
Detail:
pytorch/pytorch#116420 (comment)

### Motivation and Context
This PR fixes the two bugs and update their tests accordingly. I also
drafted a PR to onnxruntime to have this update in CPU provider:
microsoft/onnxruntime#16752

---------

Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Andreas Fehlner <fehlner@arcor.de>
Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
ashrit-ms pushed a commit to microsoft/onnxruntime that referenced this pull request Feb 11, 2025
Fix #16203

Previous to this PR, if `ceil_mode` is on, the calculation of a value
would divide the kernel size, even if remaining pixels is less than the
kernel size, which causes the difference in this operator between ORT
and torch.

However, this fix only applies to the change in #15597, which only
supports AvgPool since 19. The older opset version is remain the same,
as it's using mlas files.

Also, the PR fixes the shape mismatch caused by sliding window starting
from padding. More detail: onnx/onnx#6650 (And
this PR is also validated with the tests added in
onnx/onnx#6650)
seungwoo-ji-03 pushed a commit to seungwoo-ji-03/onnx that referenced this pull request Feb 17, 2025
### Description
Pooling in reference currently has two bugs:

(1) it has ["pads
required"](https://github.com/onnx/onnx/blob/e292b4ae6d016c3231a801bfeb26f802ba95d82a/onnx/reference/ops/op_pool_common.py#L53)
to make sure sliding window does not go out-of-bound, but it does not
exclude pads required from pooling caculations. This causes
microsoft/onnxruntime#16203

(2) In `ceil_mode`, the reference pooling pads the input image, but does
not check if sliding window starts on pads. When we reach the end of the
image, the window should stop working. This causes
pytorch/pytorch#131272. Not directly, but
related fix on MaxPooling: onnx#5741.
Detail:
pytorch/pytorch#116420 (comment)

### Motivation and Context
This PR fixes the two bugs and update their tests accordingly. I also
drafted a PR to onnxruntime to have this update in CPU provider:
microsoft/onnxruntime#16752

---------

Signed-off-by: titaiwangms <titaiwang@microsoft.com>
Signed-off-by: Justin Chu <justinchuby@users.noreply.github.com>
Signed-off-by: Andreas Fehlner <fehlner@arcor.de>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Andreas Fehlner <fehlner@arcor.de>
Signed-off-by: seungwoo-ji <seungwoo.ji@nuvilab.com>
guschmue pushed a commit to microsoft/onnxruntime that referenced this pull request Mar 6, 2025
Fix #16203

Previous to this PR, if `ceil_mode` is on, the calculation of a value
would divide the kernel size, even if remaining pixels is less than the
kernel size, which causes the difference in this operator between ORT
and torch.

However, this fix only applies to the change in #15597, which only
supports AvgPool since 19. The older opset version is remain the same,
as it's using mlas files.

Also, the PR fixes the shape mismatch caused by sliding window starting
from padding. More detail: onnx/onnx#6650 (And
this PR is also validated with the tests added in
onnx/onnx#6650)
ashrit-ms pushed a commit to microsoft/onnxruntime that referenced this pull request Mar 17, 2025
Fix #16203

Previous to this PR, if `ceil_mode` is on, the calculation of a value
would divide the kernel size, even if remaining pixels is less than the
kernel size, which causes the difference in this operator between ORT
and torch.

However, this fix only applies to the change in #15597, which only
supports AvgPool since 19. The older opset version is remain the same,
as it's using mlas files.

Also, the PR fixes the shape mismatch caused by sliding window starting
from padding. More detail: onnx/onnx#6650 (And
this PR is also validated with the tests added in
onnx/onnx#6650)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants