[BE][SparseAdam] cleaner way to verify no sparse params #114425
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114425
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 3e061a6 with merge base 4a4c9fb. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Channeling my inner @albanD here, but how sure are you that this BC break isn't a big deal?
haha, maybe 95% sure. two main observations (but my position here is movable):
I suppose a decent test could be importing this PR internally and seeing if anything breaks?
Sounds good, BC concerns on you lol
Context: #47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter vs requiring a container of them. This is not really a big deal. So why this PR? I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking but would be minor, I believe.
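For illustration, a minimal sketch of the approach described above: let the parent `Optimizer.__init__` containerize `params` into `self.param_groups` first, then walk those groups to verify that no parameter is sparse. The class name, constructor defaults, and error message are made up for the sketch; this is not the actual torch.optim.SparseAdam source.

```
import torch
from torch.optim.optimizer import Optimizer

class SparseAdamSketch(Optimizer):
    """Illustrative only -- demonstrates the constructor-time check, not a full optimizer."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        defaults = dict(lr=lr, betas=betas, eps=eps)
        # Let the Optimizer parent class containerize `params` into param_groups...
        super().__init__(params, defaults)

        # ...then reuse the already-built self.param_groups to verify no param is sparse.
        sparse_params = []
        for group_idx, group in enumerate(self.param_groups):
            for param_idx, p in enumerate(group["params"]):
                if p.is_sparse:
                    sparse_params.append((group_idx, param_idx))
        if sparse_params:
            raise ValueError(
                f"sparse params found at (group, param) indices {sparse_params}; "
                "SparseAdam requires dense parameter tensors"
            )
```

With this shape, passing a raw Tensor such as `SparseAdamSketch(torch.rand(4, 4))` fails inside `super().__init__` with the usual "params argument given to the optimizer should be an iterable" TypeError, which is exactly the consistency this PR is after.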
raise TypeError("params argument given to the optimizer should be " | ||
"an iterable of Tensors or dicts, but got " + | ||
torch.typename(params)) | ||
if self.__class__.__name__ == 'SparseAdam': |
happy to use a different method than looking into the dunders here, just didn't know of another way
Imported diff was all green, but due to the very very slight chance that someone may have used SparseAdam incorrectly between the last landed PR and now, I decided to be safe and add a deprecation warning.
torch/optim/optimizer.py
Outdated
```
if self.__class__.__name__ == 'SparseAdam':
    warnings.warn(("Passing in a raw Tensor is deprecated. In the future, "
                   "this will raise an error. Please wrap your Tensor in "
                   "an iterable instead."), UserWarning)
```
nit:
"an iterable instead."), UserWarning) | |
"an iterable instead."), FutureWarning) |
What's the difference here?
`FutureWarning` is a warning message that warns you about deprecated features that will be removed in a future version of a package.

I don't think we are that consistent about using `FutureWarning` for deprecations, but I suppose it makes sense to do this in case users want to explicitly suppress/allow certain warnings.
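As a small aside, here is a self-contained illustration of that last point (the messages are made up, not the ones from this PR): choosing a dedicated category like `FutureWarning` lets users suppress deprecation notices by category without touching other warnings.

```
import warnings

# FutureWarning is shown by default, like UserWarning.
warnings.warn("raw-Tensor params are deprecated", FutureWarning)

# A user can opt out of deprecation chatter by category...
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.warn("raw-Tensor params are deprecated", FutureWarning)  # now suppressed

# ...while unrelated warnings still surface.
warnings.warn("something else worth seeing", UserWarning)  # still shown
```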
torch/optim/optimizer.py
Outdated
"an iterable of Tensors or dicts, but got " + | ||
torch.typename(params)) | ||
if self.__class__.__name__ == 'SparseAdam': | ||
warnings.warn(("Passing in a raw Tensor is deprecated. In the future, " |
nit:
warnings.warn(("Passing in a raw Tensor is deprecated. In the future, " | |
warnings.warn(("Passing in a raw Tensor as ``params`` to SparseAdam is deprecated. In the future, " |
Context: #47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter vs requiring a container of them. This is not really a big deal. So why this PR? I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking, so I've added a deprecation warning that we should remove in May 2024. (But is it really BC breaking when we've said in the docs that params should be an iterable this whole time? Maybe this is just a bug fix....😛)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This continues the full deprecation after #114425. It's been 6 months! And I'm fairly certain no one is going to yell at me as this patch is not really used.

------

# BC Breaking note

As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups, fully completing the deprecation of this path. Hitherto, the SparseAdam constructor had allowed raw tensors as the params argument to the constructor. Now, if you write the following code, there will be an error similar to every other optim: "params argument given to the optimizer should be an iterable of Tensors or dicts"

```
import torch
param = torch.rand(16, 32)
optimizer = torch.optim.SparseAdam(param)
```

Instead you should replace the last line with

```
optimizer = torch.optim.SparseAdam([param])
```

to no longer error.

Pull Request resolved: #127081
Approved by: https://github.com/soulitzer
Deprecation

As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups. Hitherto, the SparseAdam constructor had accidentally allowed raw tensors as the `params` argument to the constructor. Now, if you write the following code, there will be a warning: "Passing in a raw Tensor as `params` to SparseAdam is deprecated. In the future, this will raise an error. Please wrap your Tensor in an iterable instead." Instead, you should replace the last line with its containerized equivalent to avoid the warning (see the snippet below).
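The code snippet was dropped from this excerpt; reconstructing it from the analogous example in the #127081 note above, it presumably looked something like this (the 16x32 dense tensor mirrors that example):

```
import torch

param = torch.rand(16, 32)

# Passing the raw Tensor emits the deprecation warning as of this PR:
optimizer = torch.optim.SparseAdam(param)

# Wrapping it in an iterable avoids the warning:
optimizer = torch.optim.SparseAdam([param])
```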