[BE][SparseAdam] cleaner way to verify no sparse params #114425
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/114425
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 3e061a6 with merge base 4a4c9fb. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Channeling my inner @albanD here, but how sure are you that this BC break isn't a big deal?
haha, maybe 95% sure. two main observations (but my position here is movable):
I suppose a decent test could be importing this PR internally and seeing if anything breaks?
Sounds good, BC concerns on you lol
Context: #47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter vs requiring a container of them. This is not really a big deal. So why this PR? I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking but would be minor, I believe.
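For illustration, a minimal sketch of the approach described above: let the parent `Optimizer.__init__` containerize `params` into `self.param_groups` first, then walk those groups to verify that no parameter is sparse. The class name, constructor defaults, and error message are made up for the sketch; this is not the actual torch.optim.SparseAdam source.

```
import torch
from torch.optim.optimizer import Optimizer

class SparseAdamSketch(Optimizer):
    """Illustrative only -- demonstrates the constructor-time check, not a full optimizer."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        defaults = dict(lr=lr, betas=betas, eps=eps)
        # Let the Optimizer parent class containerize `params` into param_groups...
        super().__init__(params, defaults)

        # ...then reuse the already-built self.param_groups to verify no param is sparse.
        sparse_params = []
        for group_idx, group in enumerate(self.param_groups):
            for param_idx, p in enumerate(group["params"]):
                if p.is_sparse:
                    sparse_params.append((group_idx, param_idx))
        if sparse_params:
            raise ValueError(
                f"sparse params found at (group, param) indices {sparse_params}; "
                "SparseAdam requires dense parameter tensors"
            )
```

With this shape, passing a raw Tensor such as `SparseAdamSketch(torch.rand(4, 4))` fails inside `super().__init__` with the usual "params argument given to the optimizer should be an iterable" TypeError, which is exactly the consistency this PR is after.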
raise TypeError("params argument given to the optimizer should be " | ||
"an iterable of Tensors or dicts, but got " + | ||
torch.typename(params)) | ||
if self.__class__.__name__ == 'SparseAdam': |
happy to use a different method than looking into the dunders here, just didn't know of another way
Imported diff was all green, but due to the very very slight chance that someone may have used SparseAdam incorrectly between the last landed PR and now, I decided to be safe and add a deprecation warning.
torch/optim/optimizer.py
Outdated
```
if self.__class__.__name__ == 'SparseAdam':
    warnings.warn(("Passing in a raw Tensor is deprecated. In the future, "
                   "this will raise an error. Please wrap your Tensor in "
                   "an iterable instead."), UserWarning)
```
nit:
"an iterable instead."), UserWarning) | |
"an iterable instead."), FutureWarning) |
What's the difference here?
`FutureWarning` is a warning message that warns you about deprecated features that will be removed in a future version of a package.

I don't think we are that consistent about using `FutureWarning` for deprecations, but I suppose it makes sense to do this in case users want to explicitly suppress/allow certain warnings.
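As a small aside, here is a self-contained illustration of that last point (the messages are made up, not the ones from this PR): choosing a dedicated category like `FutureWarning` lets users suppress deprecation notices by category without touching other warnings.

```
import warnings

# FutureWarning is shown by default, like UserWarning.
warnings.warn("raw-Tensor params are deprecated", FutureWarning)

# A user can opt out of deprecation chatter by category...
warnings.filterwarnings("ignore", category=FutureWarning)
warnings.warn("raw-Tensor params are deprecated", FutureWarning)  # now suppressed

# ...while unrelated warnings still surface.
warnings.warn("something else worth seeing", UserWarning)  # still shown
```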
torch/optim/optimizer.py
Outdated
"an iterable of Tensors or dicts, but got " + | ||
torch.typename(params)) | ||
if self.__class__.__name__ == 'SparseAdam': | ||
warnings.warn(("Passing in a raw Tensor is deprecated. In the future, " |
nit:
warnings.warn(("Passing in a raw Tensor is deprecated. In the future, " | |
warnings.warn(("Passing in a raw Tensor as ``params`` to SparseAdam is deprecated. In the future, " |
Context: #47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter vs requiring a container of them. This is not really a big deal. So why this PR? I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups after. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking, so I've added a deprecation warning that we should remove in May 2024. (But is it really BC breaking when we've said in the docs that params should be an iterable this whole time? Maybe this is just a bug fix....😛)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
This continues the full deprecation after #114425. It's been 6 months! And I'm fairly certain no one is going to yell at me as this patch is not really used.

------

# BC Breaking note

As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups, fully completing the deprecation of this path. Hitherto, the SparseAdam constructor had allowed raw tensors as the params argument to the constructor. Now, if you write the following code, there will be an error similar to every other optim: "params argument given to the optimizer should be an iterable of Tensors or dicts"

```
import torch
param = torch.rand(16, 32)
optimizer = torch.optim.SparseAdam(param)
```

Instead you should replace the last line with

```
optimizer = torch.optim.SparseAdam([param])
```

to no longer error.

Pull Request resolved: #127081
Approved by: https://github.com/soulitzer
Deprecation

As of this PR, SparseAdam will become consistent with the rest of our optimizers in that it will only accept containers of Tensors/Parameters/param groups. Hitherto, the SparseAdam constructor had accidentally allowed raw tensors as the `params` argument to the constructor. Now, if you write the following code, there will be a warning: "Passing in a raw Tensor as `params` to SparseAdam is deprecated. In the future, this will raise an error. Please wrap your Tensor in an iterable instead." Instead, you should replace the last line with its containerized equivalent to avoid the warning (see the snippet below).
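The code snippet was dropped from this excerpt; reconstructing it from the analogous example in the #127081 note above, it presumably looked something like this (the 16x32 dense tensor mirrors that example):

```
import torch

param = torch.rand(16, 32)

# Passing the raw Tensor emits the deprecation warning as of this PR:
optimizer = torch.optim.SparseAdam(param)

# Wrapping it in an iterable avoids the warning:
optimizer = torch.optim.SparseAdam([param])
```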