Description
@macwiatrak and I are trying to figure out how to train a Pyro / scvi-tools model on multiple GPUs using PyTorch Lightning.
I tried PyTorch Lightning Trainer(strategy="horovod", accelerator="GPU", devices=2) with Pyro HorovodOptimizer; however, I am getting ValueError: Tensor is required to be contiguous, which doesn't really suggest what to do next.
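One way to narrow this error down is to check whether any model parameter (or its gradient) is stored non-contiguously, since collective operations often require contiguous memory. The sketch below is only a diagnostic under that assumption: the issue does not confirm where the ValueError is raised, and the helper names find_noncontiguous and make_contiguous are hypothetical, not part of Pyro, Horovod, or PyTorch Lightning.

```python
import torch
from torch import nn


def find_noncontiguous(module: nn.Module) -> list:
    """Return names of parameters whose data or gradient is not contiguous."""
    names = []
    for name, param in module.named_parameters():
        grad_bad = param.grad is not None and not param.grad.is_contiguous()
        if not param.data.is_contiguous() or grad_bad:
            names.append(name)
    return names


def make_contiguous(module: nn.Module) -> None:
    """Re-lay out parameter data (and gradients, if present) contiguously."""
    for param in module.parameters():
        if not param.data.is_contiguous():
            param.data = param.data.contiguous()
        if param.grad is not None and not param.grad.is_contiguous():
            param.grad = param.grad.contiguous()


# Small demonstration: force a transposed (non-contiguous) weight layout.
layer = nn.Linear(3, 4)
layer.weight.data = torch.randn(3, 4).t()  # shape (4, 3), non-contiguous
before = find_noncontiguous(layer)
make_contiguous(layer)
after = find_noncontiguous(layer)
print("non-contiguous before:", before, "after:", after)
```

If find_noncontiguous reports nothing for the real model, the offending tensor is probably something else that gets reduced or broadcast (for example a logged metric or a buffer), and this check will not help.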
Also, https://github.com/pyro-ppl/pyro/blob/dev/examples/svi_horovod.py fails for me on the LSF cluster because it cannot find certain environment variables.
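To see which launcher-related variables are actually visible inside an LSF job, a small standard-library check like the one below can be run in the job script before the example. The variable names are assumed candidates (commonly set by Horovod's Gloo launcher, Open MPI, and LSF); the issue does not say which variables were missing, so treat the list as a starting point, and report_missing as a hypothetical helper.

```python
import os

# Assumed candidate variables; which ones the example needs is not stated here.
CANDIDATE_VARS = {
    "Horovod (Gloo launcher)": [
        "HOROVOD_RANK", "HOROVOD_SIZE", "HOROVOD_LOCAL_RANK",
    ],
    "Open MPI": [
        "OMPI_COMM_WORLD_RANK", "OMPI_COMM_WORLD_SIZE",
        "OMPI_COMM_WORLD_LOCAL_RANK",
    ],
    "LSF": ["LSB_JOBID", "LSB_HOSTS", "LSB_MCPU_HOSTS"],
}


def report_missing(environ=None):
    """Map each group to the candidate variables absent from the environment."""
    environ = os.environ if environ is None else environ
    return {
        group: [name for name in names if name not in environ]
        for group, names in CANDIDATE_VARS.items()
    }


if __name__ == "__main__":
    for group, missing in report_missing().items():
        status = "all present" if not missing else "missing: " + ", ".join(missing)
        print(f"{group}: {status}")
```

Running this once on the login node and once inside the submitted job shows whether the scheduler or the launcher is responsible for the missing variables.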
It would be great to get some help figuring out what's needed to "natively" train Pyro models on multiple GPUs using PyTorch Lightning with Horovod or any other strategy.
We can use https://github.com/BayraktarLab/cell2location as a public test case that should have most of the properties relevant to our current and future projects.
Here is what @adamgayoso thinks about this in the scvi-tools + PyTorch Lightning context: scverse/scvi-tools#1226 (comment)