Memory leak using TraceEnum_ELBO #3068

@gioelelm

I noticed a major memory leak when training with SVI using TraceEnum_ELBO.
I first saw it in a custom model we are developing, but it appears to be a more general bug.

For example, it affects even the GMM example in the Pyro tutorials here, where memory usage quickly grows from a couple of hundred MB to many GB.

I ran this on a 2019 MacBook Pro running macOS 10.15. To reproduce the issue, it is enough to run the linked notebook.

I tried commenting out the following lines and adding a garbage collector call. That reduces the memory accumulation by about one order of magnitude but does not solve the problem completely, and the problem becomes particularly severe for large datasets.

# Register hooks to monitor gradient norms.
# gradient_norms = defaultdict(list)
# for name, value in pyro.get_param_store().named_parameters():
#     value.register_hook(lambda g, name=name: gradient_norms[name].append(g.norm().item()))

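import gc

# `svi` and `data` are defined earlier in the tutorial notebook.
losses = []
for i in range(200000):
    loss = svi.step(data)
    # losses.append(loss)  # disabled while investigating the leak
    gc.collect()  # force a collection after every step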

(from this forum post)
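
To put numbers on the growth described above, a small helper along these lines could be wrapped around the training loop. This is only a sketch using the Python standard library: `peak_rss_mb` and `track_memory` are hypothetical names that are not part of Pyro, and the `resource` module is only available on Unix-like systems (macOS, Linux). It reports the peak resident set size of the process, so it can show that memory keeps climbing but not where the memory is held.

```python
import gc
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in MB.

    ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


def track_memory(step, num_steps, every=100):
    """Call step() num_steps times and sample peak RSS every `every` calls.

    Returns a list of (iteration, peak_rss_mb) tuples.
    """
    samples = []
    for i in range(num_steps):
        step()
        if i % every == 0:
            gc.collect()  # rule out garbage that is merely uncollected
            samples.append((i, peak_rss_mb()))
    return samples


# Assumed usage inside the tutorial notebook, where `svi` and `data` exist:
# samples = track_memory(lambda: svi.step(data), num_steps=2000)
# for iteration, mb in samples:
#     print(f"step {iteration}: peak RSS {mb:.1f} MB")
```

If the reported peak keeps rising steadily over thousands of steps even with the explicit `gc.collect()`, that is consistent with the behaviour reported here rather than with garbage that simply has not been collected yet.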
