Segfault when double backward on BatchNorm2d #2264

@hongyi-zhang

Description

I get a segmentation fault when trying to differentiate BatchNorm2d twice. A simple network that reproduces the error is:

BatchNorm2d --> Linear --> exp --> sum

Removing either BatchNorm2d or exp fixes the problem.

I am on the master branch, using Python 2.7, CUDA 8.0, and cuDNN 6.0. The error can be reproduced with the following code:

import torch
import torch.nn as nn
import torch.nn.functional as F

from torch.autograd import Variable


class BatchNormTest(nn.Module):
    def __init__(self, num_classes=2):
        super(BatchNormTest, self).__init__()
        self.bn = nn.BatchNorm2d(3)
        self.linear = nn.Linear(3*4*4, num_classes)

    def forward(self, x):
        out = x
        # the following line leads to SEGFAULT
        # no SEGFAULT when commented out
        out = self.bn(out)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

b = 4
net = BatchNormTest()
use_cuda = True
inputs = Variable(torch.rand(b, 3, 4, 4), requires_grad=True)
if use_cuda:
    net.cuda()
    inputs = inputs.cuda()

output = net(inputs)
# this line leads to SEGFAULT
loss1 = torch.sum(torch.exp(output))
## whereas this line does not
# loss1 = torch.sum(output)
grad_params = torch.autograd.grad(loss1, inputs, create_graph=True)

grad = grad_params[0]
loss = torch.sum(grad)

loss.backward()
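
For reference, the same double-backward path can be exercised directly through the functional interface, which takes the Linear layer out of the picture. This is only a reduced sketch of the repro above (it reuses the imports and `b` from that snippet, and the shapes and use of F.batch_norm are my own simplification; I have not checked whether this exact variant also segfaults on this build):

# Reduced sketch: double backward straight through F.batch_norm
# (simplified from the repro above; not verified to crash on this build).
x = Variable(torch.rand(b, 3, 4, 4).cuda(), requires_grad=True)
weight = Variable(torch.ones(3).cuda(), requires_grad=True)
bias = Variable(torch.zeros(3).cuda(), requires_grad=True)
running_mean = torch.zeros(3).cuda()
running_var = torch.ones(3).cuda()

out = F.batch_norm(x, running_mean, running_var, weight, bias, training=True)
# exp makes the incoming gradient depend on the forward output,
# mirroring the exp --> sum step in the repro above
loss1 = torch.exp(out).sum()
g, = torch.autograd.grad(loss1, x, create_graph=True)
# second differentiation through batch norm
g.sum().backward()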

gdb information:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb698d700 (LWP 701)]
torch::autograd::BatchNormBackward::apply (this=0x4edfc718, grad_outputs=...) at torch/csrc/autograd/functions/batch_normalization.cpp:177
warning: Source file is more recent than executable.
177             grad_weight,
(gdb) where
#0  torch::autograd::BatchNormBackward::apply (this=0x4edfc718, grad_outputs=...) at torch/csrc/autograd/functions/batch_normalization.cpp:177
#1  0x00007fffecf0a392 in call_function (task=...) at torch/csrc/autograd/engine.cpp:162
#2  torch::autograd::Engine::evaluate_function (this=this@entry=0x7fffedf93b00 <engine>, task=...) at torch/csrc/autograd/engine.cpp:167
#3  0x00007fffecf0bf39 in torch::autograd::Engine::thread_main (this=this@entry=0x7fffedf93b00 <engine>, queue=..., device=device@entry=0) at torch/csrc/autograd/engine.cpp:117
#4  0x00007fffecf27d1a in PythonEngine::thread_main (this=0x7fffedf93b00 <engine>, queue=..., device=0) at torch/csrc/autograd/python_engine.cpp:23
#5  0x00007fffecf106ee in operator()<std::shared_ptr<torch::autograd::ReadyQueue>, int, void> (__object=<optimized out>, this=<optimized out>)
    at /private/home/hongyizmit/.conda/envs/torchmaster/gcc/include/c++/functional:601
#6  _M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /private/home/hongyizmit/.conda/envs/torchmaster/gcc/include/c++/functional:1732
#7  operator() (this=<optimized out>) at /private/home/hongyizmit/.conda/envs/torchmaster/gcc/include/c++/functional:1720
#8  std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (torch::autograd::Engine::*)(std::shared_ptr<torch::autograd::ReadyQueue>, int)> (torch::autograd::Engine*, std::shared_ptr<torch::autograd::ReadyQueue>, int)
> >::_M_run() (this=<optimized out>) at /private/home/hongyizmit/.conda/envs/torchmaster/gcc/include/c++/thread:115
#9  0x00007fffcf81d260 in ?? () from /private/home/hongyizmit/.conda/envs/torchmaster/lib/libstdc++.so.6
#10 0x00007ffff77c8184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
