Closed
Issue description
This error has been reported before, but it could not be reproduced. I get it consistently during training. The code keeps running after the error appears, so the error keeps recurring, and it does not seem to affect training.
Code example
train_dataset = lmdbDataset(root=opt.trainroot, transform=resizeNormalize(size=(592, 32)))
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=opt.batchSize,
    shuffle=True, sampler=None,
    num_workers=int(opt.workers),
    collate_fn=alignCollate())
train_iter = iter(train_loader)
cpu_images, cpu_texts, cpu_lengths = next(train_iter)
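The snippet above depends on an LMDB dataset that is not included here. A minimal sketch of the same pattern on synthetic data might look like the following. `RandomImages` and `first_batch` are hypothetical names introduced only for illustration, and the tensor shape (1, 32, 592) is an assumption based on the `size=(592, 32)` argument. Whether this triggers the error will likely depend on the PyTorch version and platform.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomImages(Dataset):
    """Hypothetical stand-in for lmdbDataset: random image tensors with integer labels."""

    def __init__(self, length=64):
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Assumed layout: 1 channel, height 32, width 592.
        return torch.randn(1, 32, 592), index


def first_batch(num_workers=2, batch_size=8):
    """Build a loader, take a single batch, then drop the iterator."""
    loader = DataLoader(
        RandomImages(),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
    )
    train_iter = iter(loader)
    images, labels = next(train_iter)
    # Dropping the iterator after one batch means any worker processes are
    # shut down while they may still hold prefetched batches.
    del train_iter
    return images.shape, labels.shape
```

Calling `first_batch(num_workers=2)` from inside a script's `if __name__ == "__main__":` block exercises the multi-worker path. `first_batch(num_workers=0)` loads in the main process and can serve as a baseline.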
ERROR message:
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fec6928a9b0>>
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
return recvfds(s, 1)[0]
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 155, in recvfds
raise EOFError
EOFError:
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
return recvfds(s, 1)[0]
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 155, in recvfds
raise EOFError
EOFError:
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fec6928a9b0>>
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
return recvfds(s, 1)[0]
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 155, in recvfds
raise EOFError
EOFError:
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7fec6928a9b0>>
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
self._shutdown_workers()
File "/root/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
self.worker_result_queue.get()
File "/root/anaconda3/lib/python3.6/multiprocessing/queues.py", line 337, in get
return _ForkingPickler.loads(res)
File "/root/anaconda3/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
fd = df.detach()
File "/root/anaconda3/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 182, in recv_handle
return recvfds(s, 1)[0]
File "/root/anaconda3/lib/python3.6/multiprocessing/reduction.py", line 155, in recvfds
raise EOFError
EOFError:
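Every traceback above passes through `rebuild_storage_fd`, which is where tensors shared via file descriptors are received. One way to check whether that path is involved might be to switch `torch.multiprocessing` to the `file_system` sharing strategy before creating the loader. This is a diagnostic sketch, not a confirmed fix. `use_file_system_sharing` is a hypothetical helper name, and the PyTorch documentation notes that the `file_system` strategy can leak shared memory files if processes die abruptly.

```python
import torch.multiprocessing as mp


def use_file_system_sharing():
    """Switch tensor sharing to the file_system strategy when it is available."""
    if "file_system" in mp.get_all_sharing_strategies():
        mp.set_sharing_strategy("file_system")
    # Return the active strategy so the caller can confirm what is in effect.
    return mp.get_sharing_strategy()
```

Calling `use_file_system_sharing()` once at the top of the training script, before the `DataLoader` is built, should be enough for the comparison.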
System Info
- PyTorch or Caffe2: PyTorch
- How you installed PyTorch (conda, pip, source): pip
- Build command you used (if compiling from source):
- OS: CentOS 7
- PyTorch version: 0.4.0
- Python version: 3.6
- CUDA/cuDNN version: CUDA 9.0 / cuDNN 7.0