Slim multi-GPU performance problems #1390

@Ettard

Description

I was training the slim models on the Flowers dataset on Ubuntu 16.04.

TensorFlow version: 1.1.0rc2 (built from source)

Git commit:
34c738cc6d3badcb22e3f72482536ada29bd0e65

Bazel version:
Build label: 0.4.5
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Mar 16 12:19:38 2017 (1489666778)
Build timestamp: 1489666778
Build timestamp as int: 1489666778

CUDA version: 8.0.44
cuDNN version: 5.1.5
GPUs: 3x GeForce GTX 1080 Ti (11 GB each)
Memory: 32 GB

I didn't change the source code.

With 1 GPU:

TRAIN_DIR=/tmp/train_logs
DATASET_DIR=/home/l/data/flowers
python train_image_classifier.py --train_dir=${TRAIN_DIR} --dataset_name=flowers --dataset_split_name=train --dataset_dir=${DATASET_DIR} --model_name=inception_resnet_v2

(startup log omitted here; it looks the same as in the 3-GPU run below)

INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 10: loss = 3.2313 (0.96 sec/step)
INFO:tensorflow:global step 20: loss = 3.7792 (0.97 sec/step)
INFO:tensorflow:global step 30: loss = 2.9681 (0.96 sec/step)
INFO:tensorflow:global step 40: loss = 3.8321 (0.97 sec/step)
INFO:tensorflow:global step 50: loss = 3.2210 (0.96 sec/step)
...

When I use 3 GPUs:
python train_image_classifier.py --train_dir=${TRAIN_DIR} --dataset_name=flowers --dataset_split_name=train --dataset_dir=${DATASET_DIR} --model_name=inception_resnet_v2 --num_clones=3
2017-04-24 14:26:11.885411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Graphics Device
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:05:00.0
Total memory: 10.91GiB
Free memory: 10.53GiB
2017-04-24 14:26:11.885472: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x5b62c2c0
2017-04-24 14:26:12.131777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 1 with properties:
name: Graphics Device
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:06:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB
2017-04-24 14:26:12.131848: W tensorflow/stream_executor/cuda/cuda_driver.cc:485] creating context when one is currently active; existing: 0x5945f2d0
2017-04-24 14:26:12.369331: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 2 with properties:
name: Graphics Device
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:09:00.0
Total memory: 10.91GiB
Free memory: 10.75GiB
2017-04-24 14:26:12.371583: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 1 2
2017-04-24 14:26:12.371596: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y Y Y
2017-04-24 14:26:12.371601: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 1: Y Y Y
2017-04-24 14:26:12.371606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 2: Y Y Y
2017-04-24 14:26:12.371615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:05:00.0)
2017-04-24 14:26:12.371622: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Graphics Device, pci bus id: 0000:06:00.0)
2017-04-24 14:26:12.371625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Graphics Device, pci bus id: 0000:09:00.0)
INFO:tensorflow:Restoring parameters from /tmp/train_logs/model.ckpt-0
2017-04-24 14:26:17.426353: I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /device:GPU:2 for node 'clone_2/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /device:CPU:0
2017-04-24 14:26:17.427748: I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /device:GPU:1 for node 'clone_1/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /device:CPU:0
2017-04-24 14:26:17.429099: I tensorflow/core/common_runtime/simple_placer.cc:669] Ignoring device specification /device:GPU:0 for node 'clone_0/fifo_queue_Dequeue' because the input edge from 'prefetch_queue/fifo_queue' is a reference connection and already has a device field set to /device:CPU:0

INFO:tensorflow:Starting Session.
INFO:tensorflow:Saving checkpoint to path /tmp/train_logs/model.ckpt
INFO:tensorflow:Starting Queues.
INFO:tensorflow:global_step/sec: 0
INFO:tensorflow:Recording summary at step 0.
INFO:tensorflow:global step 10: loss = 2.9670 (0.98 sec/step)
INFO:tensorflow:global step 20: loss = 2.9945 (0.99 sec/step)
INFO:tensorflow:global step 30: loss = 3.0432 (0.99 sec/step)
INFO:tensorflow:global step 40: loss = 3.0007 (1.04 sec/step)
INFO:tensorflow:global step 50: loss = 2.8072 (1.03 sec/step)
...

I saw the "Ignoring device specification" messages, and the training speed (sec/step) didn't change.
This is the nvidia-smi output with 3 GPUs:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.13 Driver Version: 378.13 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Graphics Device Off | 0000:05:00.0 On | N/A |
| 49% 83C P2 140W / 250W | 10754MiB / 11171MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
| 1 Graphics Device Off | 0000:06:00.0 Off | N/A |
| 47% 81C P2 137W / 250W | 10744MiB / 11172MiB | 98% Default |
+-------------------------------+----------------------+----------------------+
| 2 Graphics Device Off | 0000:09:00.0 Off | N/A |
| 43% 74C P2 130W / 250W | 10744MiB / 11172MiB | 98% Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1065 G /usr/lib/xorg/Xorg 160MiB |
| 0 1757 G compiz 81MiB |
| 0 14407 C python 10497MiB |
| 1 14407 C python 10729MiB |
| 2 14407 C python 10729MiB |
+-----------------------------------------------------------------------------+
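One caveat when comparing the step timings: if the `--batch_size` flag in `train_image_classifier.py` (default 32) is applied per clone (which is my assumption, not something I have verified), then with `--num_clones=3` each step processes three batches, and a similar sec/step would actually mean roughly 3x the throughput. A back-of-the-envelope check with the numbers from the logs above:

```python
# ASSUMPTION: --batch_size (default 32) is the batch size *per clone*,
# so num_clones=3 processes 3 batches per training step.
batch_size_per_clone = 32

sec_per_step_1gpu = 0.96  # from the 1-GPU log above
sec_per_step_3gpu = 1.00  # roughly, from the 3-GPU log above

# Images processed per second = (clones * batch per clone) / step time.
images_per_sec_1gpu = 1 * batch_size_per_clone / sec_per_step_1gpu
images_per_sec_3gpu = 3 * batch_size_per_clone / sec_per_step_3gpu

print(round(images_per_sec_1gpu, 1))  # ~33.3
print(round(images_per_sec_3gpu, 1))  # ~96.0
```

If that assumption is wrong and the batch is split across clones, then the numbers above really would indicate no speedup.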

Something else
I tried the inception model with 3 GPUs and it worked well, with a clear speed boost. There was no "Ignoring device specification" in the inception model logs, so I'm not sure whether those messages are related to the problem.

Similar problems:
#1338
tensorflow/tensorflow#8061 (I tried that script on TF 1.1.0 and "Ignoring device specification" appeared too. If someone needs details, I will post the logs.)

I changed the model to inception_v3; nothing seems to have changed.
I'm also considering dumping the batch contents, in case that would be helpful.
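To compare runs a bit more precisely than eyeballing the log, the sec/step figures can be scraped out of the slim training output. A small self-contained sketch (the regex matches the logging format shown above):

```python
import re

# Matches slim's "global step N: loss = X (Y sec/step)" log lines.
STEP_RE = re.compile(r"global step (\d+): loss = ([\d.]+) \(([\d.]+) sec/step\)")

def mean_sec_per_step(log_text):
    """Average step time across all matching log lines, or None."""
    times = [float(m.group(3)) for m in STEP_RE.finditer(log_text)]
    return sum(times) / len(times) if times else None

log = """\
INFO:tensorflow:global step 10: loss = 2.9670 (0.98 sec/step)
INFO:tensorflow:global step 20: loss = 2.9945 (0.99 sec/step)
INFO:tensorflow:global step 30: loss = 3.0432 (0.99 sec/step)
"""
print(round(mean_sec_per_step(log), 2))  # 0.99
```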
