Advice on multi-GPU support? #121

@RichardKov

Description

Hi Ender, thanks for your work!

There have been some requests for multi-GPU support (e.g. #51), and I am now trying to write a multi-GPU version based on your code.

However, after looking into the code, it seems the current structure does not lend itself to multi-GPU training. For example, if I modify train_val.py like this:

      losses = []
      tower_grads = []
      scopes = []
      with tf.variable_scope(tf.get_variable_scope()):
        for i in range(2):
          with tf.device("/gpu:" + str(i)):
            with tf.name_scope("tower_" + str(i)) as scope:
              # Build the main computation graph on this tower
              layers = self.net.create_architecture(sess, 'TRAIN', self.num_classes, tag='default',
                                                    anchor_scales=cfg.ANCHOR_SCALES,
                                                    anchor_ratios=cfg.ANCHOR_RATIOS)
              # Collect this tower's loss
              loss = layers['total_loss']
              losses.append(loss)

              # Reuse variables for the remaining towers
              tf.get_variable_scope().reuse_variables()

              grads = self.optimizer.compute_gradients(loss)
              tower_grads.append(grads)
              scopes.append(scope)
      # Average the gradients across towers
      gvs = self.average_gradients(tower_grads)
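
For reference, self.average_gradients is meant to be the usual per-variable averaging from TensorFlow's multi-GPU examples; a minimal sketch of what I have (assuming every variable receives a gradient on every tower):

    import tensorflow as tf

    def average_gradients(self, tower_grads):
        # Method on the training wrapper class. tower_grads holds one
        # list of (gradient, variable) pairs per tower, as returned by
        # optimizer.compute_gradients(); the variables are shared, so
        # the inner lists line up position by position.
        average_grads = []
        for grad_and_vars in zip(*tower_grads):
            # grad_and_vars pairs the same variable across towers:
            # ((grad_gpu0, var), (grad_gpu1, var), ...)
            grads = [tf.expand_dims(g, 0) for g, _ in grad_and_vars]
            mean_grad = tf.reduce_mean(tf.concat(grads, axis=0), axis=0)
            # The variable is shared, so take it from the first tower.
            average_grads.append((mean_grad, grad_and_vars[0][1]))
        return average_grads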

This modification cannot work, because the network class has only one "self.image" placeholder, so the following error is thrown:

    InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'tower_0/Placeholder' with dtype float
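
My guess is that each tower needs its own input placeholders, created under that tower's scope, so the feed dict can address 'tower_0/...' and 'tower_1/...' separately. A hypothetical sketch of what I mean (the placeholder names and shapes here are illustrative, not the actual ones in the network class):

    import tensorflow as tf

    tower_inputs = []
    for i in range(2):
        with tf.device("/gpu:%d" % i):
            with tf.name_scope("tower_%d" % i):
                # One set of inputs per tower, so each tower can be
                # fed independently through the feed dict.
                image = tf.placeholder(tf.float32, [1, None, None, 3])
                gt_boxes = tf.placeholder(tf.float32, [None, 5])
                tower_inputs.append((image, gt_boxes))

    # At train time each tower would then get its own slice of the data:
    # feed_dict = {}
    # for (image, gt_boxes), blob in zip(tower_inputs, per_tower_blobs):
    #     feed_dict[image] = blob['data']
    #     feed_dict[gt_boxes] = blob['gt_boxes']

But since create_architecture() builds its placeholders internally, I am not sure how to wire this in cleanly.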

Can you give any advice on how to implement a multi-GPU version of this code?

Many thanks.
