Word2vec on GPU slower than CPU #13048

@manneshiva

Description

System information

  • OS Platform and Distribution: Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.3.0
  • Python version: 2.7.12
  • Bazel version (if compiling from source): 0.5.0
  • CUDA/cuDNN version: 8.0/6.0
  • GPU model and memory: NVIDIA GTX 1060 / 3GB
  • Docker used: yes
  • I picked up the code from the word2vec example in your official repo and made a few changes. The core code that trains word2vec remains the same.

Describe the problem

I have been benchmarking commonly used frameworks/libraries for unsupervised learning of word embeddings (word2vec). I am currently comparing TensorFlow (CPU/GPU), Gensim, Deeplearning4j, and the original C code on standard metrics: training time, peak memory usage, and quality of the learned vectors. Link to my GitHub repo (still a work in progress). I ran the benchmark on the text8 corpus (I plan to run it on a much larger corpus later for the full picture), and it gave me strange results:

  • TensorFlow on GPU is much slower than on CPU
  • TensorFlow is much slower than the other frameworks

Is this behavior expected? I would appreciate any input.
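
For context, the core training graph I benchmarked is essentially the skip-gram + NCE setup from the word2vec example. A rough sketch is below; the hyperparameter values shown are the tutorial defaults, not necessarily the exact ones from my runs, and the full script is in the linked repo.

```python
import tensorflow as tf

batch_size = 128
embedding_size = 128      # dimensionality of the word vectors
vocabulary_size = 50000   # tutorial default; my runs use the text8 vocabulary
num_sampled = 64          # negative samples per batch

# Skip-gram inputs: center-word ids and their context-word targets.
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])

# Embedding matrix; each step gathers only a small batch of rows.
embeddings = tf.Variable(
    tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)

# NCE loss over a handful of negative samples.
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / embedding_size ** 0.5))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))

loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights,
                   biases=nce_biases,
                   labels=train_labels,
                   inputs=embed,
                   num_sampled=num_sampled,
                   num_classes=vocabulary_size))

optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
```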

Source code / logs

Link to TensorFlow code
Link to results of the sample benchmark on the text8 corpus
