Support channel groups in convolutional layers #10482
Conversation
Can one of the admins verify this patch?
Francois, mind commenting whether this would be useful to add?
I don't think it's currently widely used enough to justify becoming part of the core layers. The proposed implementation would also not be useful in any practical setting, because it relies on manually splitting tensors, running independent convolution ops, and concatenating the outputs. This will be very slow and inefficient. When we add support for convolution groups, it should happen at the op level, not be manually implemented as a graph of ops on the Python side. In theory, a convolution with groups (e.g. 8 groups over a 128-channel space) should be significantly faster than a regular convolution, but with this setup it would be dramatically slower. Since the added speed / efficiency is the core reason for using groups, that is clearly a big issue.
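For readers unfamiliar with the pattern being criticized, here is a minimal sketch of a grouped convolution emulated on the Python side by split / per-group conv / concat. It assumes TF 1.x-style `tf.layers`; the helper name `group_conv2d` is illustrative and is not the PR's actual code.

```python
import tensorflow as tf

def group_conv2d(inputs, filters, kernel_size, groups, **kwargs):
    """Emulate a grouped convolution by splitting the channel axis,
    convolving each slice independently, and concatenating the results."""
    if groups == 1:
        return tf.layers.conv2d(inputs, filters, kernel_size, **kwargs)
    # Split the input along the channel axis (NHWC assumed) into `groups` slices.
    input_groups = tf.split(inputs, num_or_size_splits=groups, axis=-1)
    # Run an independent, smaller convolution per group.
    output_groups = [
        tf.layers.conv2d(g, filters // groups, kernel_size, **kwargs)
        for g in input_groups
    ]
    # Stitch the per-group outputs back together along the channel axis.
    return tf.concat(output_groups, axis=-1)
```

Every `tf.split`, per-group conv, and `tf.concat` here launches its own op, which is exactly the Python-side overhead the comment above objects to.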
Thanks for the pull request! If you'd like to add support for this feature in the meantime, consider adding it to contrib, or reopening the feature request and discussing it there.
The lack of group convolution prevents migrating many pre-trained models from Caffe to TensorFlow. For example, 3 out of 5 example Caffe models use this feature. Users have to manually split the Caffe kernel into multiple TensorFlow conv layers, which is especially laborious and error-prone for large networks. This manual approach is also done in Python, so it is no faster, and possibly slower, than doing it inside the conv layer. In terms of performance, if the group number is set to 1 (the default), there is no performance difference from the current version. It would be ideal to implement this feature in the op. Supporting
@bowang I think this may have something to do with the device. The GPU is somehow slower when the computation is loop-intensive? Is there any way to optimize this?
@sleepfin do you mean a 32-group convolution is 2x slower than 32 normal convolutions? It is understandable that group convolution is slower than normal convolution since it decomposes a big convolution into multiple smaller convolutions. Batching usually helps performance. A fair comparison should be one 32-group 4-channel convolution vs 32 single-group 4-channel convolutions, rather than one single-group 128-channel convolution.
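To make that comparison concrete, here is a rough benchmarking sketch (illustrative shapes only, reusing the hypothetical `group_conv2d` helper from the earlier sketch): one 32-group convolution over 128 channels versus 32 independent 4-channel convolutions, with the dense 128-channel convolution as the unfair baseline.

```python
import tensorflow as tf

x = tf.random_normal([8, 56, 56, 128])   # NHWC input with 128 channels

# (a) one 32-group convolution: each group sees 128 / 32 = 4 input channels
grouped = group_conv2d(x, filters=128, kernel_size=3, groups=32, padding='same')

# (b) 32 separate single-group convolutions, each over its own 4-channel slice
slices = tf.split(x, num_or_size_splits=32, axis=-1)
separate = tf.concat(
    [tf.layers.conv2d(s, 4, 3, padding='same') for s in slices], axis=-1)

# (c) the unfair baseline: one dense convolution over all 128 channels,
#     which performs roughly 32x the multiply-adds of (a) or (b)
dense = tf.layers.conv2d(x, 128, 3, padding='same')
```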
Again, if we add it, it should be done at the op level and should be fast. If we add a feature as part of the core API then users should have a reasonable expectation that the feature does what it claims to do. People's motivation for using convolution groups is that they should result in cheaper, faster convolutions. If we provide this option but the result is actually slower convolutions, we are not meeting user expectations.
@bowang [performance measurement results attached as images]
@sleepfin Thanks for the performance measurement! My hypothesis is that the GPU/CPU performance difference is caused by the change in per-op and total computation workload in group convolution. Since the current implementation decomposes a single large convolution into a number of small convolutions, a single small convolution may not be able to fully utilize the GPU, and that drop in GPU utilization may explain the slowdown. Meanwhile, the total workload of the group convolution is smaller than that of a single dense convolution. Since the CPU has much lower compute power, it is always saturated, so the speedup on CPU comes from the reduction in total workload. This is just my hypothesis; I hope it generates some ideas for further performance debugging.
@sleepfin Bias and activation are applied on the concatenated results of the conv groups, as you can see on line 194 (concat), line 218 (bias add), and line 221 (activation). BN is implemented as a separate layer, which I think is out of scope for the conv layers.
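A rough sketch of that ordering (not the PR's exact code): the per-group conv outputs are concatenated first, and a single bias add and activation are then applied to the full output tensor.

```python
import tensorflow as tf

def grouped_conv_bias_act(input_groups, kernels, bias, activation=tf.nn.relu):
    """input_groups: list of NHWC tensors, one per channel group.
    kernels: matching list of 4-D filter tensors."""
    outputs = [tf.nn.conv2d(x, k, strides=[1, 1, 1, 1], padding='SAME')
               for x, k in zip(input_groups, kernels)]
    outputs = tf.concat(outputs, axis=-1)    # concat      (cf. line 194)
    outputs = tf.nn.bias_add(outputs, bias)  # bias add    (cf. line 218)
    return activation(outputs)               # activation  (cf. line 221)
```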
A recent paper highlights the potential usefulness of group convolutions for mobile use cases: https://arxiv.org/abs/1707.01083 Looking forward to seeing group convolution implemented in TF.
@fchollet Do you have a recommended path for people interested in efficiently implementing a layer in TF (i.e. where specifically at the op level should one implement it)?
This PR implements channel groups in the convolutional layers (Conv1D, Conv2D, Conv3D, Conv2DTranspose).
Grouped convolution was first used in AlexNet as a way to share filter parameters across feature maps. A detailed discussion of this feature can be found here. The feature is supported by Caffe (doc) and used in its reference CaffeNet model. Adding it to TensorFlow makes it easier to compare models across the two frameworks and to migrate from Caffe to TensorFlow.
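A hypothetical usage example, assuming the PR exposes the feature through a `groups` argument on the existing layer classes (the actual argument name in the PR may differ); the shapes mirror CaffeNet's conv2, which uses `group: 2`.

```python
import tensorflow as tf

# Input feature map roughly matching CaffeNet's conv2 input (96 channels).
inputs = tf.placeholder(tf.float32, [None, 27, 27, 96])

# Caffe: convolution_param { num_output: 256 kernel_size: 5 pad: 2 group: 2 }
# Hypothetical TensorFlow equivalent using the proposed `groups` argument:
conv2 = tf.layers.Conv2D(filters=256, kernel_size=5, padding='same',
                         groups=2, activation=tf.nn.relu)(inputs)
```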