
Conversation

bowang
Contributor

@bowang bowang commented Jun 7, 2017

This PR implements channel groups in convolutional layers (Conv1D, Conv2D, Conv3D, Conv2DTranspose).

Grouped convolution was first implemented in AlexNet as a way to share filter parameters across feature maps. A detailed discussion of this feature can be found here. It is supported by Caffe (doc) and used in its reference CaffeNet model. Adding this feature to TensorFlow makes it easier to compare models across the two frameworks and to migrate from Caffe to TensorFlow.
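
For context, a minimal sketch of the split/convolve/concatenate pattern this PR takes (not the PR's actual code; the helper name and shapes are illustrative):

```python
import tensorflow as tf

def grouped_conv2d(x, kernel, groups):
    """Grouped 2-D convolution: split the channels, convolve each group
    with its own slice of the kernel, and concatenate the results.

    x:      [batch, height, width, in_channels]
    kernel: [k_h, k_w, in_channels // groups, out_channels]
    """
    x_groups = tf.split(x, groups, axis=3)
    k_groups = tf.split(kernel, groups, axis=3)
    outputs = [tf.nn.conv2d(xg, kg, strides=[1, 1, 1, 1], padding='SAME')
               for xg, kg in zip(x_groups, k_groups)]
    return tf.concat(outputs, axis=3)

# Example: 128 input channels, 128 output channels, 8 groups.
x = tf.random.normal([4, 32, 32, 128])
kernel = tf.random.normal([3, 3, 16, 128])
y = grouped_conv2d(x, kernel, groups=8)
print(y.shape)  # (4, 32, 32, 128)
```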

@tensorflow-jenkins
Collaborator

Can one of the admins verify this patch?

@jhseu
Contributor

jhseu commented Jun 7, 2017

Francois, mind commenting whether this would be useful to add?

@fchollet
Contributor

fchollet commented Jun 7, 2017

I don't think it's currently widely used enough to justify becoming part of the core layers. The proposed implementation would also not be useful in any practical setting, because it relies on manually splitting tensors, running independent convolution ops, and concatenating the outputs. This will be very slow and inefficient.

When we want to add support for convolution groups, it should happen at the op level, not be manually implemented as a graph of ops on the Python side. In theory, a convolution with groups (e.g. 8 groups in a 128-channel space) should be significantly faster than a regular convolution, but with this setup it would be dramatically slower. Since the added speed and efficiency is the core reason for using them, that is clearly a big issue.
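
To make the efficiency argument concrete, a back-of-the-envelope count of multiply-accumulates per output position, assuming 3x3 kernels and the 8-group / 128-channel example above:

```python
k, c_in, c_out, groups = 3, 128, 128, 8   # 3x3 kernels assumed for illustration

dense_macs = k * k * c_in * c_out                                     # 147456
grouped_macs = groups * k * k * (c_in // groups) * (c_out // groups)  # 18432

print(dense_macs / grouped_macs)  # 8.0, i.e. an 8x theoretical reduction
```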

@jhseu
Contributor

jhseu commented Jun 7, 2017

Thanks for the pull request! If you'd like to add support for this feature in the meantime, consider adding it to contrib, or reopening the feature request and discussing it there.

@jhseu jhseu closed this Jun 7, 2017
@bowang
Contributor Author

bowang commented Jun 7, 2017

The lack of group convolution prevents migrating many pre-trained models from Caffe to TensorFlow. For example, 3 out of 5 example Caffe models use this feature. Users have to manually split the Caffe kernel into multiple TensorFlow conv layers, which is especially laborious and error-prone for large networks. This manual approach is also done in Python and is no faster, if not slower, than doing the split inside the conv layer.

In terms of performance, if the group number is set to 1 (the default), there is no difference from the current version. It would be ideal to implement this feature in the op itself, but supporting it at the API level can be a first step: it lets users migrate to TensorFlow, and then we can profile and optimize.

@sleepfin

sleepfin commented Jun 9, 2017

@bowang
I think it's much slower when the group count is large using your code.
For example, group=32 is used in ResNeXt, and group convolution is more than 2 times slower than normal convolution (group=1).
(In my case, comparing group=32 against group=1, total_ops is 128% and total_params is 34%.)

I find this may have something to do with the device.
The results above are from running on GPU. When I run my network on CPU, group convolution seems to be more than 3 times faster than normal convolution.

Is the GPU somehow slower when the computation is loop-intensive?

Is there any way to optimize this?

@bowang
Contributor Author

bowang commented Jun 9, 2017

@sleepfin do you mean a 32-group convolution is 2x slower than 32 normal convolutions?

It is understandable that group convolution is slower than normal convolution, since it decomposes a big convolution into multiple smaller convolutions; batching usually helps performance.

A fair comparison should be one 32-group 4-channel convolution vs 32 single-group 4-channel convolutions, rather than one single-group 128-channel convolution.

@fchollet
Contributor

fchollet commented Jun 9, 2017

Again, if we add it, it should be done at the op level and should be fast.

If we add a feature as part of the core API, then users should have a reasonable expectation that the feature does what it claims to do. People's motivation for using convolution groups is that they should result in cheaper, faster convolutions. If we provide this option but the result is actually slower convolutions, we are not meeting user expectations.

@sleepfin

sleepfin commented Jun 10, 2017

@bowang
But in a real case we usually replace "one single-group 128-channel convolution" with a "32-group 4-channel convolution", as in ResNeXt, which means the group parameter is either 1 or 32.
Also notice that on a CPU device the "32-group 4-channel convolution" is faster than the "one single-group 128-channel convolution", but it is the opposite on a GPU device.
My point is that it would be great to have a high-performance group convolution on both CPU and GPU.

@sleepfin

@bowang
Another question:
Are the BN, activation, and bias_add layers applied to:
1 -> the output of a single path of the conv groups, or
2 -> the output of the final concat layer?
In your code, I think it will be the first case if I use arg_scope.
I'm not sure whether it is correct, and more efficient, to apply those layers to the final concat output.

@bowang
Contributor Author

bowang commented Jun 13, 2017

@sleepfin Thanks for the performance measurements! My hypothesis is that the GPU/CPU performance difference comes from how group convolution changes both the per-op and the total computation workload.

Since the current implementation decomposes a single large convolution into a number of small convolutions, the workload of each small convolution may not be enough to fully utilize a GPU. This drop in GPU utilization may explain the slowdown.

Meanwhile, the total workload of the group convolution is smaller than that of a single dense convolution. Since a CPU has much lower compute capability, it is always saturated, so the speedup on CPU comes from the reduction in total workload.

This is just my hypothesis, but I hope it generates some ideas for further performance debugging.

@bowang
Contributor Author

bowang commented Jun 13, 2017

@sleepfin Bias and activation are applied on the concatenated results of the conv groups, as you can see on line 194 (concat), line 218 (bias add), and line 221 (activation). BN is implemented as a separate layer, which I think is out of the scope of the conv layers.
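
A rough sketch of that ordering (not the PR's actual code; the tensors below are stand-ins):

```python
import tensorflow as tf

# Stand-ins for the per-group convolution outputs (32 groups, 4 channels each).
group_outputs = [tf.random.normal([8, 32, 32, 4]) for _ in range(32)]
bias = tf.zeros([128])

y = tf.concat(group_outputs, axis=3)  # concatenate the group outputs first
y = tf.nn.bias_add(y, bias)           # then a single bias add over all 128 channels
y = tf.nn.relu(y)                     # then the activation
```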

@kwotsin
Contributor

kwotsin commented Jul 11, 2017

A recent paper highlights the possible usefulness of group convolutions for mobile uses: https://arxiv.org/abs/1707.01083

Looking forward to seeing group convolution implemented in TF.

@kwotsin
Contributor

kwotsin commented Jul 13, 2017

> Again, if we add it, it should be done at the op level and should be fast.

@fchollet Do you have a typical pathway to recommend for people interested in efficiently implementing a layer in TF (i.e., where specifically at the op level should one implement it)?

@lgeiger
Contributor

lgeiger commented Dec 19, 2019

@fchollet #25818 added support for Conv2D group convolutions using cuDNN on GPUs. Would you be open to revisiting this addition for Keras? I'd be happy to send a PR.
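
As I understand the op-level support added there, a grouped convolution can then be expressed without any Python-side splitting, by giving the kernel fewer input channels than the input tensor (GPU/cuDNN only at that point; shapes below are illustrative):

```python
import tensorflow as tf

x = tf.random.normal([8, 32, 32, 128])   # 128 input channels
f = tf.random.normal([3, 3, 16, 128])    # 16 = 128 / 8, i.e. 8 groups

# The op performs a grouped convolution when the kernel's input-channel
# dimension is a divisor of the input's channel count.
y = tf.nn.conv2d(x, f, strides=[1, 1, 1, 1], padding='SAME')
print(y.shape)  # (8, 32, 32, 128)
```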

@danijar
Contributor

danijar commented Jan 7, 2020

@lgeiger @fchollet It would be great to have group convolutions and their transposed version in Keras. There are at least depthwise convolutions, but no DepthwiseConv2DTranspose for building image decoders.
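
Until such a layer exists, one possible (and admittedly slow) workaround in the spirit of this thread is to split the channels and run one Conv2DTranspose per channel; a hypothetical sketch (the helper name is made up, not an existing API):

```python
import tensorflow as tf

def depthwise_conv2d_transpose(x, kernel_size=3, strides=2):
    """Emulate a depthwise transposed convolution by upsampling each channel
    with its own single-filter Conv2DTranspose and concatenating the results."""
    channels = tf.split(x, x.shape[-1], axis=-1)
    ups = [tf.keras.layers.Conv2DTranspose(1, kernel_size, strides=strides,
                                           padding='same')(c)
           for c in channels]
    return tf.concat(ups, axis=-1)

y = depthwise_conv2d_transpose(tf.random.normal([4, 8, 8, 16]))
print(y.shape)  # (4, 16, 16, 16)
```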
