Is the Focus layer equivalent to a simple Conv layer? #4825

@thomasbi1

Hi

I had a look at the Focus layer, and it seems to me that it is equivalent to a simple 2D convolutional layer, without the need for the space-to-depth operation. Focus rearranges each 2×2 spatial block into channels and then applies a 3×3 convolution, so every output pixel sees a 6×6 region of the original image at stride 2; scattering the Focus weights into the corresponding interleaved positions of a 6×6 kernel therefore reproduces it exactly. In other words, a Focus layer with kernel size 3 can be expressed as a Conv layer with kernel size 6, stride 2 and padding 2. I wrote some code to verify this:

import torch
from models.common import Focus, Conv
from utils.torch_utils import profile


focus = Focus(3, 64, k=3).eval()
conv = Conv(3, 64, k=6, s=2, p=2).eval()

# Express the Focus layer as an equivalent Conv layer: copy the BatchNorm and
# scatter the Focus weights into the interleaved positions of the 6x6 kernel
conv.bn = focus.conv.bn
conv.conv.weight.data[:, :, ::2, ::2] = focus.conv.conv.weight.data[:, :3]      # patch x[..., ::2, ::2]
conv.conv.weight.data[:, :, 1::2, ::2] = focus.conv.conv.weight.data[:, 3:6]    # patch x[..., 1::2, ::2]
conv.conv.weight.data[:, :, ::2, 1::2] = focus.conv.conv.weight.data[:, 6:9]    # patch x[..., ::2, 1::2]
conv.conv.weight.data[:, :, 1::2, 1::2] = focus.conv.conv.weight.data[:, 9:12]  # patch x[..., 1::2, 1::2]

# Compare
x = torch.randn(16, 3, 640, 640)
with torch.no_grad():
    # Results are not perfectly identical, errors up to about 1e-7 occur (probably numerical)
    assert torch.allclose(focus(x), conv(x), atol=1e-6)

# Profile
results = profile(input=torch.randn(16, 3, 640, 640), ops=[focus, conv, focus, conv], n=10, device=0)

And the output is as follows:

YOLOv5 🚀 v5.0-434-g0dc725e torch 1.9.0+cu111 CUDA:0 (A100-SXM4-40GB, 40536.1875MB)
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
        7040       23.07         2.682         4.055         13.78       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.368         3.474         9.989       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.343         3.556         11.57       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.368         3.456         9.961       (16, 3, 640, 640)      (16, 64, 320, 320)
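
As expected, the parameter counts in the table match: both convolutions have 64·12·3·3 = 64·3·6·6 = 6912 weights, plus 2·64 = 128 BatchNorm parameters, for a total of 7040, and the GFLOPs are identical as well.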

I did have to slightly increase the tolerance in torch.allclose for the assertion to succeed, but looking at the errors they seem to be purely numerical.
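
For anyone who wants to check the equivalence without the YOLOv5 classes, here is a minimal standalone sketch in plain PyTorch. It assumes bias-free convolutions (as in YOLOv5's Conv module) and omits the BatchNorm and activation, since the weight mapping is the interesting part:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Focus-style forward: space-to-depth (2x2 patches to channels), then a 3x3 conv
w3 = nn.Conv2d(12, 64, 3, stride=1, padding=1, bias=False)

def focus_forward(x):
    x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
    return w3(x)

# Equivalent 6x6 stride-2 conv: scatter the 3x3 weights into the
# interleaved kernel positions, exactly as above
w6 = nn.Conv2d(3, 64, 6, stride=2, padding=2, bias=False)
with torch.no_grad():
    w6.weight[:, :, ::2, ::2] = w3.weight[:, :3]
    w6.weight[:, :, 1::2, ::2] = w3.weight[:, 3:6]
    w6.weight[:, :, ::2, 1::2] = w3.weight[:, 6:9]
    w6.weight[:, :, 1::2, 1::2] = w3.weight[:, 9:12]

x = torch.randn(2, 3, 64, 64)
with torch.no_grad():
    print(torch.allclose(focus_forward(x), w6(x), atol=1e-6))  # True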

So am I missing something, or could the Focus layer simply be replaced by a Conv layer, which would give a slight increase in speed?
