Is the Focus layer equivalent to a simple Conv layer? #4825

@thomasbi1

Hi

I had a look at the Focus layer, and it seems to me that it is equivalent to a simple 2D convolutional layer, without the need for the space-to-depth operation. Focus rearranges each 2×2 spatial block into channels and then applies a 3×3 convolution, so every output pixel sees a 6×6 region of the original image at stride 2; scattering the Focus weights into the corresponding interleaved positions of a 6×6 kernel therefore reproduces it exactly. In other words, a Focus layer with kernel size 3 can be expressed as a Conv layer with kernel size 6, stride 2 and padding 2. I wrote some code to verify this:

import torch
from models.common import Focus, Conv
from utils.torch_utils import profile


focus = Focus(3, 64, k=3).eval()
conv = Conv(3, 64, k=6, s=2, p=2).eval()

# Express the Focus layer as an equivalent Conv layer: copy the BatchNorm and
# scatter the Focus weights into the interleaved positions of the 6x6 kernel
conv.bn = focus.conv.bn
conv.conv.weight.data[:, :, ::2, ::2] = focus.conv.conv.weight.data[:, :3]      # patch x[..., ::2, ::2]
conv.conv.weight.data[:, :, 1::2, ::2] = focus.conv.conv.weight.data[:, 3:6]    # patch x[..., 1::2, ::2]
conv.conv.weight.data[:, :, ::2, 1::2] = focus.conv.conv.weight.data[:, 6:9]    # patch x[..., ::2, 1::2]
conv.conv.weight.data[:, :, 1::2, 1::2] = focus.conv.conv.weight.data[:, 9:12]  # patch x[..., 1::2, 1::2]

# Compare
x = torch.randn(16, 3, 640, 640)
with torch.no_grad():
    # Results are not perfectly identical, errors up to about 1e-7 occur (probably numerical)
    assert torch.allclose(focus(x), conv(x), atol=1e-6)

# Profile
results = profile(input=torch.randn(16, 3, 640, 640), ops=[focus, conv, focus, conv], n=10, device=0)

And the output is as follows:

YOLOv5 🚀 v5.0-434-g0dc725e torch 1.9.0+cu111 CUDA:0 (A100-SXM4-40GB, 40536.1875MB)
      Params      GFLOPs  GPU_mem (GB)  forward (ms) backward (ms)                   input                  output
        7040       23.07         2.682         4.055         13.78       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.368         3.474         9.989       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.343         3.556         11.57       (16, 3, 640, 640)      (16, 64, 320, 320)
        7040       23.07         2.368         3.456         9.961       (16, 3, 640, 640)      (16, 64, 320, 320)
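
As expected, the parameter counts in the table match: both convolutions have 64·12·3·3 = 64·3·6·6 = 6912 weights, plus 2·64 = 128 BatchNorm parameters, for a total of 7040, and the GFLOPs are identical as well.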

I did have to slightly increase the tolerance in torch.allclose for the assertion to succeed, but looking at the errors they seem to be purely numerical.
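
For anyone who wants to check the equivalence without the YOLOv5 classes, here is a minimal standalone sketch in plain PyTorch. It assumes bias-free convolutions (as in YOLOv5's Conv module) and omits the BatchNorm and activation, since the weight mapping is the interesting part:

import torch
import torch.nn as nn

torch.manual_seed(0)

# Focus-style forward: space-to-depth (2x2 patches to channels), then a 3x3 conv
w3 = nn.Conv2d(12, 64, 3, stride=1, padding=1, bias=False)

def focus_forward(x):
    x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
    return w3(x)

# Equivalent 6x6 stride-2 conv: scatter the 3x3 weights into the
# interleaved kernel positions, exactly as above
w6 = nn.Conv2d(3, 64, 6, stride=2, padding=2, bias=False)
with torch.no_grad():
    w6.weight[:, :, ::2, ::2] = w3.weight[:, :3]
    w6.weight[:, :, 1::2, ::2] = w3.weight[:, 3:6]
    w6.weight[:, :, ::2, 1::2] = w3.weight[:, 6:9]
    w6.weight[:, :, 1::2, 1::2] = w3.weight[:, 9:12]

x = torch.randn(2, 3, 64, 64)
with torch.no_grad():
    print(torch.allclose(focus_forward(x), w6(x), atol=1e-6))  # True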

So am I missing something, or could the Focus layer simply be replaced by a Conv layer, which would give a slight increase in speed?
