This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Performance regression in augmentation #11469

@rahul003

Description


There's a big performance regression in the augmentation stage of the RecordIO pipeline: throughput drops from ~5000 samples/sec to ~3000 samples/sec (roughly 40%) for ResNet-50 on ImageNet. This is linked to PR #11027.

What the PR tries to do is not itself the problem: with an older commit from that PR, d37f3a3 (May 24), I can still get 5k samples/sec. But in the form in which it was merged, there's a big slowdown.

Environment info

Package used (Python/R/Scala/Julia): Python 3

Build info

pip nightly (mxnet-cu90-1.3.0b20180627), as well as builds from source from master at any commit after the above PR was merged

MXNet commit hash: N/A

Build config: Tried with and without USE_LIBJPEG_TURBO. Enabling it increases the speed a bit (~3500 samples/sec), but that is still much slower than before. USE_CUDA and USE_CUDNN were also enabled.

Steps to reproduce

python example/image-classification/train_imagenet.py --gpus 0,1,2,3,4,5,6,7 --batch-size 2048 --dtype float16 --network resnet-v1b --data-nthreads 40 --optimizer sgd --data-train /media/ramdisk/pass-through/train-passthrough.rec --data-train-idx /media/ramdisk/pass-through/train-passthrough.idx --data-val /media/ramdisk/pass-through/val-passthrough.rec --data-val-idx /media/ramdisk/pass-through/val-passthrough.idx 

What have you tried to solve it?

I've tried profiling it with perf to see what might be wrong. It looks like OpenCV is causing a wait for some reason; please see figure 3.

  1. Here's a perf summary now
    image

  2. Perf summary from the May 24 commit
    image

  3. Call graph using perf
    image
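To separate the data pipeline from the training loop when comparing commits, it may help to time the iterator on its own. Below is a minimal sketch of such a measurement; `measure_throughput` and `fake_batches` are hypothetical helpers written for illustration, not part of MXNet, and the stand-in generator only simulates per-batch latency.

```python
import time


def measure_throughput(batches, batch_size, warmup=2):
    """Return samples/sec for an iterable of batches.

    The first `warmup` batches are consumed but not timed, so one-off
    startup costs (thread pool start, first file reads) are excluded.
    """
    it = iter(batches)
    for _ in range(warmup):
        try:
            next(it)
        except StopIteration:
            return 0.0  # not enough batches to measure anything

    count = 0
    start = time.perf_counter()
    for _ in it:
        count += 1
    elapsed = time.perf_counter() - start

    if count == 0 or elapsed <= 0:
        return 0.0
    return count * batch_size / elapsed


def fake_batches(n, delay):
    """Hypothetical stand-in for a data iterator: yields n empty batches,
    each taking `delay` seconds to produce."""
    for _ in range(n):
        time.sleep(delay)
        yield None


if __name__ == "__main__":
    rate = measure_throughput(fake_batches(12, 0.01), batch_size=100)
    print("%.0f samples/sec" % rate)
```

In practice the stand-in would be replaced by the real iterator, for example an `mx.io.ImageRecordIter` built on the same `.rec` file with the same thread count as in the command above; that wiring is an assumption and is not shown here. Running the same measurement on both commits would show whether the slowdown reproduces without any GPU work involved.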

@hetong007 @piiswrong Any ideas?
