
Conversation

@weiyangfb (Contributor) commented Oct 30, 2018

- with this change:
>>> t = torch.randn(1, 3, 4, 5)
>>> t.flip(1, 3).shape
torch.Size([1, 3, 4, 5])
- performance:
====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

====== Perf at previous PR #7873 ======
100 loops, best of 3: 11 ms per loop

@weiyangfb changed the title from "[wip] fix flip() shape bug in CPU" to "fix flip() shape bug in CPU" on Oct 31, 2018
@ssnl (Collaborator) left a comment

So what is the original cause of the bug? Also, could you benchmark on larger tensors? OMP doesn't kick in until numel reaches 1000.
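For example, a quick check on a much larger input could look like this in IPython; the sizes below are only illustrative and are not taken from this PR:

```python
import torch

# Roughly 1.6e7 elements, well past a small parallelization threshold.
a = torch.randn(4000, 4000)

%timeit -r 100 a.flip(0, 1)
```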


dim_list_to_bitset(dims, total_dims);

const int64_t numel = in_tensor.numel();
auto strides = in_tensor.strides();
auto strides_v = strides.vec();
auto strides_t = at::CPU(kLong).tensorFromBlob(strides_v.data(), {static_cast<int64_t>(strides_v.size())});

@weiyangfb (Contributor, Author) commented:

@ssnl Thanks! The case I showed has numel = 10^6:

====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

@weiyangfb (Contributor, Author) commented:

The previous bug was caused by a misuse of advanced indexing that I still can't figure out. Since a customized kernel is faster, I just use a kernel instead.
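For context, a minimal sketch of what flipping via advanced indexing looks like; `flip_via_indexing` is a hypothetical helper for illustration only, not the code that was removed:

```python
import torch

def flip_via_indexing(t, dims):
    # Build one index per dimension; reverse the range for flipped dims,
    # then gather the elements with advanced indexing.
    idx = [torch.arange(s - 1, -1, -1) if d in dims else torch.arange(s)
           for d, s in enumerate(t.shape)]
    grid = torch.meshgrid(*idx)  # broadcast per-dim indices to t.shape ('ij' order)
    return t[grid]

t = torch.randn(1, 3, 4, 5)
out = flip_via_indexing(t, {1, 3})
print(out.shape)                       # torch.Size([1, 3, 4, 5])
print(torch.equal(out, t.flip(1, 3)))  # True
```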

@ssnl (Collaborator) commented Oct 31, 2018

@weiyangfb Ah you are right about the benchmark. Sorry about it!

@ssnl (Collaborator) commented Oct 31, 2018

Maybe worth looking into the advanced indexing issue further in the future. We may have a bug there.


dim_list_to_bitset(dims, total_dims); // returned bitset is not used; here we only check the correctness of dims

maybe_wrap_dims(flip_dims_v, total_dims);

auto sizes = in_tensor.sizes();
auto flip_dims_t = at::CPU(kLong).tensorFromBlob(flip_dims_v.data(), {static_cast<int64_t>(flip_dims_v.size())});

@weiyangfb (Contributor, Author) commented Oct 31, 2018

> Maybe worth looking into the advanced indexing issue further in the future. We may have a bug there.

Yeah, definitely! Can I land this PR to fix the bug on the user side first? We can keep the issue open until I figure out the root cause.

@ssnl (Collaborator) left a comment

Yeah, that doesn't need to be done in this PR :)

Tensor out_tensor = at::empty_like(in_tensor);

// create contiguous strides for input tensor
Tensor stride_contiguous = at::zeros({total_dims}, kLong);

void inline flip_cpu_kernel(
const int64_t total_dims,
const int64_t* stride_contiguous_d,
const std::bitset<dim_bitset_size>& flip_dims_b,

int64_t temp = cur_indices;
cur_indices = cur_indices / stride_contiguous_d[d];
rem = temp - cur_indices * stride_contiguous_d[d];
if (flip_dims_b[d]) cur_indices = in_tensor.size(d) - 1 - cur_indices;
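For readers following the fragment above, here is a rough Python sketch of the same index arithmetic, assuming a contiguous input; the function name and the contiguous-only simplification are illustrative, not the exact kernel:

```python
def flipped_source_index(i, sizes, flip_dims):
    # Contiguous strides: stride[d] = product of sizes[d+1:].
    strides = [1] * len(sizes)
    for d in range(len(sizes) - 2, -1, -1):
        strides[d] = strides[d + 1] * sizes[d + 1]

    src, cur = 0, i
    for d, (size, stride) in enumerate(zip(sizes, strides)):
        idx, cur = divmod(cur, stride)   # index along dimension d, plus remainder
        if d in flip_dims:
            idx = size - 1 - idx         # mirror the flipped dimensions
        src += idx * stride              # rebuild the linear source offset
    return src

# e.g. out.view(-1)[i] = in.view(-1)[flipped_source_index(i, list(in.shape), {0, 1})]
```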

@weiyangfb (Contributor, Author) commented:

@ssnl I guess I found the root cause of the bug: #13682. I still prefer this PR since it is faster.

@weiyangfb (Contributor, Author) commented:

Can I get a stamp on this? cc @ssnl

@ssnl (Collaborator) commented Nov 7, 2018

wait... so which one should I look at?

@weiyangfb (Contributor, Author) commented:

@ssnl Sorry about the confusion. You should look at this one. I just changed the title of the other one: #13682

@facebook-github-bot (Contributor) left a comment

@weiyangfb has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ssnl (Collaborator) left a comment

Sorry for the late review.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 8, 2018
Summary:
- a workaround for #13292; a complete fix requires investigating the root cause of the advanced indexing issue
- this PR brings the `flip()` CUDA implementation over to the CPU kernel
- with this change:
```
>>> t = torch.randn(1, 3, 4, 5)
>>> t.flip(1, 3).shape
torch.Size([1, 3, 4, 5])
```
- performance:
```
====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

====== Perf at previous PR #7873 ======
100 loops, best of 3: 11 ms per loop
```
Pull Request resolved: pytorch/pytorch#13344

Differential Revision: D12968003

Pulled By: weiyangfb

fbshipit-source-id: 66f434049d143a0575a35b5c983b3e0577a1a28d
@ezyang added the merged label Jun 25, 2019

Successfully merging this pull request may close these issues.

torch.flip incorrect behavior