This repository was archived by the owner on Nov 17, 2023. It is now read-only.

Spatial transformer test case fails often #7645

@rahul003

Description

Operating System: Linux
Compiler: gcc 4.8
Package used: Python
MXNet commit hash (git rev-parse HEAD): 860dda2

Python version and distribution: Python 2.7, running on GPU

Error Message:

======================================================================
FAIL: test_operator_gpu.test_spatial_transformer_with_type
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/workspace/tests/python/gpu/test_operator_gpu.py", line 654, in test_spatial_transformer_with_type
    check_consistency(sym, ctx_list)
  File "/workspace/python/mxnet/test_utils.py", line 1120, in check_consistency
    raise e
  File "/workspace/python/mxnet/test_utils.py", line 1115, in check_consistency
    assert_almost_equal(arr, gtarr, rtol=tol[dtypes[i]], atol=tol[dtypes[i]])
  File "/workspace/python/mxnet/test_utils.py", line 351, in assert_almost_equal
    raise AssertionError(msg)
nose.proxy.AssertionError: 
Items are not equal:
Error 3.625779 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(0, 3, 6, 3), a=0.410997, b=0.416132
 a: array([[[[-31.50229263, -21.41567993,   6.15771866, ...,   2.8015573 ,
          -13.36835289, -23.15978813],
         [ -3.62330437, -38.62851715, -32.01438141, ...,  12.89842606,...
 b: array([[[[-31.51775551, -21.42281914,   6.15355921, ...,   2.80197096,
          -13.37347221, -23.16914177],
         [ -3.62049437, -38.64873505, -32.03084564, ...,  12.90172482,...
-------------------- >> begin captured stdout << ---------------------
Train Err: ctx 1 vs ctx 0 at data

--------------------- >> end captured stdout << ----------------------
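For context, the reported error value is consistent with an elementwise metric of the form |a - b| / (atol + rtol * |b|). The sketch below is plain NumPy, not the actual mxnet.test_utils code; the function name and formula are inferred from the numbers in the log:

import numpy as np

# Sketch of the tolerance check implied by the log above; the formula is
# inferred from the reported numbers, not copied from mxnet/test_utils.py.
def max_violation(a, b, rtol=1e-3, atol=1e-3):
    # Elementwise error measured against the combined absolute/relative
    # tolerance; a value > 1 means the pair (a, b) fails assert_almost_equal.
    err = np.abs(a - b) / (atol + rtol * np.abs(b))
    idx = np.unravel_index(np.argmax(err), err.shape)
    return err[idx], idx

# Values from the failure at location (0, 3, 6, 3):
a = np.array([[0.410997]])
b = np.array([[0.416132]])
print(max_violation(a, b))  # ~3.63, matching "Error 3.625779" in the log

In other words, at the worst element the two backends agree to roughly 1.2%, while the test demands agreement to 0.1%.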

Minimum reproducible example

This test fails intermittently on GPU with either MKLML or cuDNN. In the last 30 builds on master it has failed more than 10 times; sometimes the failure occurs with MKLML and sometimes with cuDNN.
Example of the test failure on a CI build:
https://builds.apache.org/blue/organizations/jenkins/incubator-mxnet/detail/master/258/pipeline
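For reference, the failing test essentially builds a SpatialTransformer symbol and asks check_consistency to compare forward and backward results across GPU (cuDNN) and CPU contexts. The sketch below is only illustrative; the shapes, localisation network, and context list in tests/python/gpu/test_operator_gpu.py differ:

import numpy as np
import mxnet as mx
from mxnet.test_utils import check_consistency

# Illustrative reconstruction of the failing test's structure; not a copy
# of test_spatial_transformer_with_type.
data = mx.sym.Variable('data')
loc = mx.sym.Flatten(data=data)
loc = mx.sym.FullyConnected(data=loc, num_hidden=6, name='loc')  # 6 affine params
sym = mx.sym.SpatialTransformer(data=data, loc=loc,
                                target_shape=(10, 10),
                                transform_type='affine',
                                sampler_type='bilinear')

# One context exercises the cuDNN kernel, the other the non-cuDNN path;
# check_consistency runs forward and backward on each context and compares
# the results with per-dtype tolerances (rtol=atol=1e-3 for float32, as in
# the traceback above).
ctx_list = [
    {'ctx': mx.gpu(0), 'data': (1, 5, 10, 10), 'type_dict': {'data': np.float32}},
    {'ctx': mx.cpu(0), 'data': (1, 5, 10, 10), 'type_dict': {'data': np.float32}},
]
check_consistency(sym, ctx_list)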

Steps to reproduce

  1. make DEV=1 USE_PROFILER=1 USE_BLAS=openblas USE_CUDA=1 USE_CUDA_PATH=/usr/local/cuda USE_CUDNN=1 USE_CPP_PACKAGE=1 -j8

  2. PYTHONPATH=./python/ nosetests --verbose tests/python/gpu/test_operator_gpu.py:test_spatial_transformer_with_type
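Since the failure is intermittent, one way to gauge how often it trips is to call the test function in a loop rather than waiting on CI runs. This is only a sketch: it assumes the repo root as the working directory, mirroring the PYTHONPATH setup in step 2:

# Sketch only: assumes the MXNet repo root as the working directory.
import sys
sys.path.insert(0, './python')
sys.path.insert(0, './tests/python/gpu')

import test_operator_gpu  # module containing the failing test

failures = 0
runs = 30
for i in range(runs):
    try:
        test_operator_gpu.test_spatial_transformer_with_type()
    except AssertionError:
        failures += 1
print('test_spatial_transformer_with_type failed %d / %d runs' % (failures, runs))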
