
Flag for deterministic GPU operations. #3029

@lamblin

Description


Some GPU operations use AtomicAdd for efficiency when accumulating numbers into a buffer, for instance in summation operations. The problem is that limited-precision floating-point arithmetic is not exactly associative, and AtomicAdd does not guarantee the order in which the additions are performed.
This means that the results are not exactly reproducible numerically (see the snippet after the list below).
The Ops concerned are:

  • AdvancedIncSubtensor1_dev20: uses AtomicAdd, with no way to disable it; AdvancedIncSubtensor1 (slow) is deterministic.
  • cuDNN convolution gradients: some algorithms (the only ones present in old versions) use AtomicAdd for the gradients. At least deterministic, fft, and fft_tiling (when available) should be deterministic.
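
To see why the accumulation order matters, here is a small illustration (not from the issue) of floating-point addition being non-associative:

```python
# Floating-point addition is not associative: reordering the same three
# summands changes the result, which is exactly what an unordered
# AtomicAdd accumulation can do from one run to the next.
a, b, c = 0.1, 1e16, -1e16
print((a + b) + c)  # 0.0 -- a is absorbed when added to the huge b
print(a + (b + c))  # 0.1 -- b and c cancel first, so a survives
```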

We should have a global Theano flag to select between the two behaviours, and maybe finer-grained control at the level of the Op itself, if feasible.
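
As a sketch of what the global flag could look like, registered through Theano's existing config machinery (the name `deterministic` matches the one assumed in the diffs below; this is a proposal, not an existing flag):

```python
# Hypothetical registration of the proposed flag via Theano's
# configparser; `config.deterministic` would then be readable from the
# optimizations patched in the diffs below.
from theano.configparser import AddConfigVar, BoolParam

AddConfigVar('deterministic',
             "If True, prefer slower but deterministic GPU Ops over "
             "faster ones that accumulate with AtomicAdd.",
             BoolParam(False),
             in_c_key=False)
```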

Allow a faster, but non-deterministic version:

  • Gemm, Gemv, maybe other BLAS Ops: they do not use AtomicAdd at the moment, but could be more efficient if they did.

TEMPORARY WORKAROUND:

  • To get deterministic cuDNN convolutions, use these Theano flags: dnn.conv.algo_bwd_filter='deterministic' or 'fft', and dnn.conv.algo_bwd_data='deterministic', 'fft', or 'fft_tiling' (see the example after the diffs below).
  • To our knowledge, there is no deterministic algorithm for the gradient of 3D convolution with respect to either the inputs or the filters.
  • To get a deterministic AdvancedIncSubtensor1 on the GPU, apply these two diffs:
diff --git a/theano/sandbox/cuda/opt.py b/theano/sandbox/cuda/opt.py
index 7a9b953..f9d60de 100644
--- a/theano/sandbox/cuda/opt.py
+++ b/theano/sandbox/cuda/opt.py
@@ -1121,8 +1121,8 @@ def local_gpu_advanced_incsubtensor1(node):
             compute_capability = device_properties(active_device_no)['major']
             if (compute_capability < 2 or
                 x.ndim != 2 or
-                y.ndim != 2):
-
+                y.ndim != 2 or
+                config.deterministic):
                 gpu_op = GpuAdvancedIncSubtensor1(
                     set_instead_of_inc=set_instead_of_inc)
             else:
@@ -1164,7 +1164,8 @@ def local_gpu_advanced_incsubtensor1(node):
             compute_capability = device_properties(active_device_no)['major']
             if (compute_capability < 2 or
                 x.ndim != 2 or
-                y.ndim != 2):
+                y.ndim != 2 or
+                config.deterministic):
                 gpu_op = GpuAdvancedIncSubtensor1(
                     set_instead_of_inc=set_instead_of_inc)
             else:

diff --git a/theano/sandbox/gpuarray/opt.py b/theano/sandbox/gpuarray/opt.py
index 92a115c..7332607 100644
--- a/theano/sandbox/gpuarray/opt.py
+++ b/theano/sandbox/gpuarray/opt.py
@@ -639,7 +639,8 @@ def local_gpua_advanced_incsubtensor(node, context_name):

     compute_capability = device_properties(active_device_no)['major']

-    if (compute_capability < 2 or x.ndim != 2 or y.ndim != 2):
+    if (compute_capability < 2 or x.ndim != 2 or y.ndim != 2 or
+            theano.config.deterministic):
         return GpuAdvancedIncSubtensor1(
             set_instead_of_inc=set_instead_of_inc)
     else:
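
For convenience, one way to apply the cuDNN workaround flags mentioned above (the flag names and values are from this issue; `train.py` is just a placeholder script name):

```python
# Set the flags from Python, before any Theano function is compiled.
import theano
theano.config.dnn.conv.algo_bwd_filter = 'deterministic'
theano.config.dnn.conv.algo_bwd_data = 'deterministic'
```

Equivalently, from the shell:

```
THEANO_FLAGS='dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic' python train.py
```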
