
Flag for deterministic GPU operations. #3029

@lamblin

Description


Some GPU operations use AtomicAdd for efficiency when accumulating numbers into a buffer, for instance in summation operations. The problem is that limited-precision floating-point arithmetic is not exactly associative, and AtomicAdd does not guarantee the order in which the additions are performed.
This means that the results are not exactly reproducible numerically (see the snippet after the list below).
The Ops concerned are:

  • AdvancedIncSubtensor1_dev20: uses AtomicAdd, with no way to disable it; AdvancedIncSubtensor1 (slow) is deterministic.
  • cuDNN convolution gradients: some algorithms (the only ones present in old versions) use AtomicAdd for the gradients. At least deterministic, fft, and fft_tiling (when available) should be deterministic.
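
To see why the accumulation order matters, here is a small illustration (not from the issue) of floating-point addition being non-associative:

```python
# Floating-point addition is not associative: reordering the same three
# summands changes the result, which is exactly what an unordered
# AtomicAdd accumulation can do from one run to the next.
a, b, c = 0.1, 1e16, -1e16
print((a + b) + c)  # 0.0 -- a is absorbed when added to the huge b
print(a + (b + c))  # 0.1 -- b and c cancel first, so a survives
```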

We should have a global Theano flag to select between the two behaviours, and maybe finer-grained control at the level of the Op itself, if feasible.
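
As a sketch of what the global flag could look like, registered through Theano's existing config machinery (the name `deterministic` matches the one assumed in the diffs below; this is a proposal, not an existing flag):

```python
# Hypothetical registration of the proposed flag via Theano's
# configparser; `config.deterministic` would then be readable from the
# optimizations patched in the diffs below.
from theano.configparser import AddConfigVar, BoolParam

AddConfigVar('deterministic',
             "If True, prefer slower but deterministic GPU Ops over "
             "faster ones that accumulate with AtomicAdd.",
             BoolParam(False),
             in_c_key=False)
```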

Allow a faster, but non-deterministic version:

  • Gemm, Gemv, maybe other BLAS Ops: they do not use AtomicAdd at the moment, but could be more efficient if they did.

TEMPORARY WORKAROUND:

  • To get deterministic cuDNN convolutions, use these Theano flags: dnn.conv.algo_bwd_filter='deterministic' or 'fft', and dnn.conv.algo_bwd_data='deterministic', 'fft', or 'fft_tiling' (see the example after the diffs below).
  • To our knowledge, there is no deterministic algorithm for the gradient of 3D convolution with respect to either the inputs or the filters.
  • To get a deterministic AdvancedIncSubtensor1 on the GPU, apply these two diffs:
diff --git a/theano/sandbox/cuda/opt.py b/theano/sandbox/cuda/opt.py
index 7a9b953..f9d60de 100644
--- a/theano/sandbox/cuda/opt.py
+++ b/theano/sandbox/cuda/opt.py
@@ -1121,8 +1121,8 @@ def local_gpu_advanced_incsubtensor1(node):
             compute_capability = device_properties(active_device_no)['major']
             if (compute_capability < 2 or
                 x.ndim != 2 or
-                y.ndim != 2):
-
+                y.ndim != 2 or
+                config.deterministic):
                 gpu_op = GpuAdvancedIncSubtensor1(
                     set_instead_of_inc=set_instead_of_inc)
             else:
@@ -1164,7 +1164,8 @@ def local_gpu_advanced_incsubtensor1(node):
             compute_capability = device_properties(active_device_no)['major']
             if (compute_capability < 2 or
                 x.ndim != 2 or
-                y.ndim != 2):
+                y.ndim != 2 or
+                config.deterministic):
                 gpu_op = GpuAdvancedIncSubtensor1(
                     set_instead_of_inc=set_instead_of_inc)
             else:

diff --git a/theano/sandbox/gpuarray/opt.py b/theano/sandbox/gpuarray/opt.py
index 92a115c..7332607 100644
--- a/theano/sandbox/gpuarray/opt.py
+++ b/theano/sandbox/gpuarray/opt.py
@@ -639,7 +639,8 @@ def local_gpua_advanced_incsubtensor(node, context_name):

     compute_capability = device_properties(active_device_no)['major']

-    if (compute_capability < 2 or x.ndim != 2 or y.ndim != 2):
+    if (compute_capability < 2 or x.ndim != 2 or y.ndim != 2 or
+            theano.config.deterministic):
         return GpuAdvancedIncSubtensor1(
             set_instead_of_inc=set_instead_of_inc)
     else:
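
For convenience, one way to apply the cuDNN workaround flags mentioned above (the flag names and values are from this issue; `train.py` is just a placeholder script name):

```python
# Set the flags from Python, before any Theano function is compiled.
import theano
theano.config.dnn.conv.algo_bwd_filter = 'deterministic'
theano.config.dnn.conv.algo_bwd_data = 'deterministic'
```

Equivalently, from the shell:

```
THEANO_FLAGS='dnn.conv.algo_bwd_filter=deterministic,dnn.conv.algo_bwd_data=deterministic' python train.py
```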
