TL;DR
The definition of glm::isnan is infinitely recursive when compiled as CUDA code, and it returns incorrect results inside CUDA kernels.
Problem
When the scalar function glm::isnan is called inside a CUDA kernel, it returns false for values that are NaN. Here's a minimal example that demonstrates the problem:
#define GLM_FORCE_CUDA
#include <cuda.h>
#include <math_constants.h>
#include <cstdio>
#include <glm/glm.hpp>

__global__ void isnan_kernel() {
    const bool test_glm = glm::isnan(CUDART_NAN_F);
    const bool test_cuda = ::isnan(CUDART_NAN_F);
    printf("glm::isnan(CUDART_NAN_F) = %s\n", test_glm ? "true" : "false");
    printf("::isnan(CUDART_NAN_F) = %s\n", test_cuda ? "true" : "false");
}

int main(int argc, char* argv[]) {
    isnan_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
This code can be compiled with the command nvcc --gpu-architecture=compute_61 --gpu-code=sm_61 -g -O3 main.cu -o isnan_test. When I run it on my machine (a workstation with a GeForce GTX 1080, running Ubuntu 16.04 LTS, CUDA 8.0.61, and NVIDIA driver version 384), I get the following output:
./isnan_test
glm::isnan(CUDART_NAN_F) = false
::isnan(CUDART_NAN_F) = true
If I modify the last line of the program above, replacing return 0; with return glm::isnan(0.0f);, the executable never terminates (due to the infinite recursion).
Diagnosis
Here's the implementation of glm::isnan from glm/detail/func_common.inl:
# if GLM_HAS_CXX11_STL
    using std::isnan;
# else
    template <typename genType>
    GLM_FUNC_QUALIFIER bool isnan(genType x)
    {
        GLM_STATIC_ASSERT(std::numeric_limits<genType>::is_iec559, "'isnan' only accept floating-point inputs");
# if GLM_HAS_CXX11_STL
        return std::isnan(x);
# elif GLM_COMPILER & GLM_COMPILER_VC
        return _isnan(x) != 0;
# elif GLM_COMPILER & GLM_COMPILER_INTEL
# if GLM_PLATFORM & GLM_PLATFORM_WINDOWS
        return _isnan(x) != 0;
# else
        return ::isnan(x) != 0;
# endif
# elif (GLM_COMPILER & (GLM_COMPILER_GCC | (GLM_COMPILER_APPLE_CLANG | GLM_COMPILER_LLVM))) && (GLM_PLATFORM & GLM_PLATFORM_ANDROID) && __cplusplus < 201103L
        return _isnan(x) != 0;
# elif GLM_COMPILER & GLM_COMPILER_CUDA
        return isnan(x) != 0;
# else
        return std::isnan(x);
# endif
    }
# endif
When GLM_COMPILER & GLM_COMPILER_CUDA is true, glm::isnan calls the unqualified isnan. Unqualified name lookup searches the enclosing namespace (glm) before the global scope and stops at the first scope that declares the name. Argument-dependent lookup adds nothing here, because fundamental types such as float have no associated namespaces. Thus, isnan resolves to glm::isnan itself, and the function definition becomes infinitely recursive.
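The lookup behavior described above can be reproduced in plain host C++ without GLM or CUDA. The following is a minimal sketch, not GLM's code: the namespace demo stands in for glm, my_isnan is a hypothetical name standing in for isnan, and the call counters and the recursion guard are additions made only so the sketch terminates and its behavior can be observed.

```cpp
// Stand-in for the global-scope ::isnan that CUDA provides (hypothetical name).
int global_calls = 0;  // how many times the global function has been entered

bool my_isnan(float x) {
    ++global_calls;
    return x != x;  // NaN is the only value that compares unequal to itself
}

namespace demo {  // stand-in for namespace glm

int calls = 0;  // how many times demo::my_isnan has been entered

bool my_isnan(float x) {
    if (++calls >= 4) return false;  // artificial guard so the sketch terminates
    return my_isnan(x);              // unqualified: resolves to demo::my_isnan
}

}  // namespace demo
```

Calling demo::my_isnan with a NaN argument enters demo::my_isnan four times and returns false, while global_calls stays at 0: the global function is never reached. Without the guard, the call would recurse indefinitely.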
Proposed solution
Instead of calling the unqualified isnan, the implementation should probably call the global-scope ::isnan explicitly.
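The effect of qualifying the call can be sketched in host C++ as well. This is an illustration of the idea rather than a patch for GLM: demo again stands in for glm, my_isnan is a hypothetical stand-in for isnan, and the function is written for float only, whereas the real glm::isnan is a template.

```cpp
// Stand-in for the global-scope ::isnan that CUDA provides (hypothetical name).
bool my_isnan(float x) {
    return x != x;  // NaN is the only value that compares unequal to itself
}

namespace demo {  // stand-in for namespace glm

bool my_isnan(float x) {
    return ::my_isnan(x);  // qualified: lookup starts at global scope, not in demo
}

}  // namespace demo
```

With the leading ::, the call no longer refers back to demo::my_isnan, so demo::my_isnan returns true for NaN and false for ordinary values such as 0.0f.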
Other notes
- I didn't investigate, but I imagine that glm::isfinite might suffer from the same problem.
- It's unclear how, or why, the call to glm::isnan terminates inside the CUDA kernel. When you call glm::isnan in host-side code compiled by NVCC, it does not terminate.