-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Make context handling in GPU runtimes more consistent and robust. #5474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
(shader/kernel/etc.) compilations.
the intial support and adds tests. Currently the gpu_multi test only has context creation code fo CUDA and OpenCL. This shoulkd be added for other GPU runteims, but some coverage is provided via using the default context for these APIs. Fixes a bug in CUDA runtime where some error message text in cuda_do_multidimensional_copy was not initialized. Fixes a bug in CUDA runtime where device release code did not run if CUDA libraries are directly linked into the executable. (This would have caused crashes due to the device allocation caching among other issues.)
Add initial commits explaining what tests do.
it to stick closer to naming pattern and work with CMake rules code.
See #5475 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from a quick skim -- gonna wait for buildbots to look clean(er) before reviewing more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with nits
Possibly will be used to control conditional compilation at some point. Really it should probably nerf the test entirely outside of GPU APIs it can make contexts for. But it does get a little coverage on e.g. Metal and Direct3d so... |
Move to using common code for kernel compilation caching for CUDA, OpenCL, Metal, and D3D12 GPU runtimes. New caching endeavors to be robust in not using a kernel compiled for one context on another and uses a hash table to avoid small allocations across multiple pages of VM. OpenCL was particularly broken in that code using two contexts was almost guaranteed to fail. This PR also opens the door to allowing better client control of caching, such as setting a size limit or allowing eviction of specific kernels, and is pretty close to allowing runtime overloads of the kernel compilation itself to allow persistent caching across process invocations for GPU APIs that allow this. (The
compile_kernel
function in multiple files needs to be promoted to a client visible runtime overload for each GPU API.)Tests are added to cover many kernels and more than one context. A test using multiple contexts across multiple threads both tests things that didn't necessarily work before and provides an example for a common use case.
Two small fixes to CUDA prevent a crash in a very rare error case and make device release work if the CUDA library is linked directly into the app. (The latter would have shown up as a crash due to allocation caching for static linking as the code to release allocations when freeing a context did not run.)
OpenGL and OpenGLCompute were not addressed in this PR due to both time limitations and because there are more significant issues in these runtimes around this area. OpenGL is basically a Superfund site at this point and should be deleted. OpenGLCompute may or may not be worth preserving, though similar work is needed re: how kernels are communicated to the runtime and compiled.