Skip to content

Conversation

zvookin
Copy link
Member

@zvookin zvookin commented Nov 24, 2020

Move to using common code for kernel compilation caching for CUDA, OpenCL, Metal, and D3D12 GPU runtimes. New caching endeavors to be robust in not using a kernel compiled for one context on another and uses a hash table to avoid small allocations across multiple pages of VM. OpenCL was particularly broken in that code using two contexts was almost guaranteed to fail. This PR also opens the door to allowing better client control of caching, such as setting a size limit or allowing eviction of specific kernels, and is pretty close to allowing runtime overloads of the kernel compilation itself to allow persistent caching across process invocations for GPU APIs that allow this. (The compile_kernel function in multiple files needs to be promoted to a client visible runtime overload for each GPU API.)

Tests are added to cover many kernels and more than one context. A test using multiple contexts across multiple threads both tests things that didn't necessarily work before and provides an example for a common use case.

Two small fixes to CUDA prevent a crash in a very rare error case and make device release work if the CUDA library is linked directly into the app. (The latter would have shown up as a crash due to allocation caching for static linking as the code to release allocations when freeing a context did not run.)

OpenGL and OpenGLCompute were not addressed in this PR due to both time limitations and because there are more significant issues in these runtimes around this area. OpenGL is basically a Superfund site at this point and should be deleted. OpenGLCompute may or may not be worth preserving, though similar work is needed re: how kernels are communicated to the runtime and compiled.

Z Stern added 8 commits November 18, 2020 16:56
(shader/kernel/etc.) compilations.
the intial support and adds tests. Currently the gpu_multi test only
has context creation code fo CUDA and OpenCL. This shoulkd be added
for other GPU runteims, but some coverage is provided via using the
default context for these APIs.

Fixes a bug in CUDA runtime where some error message text in
cuda_do_multidimensional_copy was not initialized.

Fixes a bug in CUDA runtime where device release code did not run if
CUDA libraries are directly linked into the executable. (This would
have caused crashes due to the device allocation caching among other
issues.)
Add initial commits explaining what tests do.
Z Stern added 3 commits November 24, 2020 09:31
@steven-johnson
Copy link
Contributor

OpenGL is basically a Superfund site at this point and should be deleted. OpenGLCompute may or may not be worth preserving

See #5475

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good from a quick skim -- gonna wait for buildbots to look clean(er) before reviewing more.

Copy link
Contributor

@steven-johnson steven-johnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with nits

@zvookin zvookin merged commit f47c5c9 into master Dec 2, 2020
@zvookin zvookin deleted the gpu_context_consistency branch December 2, 2020 22:40
@zvookin
Copy link
Member Author

zvookin commented Dec 2, 2020

super-duper-style-nit: seems like constexpr bool HAS_MULTIPLE_CONTEXTS = true; (etc) would be preferable?

Possibly will be used to control conditional compilation at some point. Really it should probably nerf the test entirely outside of GPU APIs it can make contexts for. But it does get a little coverage on e.g. Metal and Direct3d so...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants