[allocator] Introduce the abstract class CUDACachingAllocator #87251
Conversation
This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding an additional layer of dispatch along this pathway compared to what existed before. [ghstack-poisoned]
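For readers skimming the change, here is a minimal, self-contained sketch of the pattern being adopted (illustrative only, not the actual PyTorch code: the real interfaces carry many more methods, and the Allocator stand-in and the two backend classes here are simplified assumptions; get() and needsPoolSpecificPeerAccess() do appear in the diff below):

```cpp
#include <cstddef>
#include <cstdlib>
#include <iostream>

// Stand-in for the generic Allocator interface that most allocations
// already dispatch through.
struct Allocator {
  virtual ~Allocator() = default;
  virtual void* raw_alloc(std::size_t nbytes) = 0;
  virtual void raw_delete(void* ptr) = 0;
};

// The abstract class this PR introduces: CUDA-specific entry points that
// used to be manual function pointers become pure virtual methods.
struct CUDAAllocator : Allocator {
  virtual bool needsPoolSpecificPeerAccess() = 0;
};

// A drop-in backend now only has to derive from CUDAAllocator.
struct NativeCachingBackend : CUDAAllocator {
  void* raw_alloc(std::size_t nbytes) override { return std::malloc(nbytes); }
  void raw_delete(void* ptr) override { std::free(ptr); }
  bool needsPoolSpecificPeerAccess() override { return false; }
};

struct MallocAsyncBackend : CUDAAllocator {
  void* raw_alloc(std::size_t nbytes) override { return std::malloc(nbytes); }
  void raw_delete(void* ptr) override { std::free(ptr); }
  bool needsPoolSpecificPeerAccess() override { return true; }
};

namespace CUDACachingAllocator {
NativeCachingBackend native;
MallocAsyncBackend malloc_async;
// Chosen once at startup; all callers reach the backend through get().
CUDAAllocator* active = &native;
CUDAAllocator* get() { return active; }
} // namespace CUDACachingAllocator

int main() {
  // Code that only needs Allocator pays a single virtual dispatch, the same
  // pathway that existed before the change.
  void* p = CUDACachingAllocator::get()->raw_alloc(64);
  std::cout << CUDACachingAllocator::get()->needsPoolSpecificPeerAccess() << '\n';
  CUDACachingAllocator::get()->raw_delete(p);
  return 0;
}
```

Because CUDAAllocator derives from the interface callers already use, swapping in a new backend is a matter of instantiating a different subclass; no extra indirection is added on the common allocation path.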
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87251
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures, 6 Pending as of commit bd1088e. This comment was automatically generated by Dr. CI and updates every 15 minutes.
…tor" This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. [ghstack-poisoned]
This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. ghstack-source-id: f514834 Pull Request resolved: #87251
…tor" This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. [ghstack-poisoned]
This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. ghstack-source-id: 72d3337 Pull Request resolved: #87251
Mostly code motion; I couldn't spot any typos or issues there.
A couple of nits about the renaming of the using_ flags.
Perf-wise, I agree you're swapping a function-pointer load/deref for an equivalent get()-> dispatch.
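To make that observation concrete, here is a small standalone comparison of the two call forms (illustrative only; raw_alloc_fn and SomeBackend are invented names for the sketch):

```cpp
#include <cstddef>
#include <cstdlib>

struct CUDAAllocator {
  virtual ~CUDAAllocator() = default;
  virtual void* raw_alloc(std::size_t n) = 0;
};

struct SomeBackend : CUDAAllocator {
  void* raw_alloc(std::size_t n) override { return std::malloc(n); }
};

// Before: a manual function pointer, rebound when the backend is selected.
void* (*raw_alloc_fn)(std::size_t) = std::malloc;

// After: a pointer to the active backend object, reached through get().
SomeBackend backend;
CUDAAllocator* get() { return &backend; }

int main() {
  void* a = raw_alloc_fn(64);      // pointer load + indirect call
  void* b = get()->raw_alloc(64);  // pointer load + vtable load + indirect call;
                                   // comparable cost once the vtable line is hot
  std::free(a);
  std::free(b);
  return 0;
}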
@@ -42,8 +42,7 @@ bool get_p2p_access(int dev, int dev_to_access) {
#ifdef USE_ROCM
bool using_cudaMallocAsync = false;
Should the #ifdef USE_ROCM case also be renamed, to bool needs_pool_spe... = false?
aten/src/ATen/native/cuda/Copy.cu (Outdated)
@@ -97,8 +97,7 @@ void copy_device_to_device(TensorIterator& iter,
#ifdef USE_ROCM
bool using_cudaMallocAsync = false;
#else
-bool using_cudaMallocAsync = (CUDACachingAllocator::allocatorBackend() ==
-    CUDACachingAllocator::AllocatorBackend::CUDAMALLOCASYNC);
+bool using_cudaMallocAsync = CUDACachingAllocator::get()->needsPoolSpecificPeerAccess();
Rename for consistency?
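Applied at both call sites, the suggested rename might look like this (hypothetical: the reviewer's truncated needs_pool_spe... is completed here as an assumption):

```cpp
// Hypothetical result of the suggested rename; the completed flag name
// "needs_pool_specific_peer_access" is an assumption, not the merged code.
#ifdef USE_ROCM
bool needs_pool_specific_peer_access = false;
#else
bool needs_pool_specific_peer_access =
    CUDACachingAllocator::get()->needsPoolSpecificPeerAccess();
#endif
```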
…tor" This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: the following mandatory check(s) failed. Dig deeper by viewing the failures on hud. Details for Dev Infra team: raised by workflow job.
…tor" This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. [ghstack-poisoned]
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: The following mandatory check(s) failed (Rule Dig deeper by viewing the failures on hud Details for Dev Infra teamRaised by workflow job |
…tor" This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. [ghstack-poisoned]
This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. ghstack-source-id: 4eb235e Pull Request resolved: #87251
@pytorchmergebot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: The following mandatory check(s) failed (Rule Dig deeper by viewing the failures on hud Details for Dev Infra teamRaised by workflow job |
…tor" This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. [ghstack-poisoned]
This replaces the manual function pointers, making it easier to write new drop-in allocators. Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding and additional layer of dispatch along this pathway compared to what existed before. ghstack-source-id: bfa71eb Pull Request resolved: #87251
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hey @zdevito.
Stack from ghstack (oldest at bottom):
This replaces the manual function pointers, making it easier to write new drop-in allocators.
Note that most allocation goes through the Allocator interface, which CUDAAllocator inherits from, and this arrangement avoids adding an additional layer of dispatch along this pathway compared to what existed before.