Closed
Description
Ability to query the number of SMs in the device
Context
In CUDA, we can get it by calling cudaGetDeviceProperties(cudaDeviceProp* prop, int device) and then reading the multiProcessorCount field of the filled-in cudaDeviceProp struct.
This feature helps with right-sizing the grid. Sometimes we want to avoid tail effects, which arise, for example, when work is distributed across 11 blocks on a 10-SM GPU: assuming each SM runs one block at a time, the 11th block runs alone in a second wave while the other nine SMs sit idle. Being able to query the number of SMs lets us pick a grid size that avoids such tail effects.
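The 11-blocks-on-10-SMs example above can be made concrete with a little arithmetic. The following is a minimal host-side C++ sketch, not part of any existing API: the function names `wave_count`, `idle_fraction`, and `right_sized_grid` are hypothetical, and so is the `blocks_per_sm` parameter, which stands in for however many blocks can be resident on one SM at once. It also assumes all blocks take roughly the same time. The SM count is passed in as a plain integer; in CUDA it would come from `multiProcessorCount` as described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Number of sequential "waves" needed to run num_blocks blocks when at most
// sm_count * blocks_per_sm blocks can be resident at the same time.
// blocks_per_sm is a hypothetical occupancy figure; 1 matches the example.
std::int64_t wave_count(std::int64_t num_blocks, std::int64_t sm_count,
                        std::int64_t blocks_per_sm = 1) {
    assert(num_blocks >= 0 && sm_count > 0 && blocks_per_sm > 0);
    const std::int64_t slots = sm_count * blocks_per_sm;
    return (num_blocks + slots - 1) / slots;  // ceiling division
}

// Fraction of block slots that stay empty over all waves, assuming every
// block takes about the same time. Returns 0 when there is no work.
double idle_fraction(std::int64_t num_blocks, std::int64_t sm_count,
                     std::int64_t blocks_per_sm = 1) {
    const std::int64_t waves = wave_count(num_blocks, sm_count, blocks_per_sm);
    if (waves == 0) return 0.0;
    const std::int64_t total_slots = waves * sm_count * blocks_per_sm;
    return static_cast<double>(total_slots - num_blocks) /
           static_cast<double>(total_slots);
}

// One possible right-sizing policy: cap the grid at a single full wave.
// This only makes sense if each block loops over several chunks of work
// (for example a grid-stride loop), so fewer blocks still cover everything.
std::int64_t right_sized_grid(std::int64_t requested_blocks,
                              std::int64_t sm_count,
                              std::int64_t blocks_per_sm = 1) {
    assert(requested_blocks >= 0 && sm_count > 0 && blocks_per_sm > 0);
    return std::min(requested_blocks, sm_count * blocks_per_sm);
}
```

Under these assumptions, `wave_count(11, 10)` gives 2 waves and `idle_fraction(11, 10)` gives 0.45, while `right_sized_grid(11, 10)` would launch 10 blocks so that everything fits in one wave. Capping at a single wave is just one policy; rounding the grid to a multiple of the SM count is another reasonable choice.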