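### Description

In the kernel below, the loop bound is padded to a multiple of the block size. This lets every thread in the block reach each tile operation, with out-of-range threads guarded by `geom_id < ngeom`:

```
def _ray_all_geom(...):
    worldid, rayid, tid = wp.tid()
    num_threads_in_thread_block = wp.block_dim()
    ngeom = m.ngeom

    # Round the loop bound up to a multiple of the block size so that every
    # thread runs the same number of iterations and reaches each tile op.
    upper = ((ngeom + num_threads_in_thread_block - 1) // num_threads_in_thread_block) * num_threads_in_thread_block

    for geom_id in range(tid, upper, num_threads_in_thread_block):
        cur_dist = max_dist  # padding threads keep the sentinel value
        if geom_id < ngeom:
            ...  # compute cur_dist

        # Block-wide argmin over the values contributed by all threads.
        t = wp.tile(cur_dist)
        local_min_idx = wp.tile_argmin(t)
        local_min_val = t[local_min_idx[0]]
```

### Context

Improve support for using tiles inside strided loops so that the same kernel code can run on both CPU and CUDA devices.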
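The padding pattern can be checked without Warp. The following is a minimal pure-Python emulation of one thread block running a block-strided loop with a tile argmin. Each outer iteration builds one "tile" from all lanes, where lanes past the end of the data contribute the `max_dist` sentinel. It then reduces the tile with an argmin and keeps a running best across iterations. The helper names `padded_upper` and `block_strided_argmin` are hypothetical, and so is the running-best step, which a plain per-iteration minimum does not include. This is not Warp's API, and the first-index-wins tie-breaking is an assumption of the sketch.

```python
import math


def padded_upper(n: int, block_dim: int) -> int:
    """Round n up to the next multiple of block_dim."""
    return ((n + block_dim - 1) // block_dim) * block_dim


def block_strided_argmin(dists, block_dim, max_dist=math.inf):
    """Emulate `block_dim` threads scanning `dists` with a strided loop.

    Hypothetical helper: each outer iteration stands in for one collective
    tile operation. Every lane contributes a value (padding lanes contribute
    max_dist), the block reduces them with an argmin, and a running best is
    kept across iterations. Returns (index, value), or (-1, max_dist) if no
    value is smaller than max_dist.
    """
    n = len(dists)
    upper = padded_upper(n, block_dim)
    best_idx, best_val = -1, max_dist
    # Thread `tid` visits geom_id = tid, tid + block_dim, ...; grouping those
    # visits by `base` gives one tile per iteration.
    for base in range(0, upper, block_dim):
        tile = []
        for tid in range(block_dim):
            geom_id = base + tid
            # Out-of-range lanes still participate, carrying the sentinel.
            tile.append(dists[geom_id] if geom_id < n else max_dist)
        # Block-wide argmin over this tile; min() returns the first minimum.
        local_idx = min(range(block_dim), key=lambda i: tile[i])
        local_val = tile[local_idx]
        if local_val < best_val:
            best_idx, best_val = base + local_idx, local_val
    return best_idx, best_val


if __name__ == "__main__":
    print(block_strided_argmin([5.0, 3.0, 7.0, 1.0, 9.0], block_dim=4))
    print(block_strided_argmin([5.0, 3.0, 7.0, 1.0, 9.0], block_dim=1))
```

With `block_dim=1` (a single-thread block), `upper` equals the data length and no padding lanes appear. With `block_dim=4` and five distances, the last iteration has three padding lanes. That is exactly the case the `geom_id < ngeom` guard handles, and both calls return the same result.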