-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Gpu add memory barriers #6021
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gpu add memory barriers #6021
Conversation
📝 WalkthroughWalkthroughThe changes introduce synchronization barriers into GPU command flows. A new method, Changes
Sequence Diagram(s)sequenceDiagram
participant Test as Basic GPU Test
participant Context as GPU Context
participant Vulkan as Vulkan Device
participant Buffer as Storage Buffer
Test->>Context: Dispatch GPU computation
Test->>Context: Invoke barrier_buffers([storage_buffer])
Context->>Vulkan: Execute cmd_pipeline_barrier for storage_buffer
Test->>Context: Continue with context.run()
sequenceDiagram
participant Insert as GPU Insert Context
participant Context as GPU Context
participant Vulkan as Vulkan Device
participant Buffers as GPU Buffers
Insert->>Context: Dispatch GPU computation (greedy_search/run_insert_vector)
Insert->>Context: Invoke barrier_buffers([...buffers])
Context->>Vulkan: Execute cmd_pipeline_barrier for provided buffers
Insert->>Context: Continue with GPU operations
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
lib/gpu/src/context.rs (3)
127-131
: Avoid potential duplication inself.resources
.
Pushing each buffer intoself.resources
might re-insert the same buffer multiple times ifbarrier_buffers
is called repeatedly with the same resource. While not incorrect, consider checking whether the buffer is already tracked to avoid duplicates.
150-160
: Consider narrowing the pipeline stage flags for performance.
UsingALL_COMMANDS
as the destination stage is safe but can be unnecessarily broad. If you know the next pipeline stage specifically (e.g., only transfer or compute stages), refiningdst_stage_mask
can improve performance.- vk::PipelineStageFlags::ALL_COMMANDS, + vk::PipelineStageFlags::TRANSFER | vk::PipelineStageFlags::COMPUTE_SHADER,
115-163
: Add dedicated tests forbarrier_buffers
.
While this method can rely on downstreamwait_finish
calls, explicit tests verifying correct buffer synchronization across multiple commands or queue operations would help ensure correctness and detect regressions.Do you want me to open a follow-up PR or generate a test snippet for buffer synchronization scenarios?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
lib/gpu/src/basic_test.rs
(1 hunks)lib/gpu/src/context.rs
(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs
(2 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_links.rs
(1 hunks)lib/segment/src/index/hnsw_index/gpu/gpu_visited_flags.rs
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (13)
- GitHub Check: Basic TLS/HTTPS tests
- GitHub Check: test-snapshot-operations-s3-minio
- GitHub Check: test-shard-snapshot-api-s3-minio
- GitHub Check: test-low-resources
- GitHub Check: test-consistency
- GitHub Check: test-consensus-compose
- GitHub Check: test (macos-latest)
- GitHub Check: test
- GitHub Check: test (windows-latest)
- GitHub Check: test
- GitHub Check: test-consensus
- GitHub Check: test
- GitHub Check: test (ubuntu-latest)
🔇 Additional comments (7)
lib/segment/src/index/hnsw_index/gpu/gpu_links.rs (1)
166-168
: New public method to expose the GPU links buffer
This getter correctly returns a clone of theArc<gpu::Buffer>
, allowing safe external access with no apparent pitfalls.lib/segment/src/index/hnsw_index/gpu/gpu_visited_flags.rs (1)
141-143
: New public method to expose the visited flags buffer
This addition mirrors the approach used inGpuLinks
, returning anArc<gpu::Buffer>
for consistent external access.lib/gpu/src/basic_test.rs (1)
140-141
: Barrier ensures buffer synchronization
The barrier step before proceeding to the next command provides the requisite memory ordering guarantees for subsequent GPU operations.lib/segment/src/index/hnsw_index/gpu/gpu_insert_context.rs (2)
326-328
: Added barrier for responses buffer in greedy search
Placing a barrier after dispatch ensures completed writes toresponses_buffer
before any subsequent operations.
394-401
: Barrier for multiple buffers in insert vector flow
Synchronizingresponses_buffer
,insert_atomics_buffer
,links_buffer
, andvisited_flags_buffer
comprehensively ensures data consistency across GPU operations.lib/gpu/src/context.rs (2)
115-122
: Good addition of buffer synchronization and documentation.
The doc comments clearly explain the reasoning behind addingbarrier_buffers
. This method both complements and provides faster in-graph synchronization than a fullwait_finish()
.
141-146
: Verify correctness of source/destination access masks.
At present,src_access_mask
is set toSHADER_WRITE
, anddst_access_mask
includes both transfer and shader flags. This covers a wide range of usage scenarios but may be too permissive if the buffers are only read afterwards. Consider refining them or confirming all potential subsequent usage paths.Would you like to cross-check usage in other modules to confirm correctness?
* gpu add memory barriers * add comments
This PR adds a vulkan memory barrier. Memory barrier is a sync primitive that adds guarantees about execution order. After this barrier, all changes must be available for downloading without race conditions or unflushed caches.