Skip to content

Conversation

usamoi
Copy link
Contributor

@usamoi usamoi commented Feb 6, 2025

issue: #117
first try: #150

@VoVAllen
Copy link
Member

VoVAllen commented Feb 6, 2025

Let's try it on 100M dataset and see how flamegraph goes

@usamoi
Copy link
Contributor Author

usamoi commented Feb 7, 2025

2c4t, on laion-100m:

  • estimated build time on main: ~60h
  • estimated build time on pinning-v2: ~50h

next steps:

  • merge two shared memory segments to one
  • make it optional

cc @VoVAllen

Signed-off-by: usamoi <usamoi@outlook.com>
@VoVAllen
Copy link
Member

VoVAllen commented Feb 7, 2025

Why do we need to merge two segments together?

The optimization result is pretty close to the flamegraph result.

@VoVAllen
Copy link
Member

VoVAllen commented Feb 7, 2025

And what is 2c4t?

@usamoi
Copy link
Contributor Author

usamoi commented Feb 8, 2025

And what is 2c4t?

4 vcpu

Why do we need to merge two segments together?

not necessary; helpful to make it optional

@usamoi usamoi force-pushed the pinning-v2 branch 2 times, most recently from bc95918 to dd76752 Compare February 8, 2025 06:12
Signed-off-by: usamoi <usamoi@outlook.com>
@usamoi
Copy link
Contributor Author

usamoi commented Feb 8, 2025

i4i.xlarge, laion-100m:

  • estimated build time on main: ~61h
  • estimated build time on pinning-v2: ~44h

@usamoi usamoi merged commit 6373f55 into tensorchord:main Feb 8, 2025
13 checks passed
@usamoi usamoi requested a review from Copilot March 12, 2025 03:59
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a "pinning index in memory" feature by introducing a new cache function and updating related build and index option types. Key changes include:

  • Adding a new cache function in crates/algorithm/src/cache.rs that builds an index trace.
  • Defining new indexing and build options in src/index/types.rs and refactoring related options in crates/algorithm/src/types.rs.
  • Adding a clone_into_boxed method in src/index/storage.rs to safely duplicate PostgresPage data using unsafe pointer copying.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
crates/algorithm/src/cache.rs New cache function implementation that reads index data and builds a trace.
src/index/types.rs New type definitions for internal and external build options to support the pinned index feature.
src/index/storage.rs Adds a Debug derive and a clone_into_boxed method for PostgresPage to facilitate boxed cloning.
crates/algorithm/src/lib.rs Integrates the new cache module and updates module exports.
crates/algorithm/src/types.rs Refactors index options by removing build option types and using a simplified VchordrqIndexOptions struct.
crates/algorithm/src/build.rs Updates the build function parameter type to use the new VchordrqIndexOptions.
Comments suppressed due to low confidence (1)

src/index/storage.rs:46

  • Ensure that the unsafe pointer copy in clone_into_boxed is safe for all fields of PostgresPage, particularly if the structure evolves to include non-POD types.
pub fn clone_into_boxed(&self) -> Box<Self> {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants