Skip to content

Conversation

DellCurry
Copy link
Contributor

Motivation

Modifications

Checklist

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @DellCurry, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends the capabilities of the SGLang hierarchical cache (Hicache) by integrating a robust storage connector framework. The primary goal is to enable persistent storage and intelligent prefetching of KV cache data, allowing for more efficient memory utilization and faster model loading or context switching. It introduces a new file-based storage option and refines the core connector interfaces to support a broader range of storage backends, all while managing asynchronous data transfers between host memory and external storage.

Highlights

  • New Hierarchical Cache (Hicache) Storage Layer: Introduced a new storage backend capability for the hierarchical KV cache. This allows KV cache pages to be persisted to and prefetched from external storage systems, enhancing memory management and potentially improving cold start times.
  • New File System Connector: Added a FileConnector that enables storing and retrieving KV cache pages directly from the local filesystem, saving them as .bin files. This expands the available storage options alongside existing Redis and S3 connectors.
  • Refactored Connector Architecture: The connector hierarchy has been refactored with the introduction of BaseWeightConnector and the renaming of BaseFileConnector to BaseFileSystemConnector. The ConnectorType enum now uses bit flags, allowing for more flexible and robust type checking across different storage and weight handling mechanisms.
  • Asynchronous Prefetching and Backup: Implemented dedicated asynchronous threads within the HiCacheController for managing prefetching of KV cache pages from storage to host memory and backing up pages from host memory to storage. This ensures non-blocking I/O operations for cache management.
  • KV Cache Page Hashing and Protection: A new hashing mechanism (get_hash_str) has been introduced to uniquely identify KV cache pages for storage. Additionally, a host_ref_counter was added to TreeNode to protect KV cache pages from eviction while they are involved in ongoing storage operations.
  • Scheduler Integration for Prefetching: The scheduler now actively participates in the prefetching process, initiating prefetch operations based on incoming request tokens and monitoring their progress to ensure efficient cache utilization.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a storage connector for Hicache, enabling the offloading and prefetching of the KV cache to and from persistent storage like local files, Redis, or S3. The changes are extensive, including new base connector classes, a file-based connector implementation, and significant logic in HiCacheController for managing prefetch and backup operations in background threads. The refactoring of connector types using bitmasks is a good improvement for flexibility.

My review has identified a few critical correctness issues, such as a syntax error in an __all__ list and an unimplemented abstract method, which would cause runtime errors. I have also provided feedback on improving the robustness of the code, particularly around error handling in background threads and the use of bitmasks for checking connector capabilities.

"RedisConnector",
"S3Connector",
"ConnectorType",
"BaseWeightConnector" "ConnectorType",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There's a syntax error in the __all__ list. The comma is missing between "BaseWeightConnector" and "ConnectorType". This will cause Python to concatenate the two adjacent strings into a single string "BaseWeightConnectorConnectorType", which is not a valid name to export and will break imports from this module.

Suggested change
"BaseWeightConnector" "ConnectorType",
"BaseWeightConnector", "ConnectorType",

Comment on lines +56 to +74
def list(self, prefix: str) -> List[str]:
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The list method in FileConnector is not implemented. Since FileConnector inherits from BaseKVConnector, which defines list as an abstract method, this will raise a TypeError when FileConnector is instantiated.

You should implement this method to list keys from the file-based storage that match the given prefix.

Suggested change
def list(self, prefix: str) -> List[str]:
pass
def list(self, prefix: str) -> List[str]:
if not os.path.isdir(self.file_path):
return []
return [
filename[:-4]
for filename in os.listdir(self.file_path)
if filename.startswith(prefix) and filename.endswith(".bin")
]

Comment on lines 620 to 634
# todo, handle failures in storage backend
self.storage_backend.set(
last_hash,
self.mem_pool_host.get_flat_data_page(
operation.host_indices[i]
),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The backup_thread_func does not handle potential failures from self.storage_backend.set(). The todo comment at line 620 correctly identifies this. If a storage operation fails (e.g., due to network issues, disk full, or permissions), it could either raise an exception and crash the backup thread, or fail silently depending on the connector's implementation. This could leave the system in an inconsistent state, as the cache controller would not know that the backup failed, potentially leading to data loss.

To improve robustness, the set methods in the connectors should consistently report errors (e.g., by raising an exception). The backup_thread_func should then wrap the set call in a try...except block to catch these errors, log them, and implement a proper error handling strategy (e.g., retry, or mark the operation as failed and notify the system).

@@ -1339,7 +1339,7 @@ def _get_weights_iterator_kv(
client,
) -> Generator[Tuple[str, torch.Tensor], None, None]:
"""Get an iterator for the model weights from remote storage."""
assert get_connector_type(client) == ConnectorType.KV
assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using the equality operator == to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities, and will fail if a connector implements additional capabilities in the future.

For better robustness and consistency with the pattern used in load_model, you should use the bitwise AND operator & to verify that the connector has the required capabilities.

Suggested change
assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
assert get_connector_type(client) & ConnectorType.KV and get_connector_type(client) & ConnectorType.WEIGHT

@@ -1348,7 +1348,7 @@ def _get_weights_iterator_fs(
client,
) -> Generator[Tuple[str, torch.Tensor], None, None]:
"""Get an iterator for the model weights from remote storage."""
assert get_connector_type(client) == ConnectorType.FS
assert get_connector_type(client) == ConnectorType.FS | ConnectorType.WEIGHT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using the equality operator == to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities, and will fail if a connector implements additional capabilities in the future.

For better robustness and consistency with the pattern used in load_model, you should use the bitwise AND operator & to verify that the connector has the required capabilities.

Suggested change
assert get_connector_type(client) == ConnectorType.FS | ConnectorType.WEIGHT
assert get_connector_type(client) & ConnectorType.FS and get_connector_type(client) & ConnectorType.WEIGHT

@@ -1361,7 +1361,7 @@ def save_model(
url: str,
) -> None:
with create_remote_connector(url) as client:
assert get_connector_type(client) == ConnectorType.KV
assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Using the equality operator == to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities, and will fail if a connector implements additional capabilities in the future.

For better robustness and consistency with the pattern used in load_model, you should use the bitwise AND operator & to verify that the connector has the required capabilities.

Suggested change
assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
assert get_connector_type(client) & ConnectorType.KV and get_connector_type(client) & ConnectorType.WEIGHT

@xiezhq-hermann xiezhq-hermann changed the base branch from main to xiezhq-hicache-storage July 10, 2025 06:06
…ding

Signed-off-by: wangyu <wangyu.steph@bytedance.com>
@DellCurry DellCurry force-pushed the hicache-storage-connector branch from 711caec to 37997cd Compare July 14, 2025 09:21
@xiezhq-hermann xiezhq-hermann self-assigned this Jul 14, 2025
@ispobock ispobock deleted the branch sgl-project:xiezhq-hicache-storage July 18, 2025 07:20
@ispobock ispobock closed this Jul 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants