Hicache storage connector #7920
Conversation
Summary of Changes
Hello @DellCurry, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly extends the capabilities of the SGLang hierarchical cache (Hicache) by integrating a robust storage connector framework. The primary goal is to enable persistent storage and intelligent prefetching of KV cache data, allowing for more efficient memory utilization and faster model loading or context switching. It introduces a new file-based storage option and refines the core connector interfaces to support a broader range of storage backends, all while managing asynchronous data transfers between host memory and external storage.
Highlights
- New Hierarchical Cache (Hicache) Storage Layer: Introduced a new storage backend capability for the hierarchical KV cache. This allows KV cache pages to be persisted to and prefetched from external storage systems, enhancing memory management and potentially improving cold-start times.
- New File System Connector: Added a `FileConnector` that enables storing and retrieving KV cache pages directly on the local filesystem, saving them as `.bin` files. This expands the available storage options alongside the existing Redis and S3 connectors. (A minimal sketch of the connector-type flags and a file-backed page store follows this list.)
- Refactored Connector Architecture: The connector hierarchy has been refactored with the introduction of `BaseWeightConnector` and the renaming of `BaseFileConnector` to `BaseFileSystemConnector`. The `ConnectorType` enum now uses bit flags, allowing for more flexible and robust type checking across different storage and weight-handling mechanisms.
- Asynchronous Prefetching and Backup: Implemented dedicated asynchronous threads within the `HiCacheController` for prefetching KV cache pages from storage to host memory and for backing up pages from host memory to storage. This keeps cache-management I/O non-blocking.
- KV Cache Page Hashing and Protection: A new hashing mechanism (`get_hash_str`) has been introduced to uniquely identify KV cache pages for storage. Additionally, a `host_ref_counter` was added to `TreeNode` to protect KV cache pages from eviction while they are involved in ongoing storage operations.
- Scheduler Integration for Prefetching: The scheduler now actively participates in the prefetching process, initiating prefetch operations based on incoming request tokens and monitoring their progress to ensure efficient cache utilization.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code Review
This pull request introduces a storage connector for Hicache, enabling the offloading and prefetching of the KV cache to and from persistent storage such as local files, Redis, or S3. The changes are extensive, including new base connector classes, a file-based connector implementation, and significant logic in `HiCacheController` for managing prefetch and backup operations in background threads. The refactoring of connector types using bitmasks is a good improvement for flexibility.
My review has identified a few critical correctness issues, such as a syntax error in an `__all__` list and an unimplemented abstract method, which would cause runtime errors. I have also provided feedback on improving the robustness of the code, particularly around error handling in background threads and the use of bitmasks for checking connector capabilities.
"RedisConnector", | ||
"S3Connector", | ||
"ConnectorType", | ||
"BaseWeightConnector" "ConnectorType", |
There's a syntax error in the `__all__` list. The comma is missing between `"BaseWeightConnector"` and `"ConnectorType"`. This will cause Python to concatenate the two adjacent strings into a single string, `"BaseWeightConnectorConnectorType"`, which is not a valid name to export and will break imports from this module.
"BaseWeightConnector" "ConnectorType", | |
"BaseWeightConnector", "ConnectorType", |
```python
    def list(self, prefix: str) -> List[str]:
        pass
```
The `list` method in `FileConnector` is not implemented. Since `FileConnector` inherits from `BaseKVConnector`, which defines `list` as an abstract method, this will raise a `TypeError` when `FileConnector` is instantiated.
You should implement this method to list keys from the file-based storage that match the given prefix.
Suggested change:

```diff
     def list(self, prefix: str) -> List[str]:
-        pass
+        if not os.path.isdir(self.file_path):
+            return []
+        return [
+            filename[:-4]
+            for filename in os.listdir(self.file_path)
+            if filename.startswith(prefix) and filename.endswith(".bin")
+        ]
```
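Building on the suggestion above, a hypothetical caller might use `list` to probe which pages are already persisted before scheduling prefetches; the constructor argument and prefix below are illustrative, not the PR's actual API:

```python
# Hypothetical usage: find KV cache pages whose hash keys start with a prefix.
connector = FileConnector("/tmp/hicache_pages")  # illustrative constructor argument
existing_keys = connector.list(prefix="a1b2")
print(f"{len(existing_keys)} pages already persisted under prefix 'a1b2'")
```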
```python
                        # todo, handle failures in storage backend
                        self.storage_backend.set(
                            last_hash,
                            self.mem_pool_host.get_flat_data_page(
                                operation.host_indices[i]
                            ),
                        )
```
The `backup_thread_func` does not handle potential failures from `self.storage_backend.set()`. The `todo` comment at line 620 correctly identifies this. If a storage operation fails (e.g., due to network issues, a full disk, or permission problems), it could either raise an exception and crash the backup thread, or fail silently, depending on the connector's implementation. This could leave the system in an inconsistent state, as the cache controller would not know that the backup failed, potentially leading to data loss.
To improve robustness, the `set` methods in the connectors should consistently report errors (e.g., by raising an exception). The `backup_thread_func` should then wrap the `set` call in a `try...except` block to catch these errors, log them, and implement a proper error-handling strategy (e.g., retry, or mark the operation as failed and notify the system).
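A minimal sketch of the kind of guard described here, assuming a simple retry-with-backoff policy; the helper names (`safe_backup_page`, `mark_page_failed`, `MAX_RETRIES`) are illustrative, not part of the PR:

```python
import logging
import time

logger = logging.getLogger(__name__)
MAX_RETRIES = 3


def safe_backup_page(storage_backend, key, page, operation, index):
    # Wrap the backup write in retries and record failures instead of letting
    # an exception kill the backup thread.
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            storage_backend.set(key, page)
            return True
        except Exception as exc:  # a real connector would raise narrower errors
            logger.warning(
                "backup of page %s failed (attempt %d/%d): %s",
                key, attempt, MAX_RETRIES, exc,
            )
            time.sleep(0.1 * attempt)  # simple backoff before retrying
    # Give up: mark the page so the controller knows it was not persisted.
    operation.mark_page_failed(index)  # hypothetical bookkeeping hook
    return False
```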
```diff
@@ -1339,7 +1339,7 @@ def _get_weights_iterator_kv(
         client,
     ) -> Generator[Tuple[str, torch.Tensor], None, None]:
         """Get an iterator for the model weights from remote storage."""
-        assert get_connector_type(client) == ConnectorType.KV
+        assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
```
Using the equality operator `==` to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities and will fail if a connector implements additional capabilities in the future.
For better robustness and consistency with the pattern used in `load_model`, you should use the bitwise AND operator `&` to verify that the connector has the required capabilities.
Suggested change:

```diff
-        assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
+        assert get_connector_type(client) & ConnectorType.KV and get_connector_type(client) & ConnectorType.WEIGHT
```
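To make the difference concrete, here is a small, self-contained example; the enum members mirror the ones above, but the snippet itself is illustrative:

```python
import enum


class ConnectorType(enum.Flag):
    KV = enum.auto()
    FS = enum.auto()
    WEIGHT = enum.auto()


# A connector that gains an extra capability still passes the & checks,
# but fails an exact == comparison against a fixed combination of flags.
connector = ConnectorType.KV | ConnectorType.WEIGHT | ConnectorType.FS

print(connector == ConnectorType.KV | ConnectorType.WEIGHT)                     # False
print(bool(connector & ConnectorType.KV and connector & ConnectorType.WEIGHT))  # True
```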
```diff
@@ -1348,7 +1348,7 @@ def _get_weights_iterator_fs(
         client,
     ) -> Generator[Tuple[str, torch.Tensor], None, None]:
         """Get an iterator for the model weights from remote storage."""
-        assert get_connector_type(client) == ConnectorType.FS
+        assert get_connector_type(client) == ConnectorType.FS | ConnectorType.WEIGHT
```
Using the equality operator `==` to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities and will fail if a connector implements additional capabilities in the future.
For better robustness and consistency with the pattern used in `load_model`, you should use the bitwise AND operator `&` to verify that the connector has the required capabilities.
Suggested change:

```diff
-        assert get_connector_type(client) == ConnectorType.FS | ConnectorType.WEIGHT
+        assert get_connector_type(client) & ConnectorType.FS and get_connector_type(client) & ConnectorType.WEIGHT
```
```diff
@@ -1361,7 +1361,7 @@ def save_model(
         url: str,
     ) -> None:
         with create_remote_connector(url) as client:
-            assert get_connector_type(client) == ConnectorType.KV
+            assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
```
Using the equality operator `==` to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities and will fail if a connector implements additional capabilities in the future.
For better robustness and consistency with the pattern used in `load_model`, you should use the bitwise AND operator `&` to verify that the connector has the required capabilities.
Suggested change:

```diff
-            assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
+            assert get_connector_type(client) & ConnectorType.KV and get_connector_type(client) & ConnectorType.WEIGHT
```
…ding (Signed-off-by: wangyu <wangyu.steph@bytedance.com>)
Force-pushed from 711caec to 37997cd.
Motivation
Modifications
Checklist