Hicache storage connector #7920
Conversation
Summary of Changes
Hello @DellCurry, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly extends the capabilities of the SGLang hierarchical cache (Hicache) by integrating a robust storage connector framework. The primary goal is to enable persistent storage and intelligent prefetching of KV cache data, allowing for more efficient memory utilization and faster model loading or context switching. It introduces a new file-based storage option and refines the core connector interfaces to support a broader range of storage backends, all while managing asynchronous data transfers between host memory and external storage.
Highlights
- New Hierarchical Cache (Hicache) Storage Layer: Introduced a new storage backend capability for the hierarchical KV cache. This allows KV cache pages to be persisted to and prefetched from external storage systems, enhancing memory management and potentially improving cold-start times.
- New File System Connector: Added a `FileConnector` that enables storing and retrieving KV cache pages directly on the local filesystem, saving them as `.bin` files. This expands the available storage options alongside the existing Redis and S3 connectors. (A minimal sketch of the connector-type flags and a file-backed page store follows this list.)
- Refactored Connector Architecture: The connector hierarchy has been refactored with the introduction of `BaseWeightConnector` and the renaming of `BaseFileConnector` to `BaseFileSystemConnector`. The `ConnectorType` enum now uses bit flags, allowing for more flexible and robust type checking across different storage and weight-handling mechanisms.
- Asynchronous Prefetching and Backup: Implemented dedicated asynchronous threads within the `HiCacheController` for prefetching KV cache pages from storage to host memory and for backing up pages from host memory to storage. This keeps cache-management I/O non-blocking.
- KV Cache Page Hashing and Protection: A new hashing mechanism (`get_hash_str`) has been introduced to uniquely identify KV cache pages for storage. Additionally, a `host_ref_counter` was added to `TreeNode` to protect KV cache pages from eviction while they are involved in ongoing storage operations.
- Scheduler Integration for Prefetching: The scheduler now actively participates in the prefetching process, initiating prefetch operations based on incoming request tokens and monitoring their progress to ensure efficient cache utilization.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code Review
This pull request introduces a storage connector for Hicache, enabling the offloading and prefetching of the KV cache to and from persistent storage such as local files, Redis, or S3. The changes are extensive, including new base connector classes, a file-based connector implementation, and significant logic in `HiCacheController` for managing prefetch and backup operations in background threads. The refactoring of connector types using bitmasks is a good improvement for flexibility.
My review has identified a few critical correctness issues, such as a syntax error in an `__all__` list and an unimplemented abstract method, which would cause runtime errors. I have also provided feedback on improving the robustness of the code, particularly around error handling in background threads and the use of bitmasks for checking connector capabilities.
"RedisConnector", | ||
"S3Connector", | ||
"ConnectorType", | ||
"BaseWeightConnector" "ConnectorType", |
There's a syntax error in the `__all__` list. The comma is missing between `"BaseWeightConnector"` and `"ConnectorType"`. This will cause Python to concatenate the two adjacent strings into a single string, `"BaseWeightConnectorConnectorType"`, which is not a valid name to export and will break imports from this module.
"BaseWeightConnector" "ConnectorType", | |
"BaseWeightConnector", "ConnectorType", |
```python
    def list(self, prefix: str) -> List[str]:
        pass
```
The `list` method in `FileConnector` is not implemented. Since `FileConnector` inherits from `BaseKVConnector`, which defines `list` as an abstract method, this will raise a `TypeError` when `FileConnector` is instantiated.
You should implement this method to list keys from the file-based storage that match the given prefix.
Suggested change:

```diff
     def list(self, prefix: str) -> List[str]:
-        pass
+        if not os.path.isdir(self.file_path):
+            return []
+        return [
+            filename[:-4]
+            for filename in os.listdir(self.file_path)
+            if filename.startswith(prefix) and filename.endswith(".bin")
+        ]
```
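Building on the suggestion above, a hypothetical caller might use `list` to probe which pages are already persisted before scheduling prefetches; the constructor argument and prefix below are illustrative, not the PR's actual API:

```python
# Hypothetical usage: find KV cache pages whose hash keys start with a prefix.
connector = FileConnector("/tmp/hicache_pages")  # illustrative constructor argument
existing_keys = connector.list(prefix="a1b2")
print(f"{len(existing_keys)} pages already persisted under prefix 'a1b2'")
```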
```python
                        # todo, handle failures in storage backend
                        self.storage_backend.set(
                            last_hash,
                            self.mem_pool_host.get_flat_data_page(
                                operation.host_indices[i]
                            ),
                        )
```
The `backup_thread_func` does not handle potential failures from `self.storage_backend.set()`. The `todo` comment at line 620 correctly identifies this. If a storage operation fails (e.g., due to network issues, a full disk, or permission problems), it could either raise an exception and crash the backup thread, or fail silently, depending on the connector's implementation. This could leave the system in an inconsistent state, as the cache controller would not know that the backup failed, potentially leading to data loss.
To improve robustness, the `set` methods in the connectors should consistently report errors (e.g., by raising an exception). The `backup_thread_func` should then wrap the `set` call in a `try...except` block to catch these errors, log them, and implement a proper error-handling strategy (e.g., retry, or mark the operation as failed and notify the system).
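A minimal sketch of the kind of guard described here, assuming a simple retry-with-backoff policy; the helper names (`safe_backup_page`, `mark_page_failed`, `MAX_RETRIES`) are illustrative, not part of the PR:

```python
import logging
import time

logger = logging.getLogger(__name__)
MAX_RETRIES = 3


def safe_backup_page(storage_backend, key, page, operation, index):
    # Wrap the backup write in retries and record failures instead of letting
    # an exception kill the backup thread.
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            storage_backend.set(key, page)
            return True
        except Exception as exc:  # a real connector would raise narrower errors
            logger.warning(
                "backup of page %s failed (attempt %d/%d): %s",
                key, attempt, MAX_RETRIES, exc,
            )
            time.sleep(0.1 * attempt)  # simple backoff before retrying
    # Give up: mark the page so the controller knows it was not persisted.
    operation.mark_page_failed(index)  # hypothetical bookkeeping hook
    return False
```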
```diff
@@ -1339,7 +1339,7 @@ def _get_weights_iterator_kv(
         client,
     ) -> Generator[Tuple[str, torch.Tensor], None, None]:
         """Get an iterator for the model weights from remote storage."""
-        assert get_connector_type(client) == ConnectorType.KV
+        assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
```
Using the equality operator `==` to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities and will fail if a connector implements additional capabilities in the future.
For better robustness and consistency with the pattern used in `load_model`, you should use the bitwise AND operator `&` to verify that the connector has the required capabilities.
Suggested change:

```diff
-        assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
+        assert get_connector_type(client) & ConnectorType.KV and get_connector_type(client) & ConnectorType.WEIGHT
```
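To make the difference concrete, here is a small, self-contained example; the enum members mirror the ones above, but the snippet itself is illustrative:

```python
import enum


class ConnectorType(enum.Flag):
    KV = enum.auto()
    FS = enum.auto()
    WEIGHT = enum.auto()


# A connector that gains an extra capability still passes the & checks,
# but fails an exact == comparison against a fixed combination of flags.
connector = ConnectorType.KV | ConnectorType.WEIGHT | ConnectorType.FS

print(connector == ConnectorType.KV | ConnectorType.WEIGHT)                     # False
print(bool(connector & ConnectorType.KV and connector & ConnectorType.WEIGHT))  # True
```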
```diff
@@ -1348,7 +1348,7 @@ def _get_weights_iterator_fs(
         client,
     ) -> Generator[Tuple[str, torch.Tensor], None, None]:
         """Get an iterator for the model weights from remote storage."""
-        assert get_connector_type(client) == ConnectorType.FS
+        assert get_connector_type(client) == ConnectorType.FS | ConnectorType.WEIGHT
```
Using the equality operator `==` to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities and will fail if a connector implements additional capabilities in the future.
For better robustness and consistency with the pattern used in `load_model`, you should use the bitwise AND operator `&` to verify that the connector has the required capabilities.
Suggested change:

```diff
-        assert get_connector_type(client) == ConnectorType.FS | ConnectorType.WEIGHT
+        assert get_connector_type(client) & ConnectorType.FS and get_connector_type(client) & ConnectorType.WEIGHT
```
```diff
@@ -1361,7 +1361,7 @@ def save_model(
         url: str,
     ) -> None:
         with create_remote_connector(url) as client:
-            assert get_connector_type(client) == ConnectorType.KV
+            assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
```
Using the equality operator `==` to check for connector capabilities with a bitmask enum is brittle. It requires an exact match of all capabilities and will fail if a connector implements additional capabilities in the future.
For better robustness and consistency with the pattern used in `load_model`, you should use the bitwise AND operator `&` to verify that the connector has the required capabilities.
Suggested change:

```diff
-            assert get_connector_type(client) == ConnectorType.KV | ConnectorType.WEIGHT
+            assert get_connector_type(client) & ConnectorType.KV and get_connector_type(client) & ConnectorType.WEIGHT
```
…ding (Signed-off-by: wangyu <wangyu.steph@bytedance.com>)
Force-pushed from 711caec to 37997cd.
Motivation
Modifications
Checklist