Hi, a few weeks ago I opened an issue about a CPU bottleneck, and I have finally found the root cause. It wasn't really a CPU bottleneck: the CPU was frantically managing mmap access over a network volume, and that was the real bottleneck.
On network storage, this call in comfy/utils.py (line 13)
sd = safetensors.torch.load_file(ckpt, device=device.type)
uses mmap, which is hugely inefficient on network volumes: roughly a 30-50x slowdown. A single SDXL safetensors file loads in 1-2 seconds over a network volume with the workaround below, but takes 40-50 seconds with the stock call above.
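To check whether a given volume shows the same gap, the two access patterns can be timed without loading any model. The following is a minimal, stdlib-only sketch; the helper names `read_whole`, `read_via_mmap`, and `time_reader` are made up for illustration, and copying the whole mapping at once is only an approximation of how safetensors touches the mapped file.

```python
import mmap
import time


def read_whole(path):
    """Read the file with a single buffered read() call (no mmap)."""
    with open(path, "rb") as f:
        return f.read()


def read_via_mmap(path):
    """Map the file read-only and copy its contents out of the mapping."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:]


def time_reader(reader, path):
    """Return (elapsed seconds, number of bytes) for one call of reader(path)."""
    start = time.perf_counter()
    data = reader(path)
    return time.perf_counter() - start, len(data)
```

Calling `time_reader(read_whole, path)` and `time_reader(read_via_mmap, path)` on the same checkpoint gives a rough comparison. The OS page cache can make a second read of the same file look much faster, so each measurement is best taken on a file that has not been read recently.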
I hacked together this:
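try:
    # Read the whole file in one pass and parse it from memory (no mmap).
    with open(ckpt, 'rb') as f:
        sd = safetensors.torch.load(f.read())
except Exception:
    # Fall back to the stock mmap-based loader.
    sd = safetensors.torch.load_file(ckpt, device=device.type)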
This works on my SDXL safetensors files, and it falls back to the normal loader for certain ControlNet checkpoints.
This issue has already been referenced in #1992 (comment).
I think an option to disable mmap in the default load path is necessary. Otherwise models are extremely slow to load on any cloud provider platform that runs on K8s with network PVCs.
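One possible shape for such an option is sketched below. It is an assumption, not existing ComfyUI behavior: the environment variable `COMFY_DISABLE_MMAP` and the functions `mmap_disabled` and `load_safetensors` are hypothetical names. Only `safetensors.torch.load` and `safetensors.torch.load_file` are real library calls.

```python
import os


def mmap_disabled(env=None):
    """True when the hypothetical COMFY_DISABLE_MMAP variable is set to a truthy value."""
    env = os.environ if env is None else env
    value = env.get("COMFY_DISABLE_MMAP", "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_safetensors(ckpt, device="cpu", disable_mmap=None):
    """Load a .safetensors state dict, optionally bypassing mmap (hypothetical helper)."""
    # Imported lazily so the flag helper above works without safetensors installed.
    import safetensors.torch

    if disable_mmap is None:
        disable_mmap = mmap_disabled()
    if disable_mmap:
        # Read the whole file in one pass, then parse it from memory.
        with open(ckpt, "rb") as f:
            sd = safetensors.torch.load(f.read())
        # load() returns CPU tensors; move them if another device was requested.
        if device != "cpu":
            sd = {k: v.to(device) for k, v in sd.items()}
        return sd
    # Default path: the existing mmap-based loader.
    return safetensors.torch.load_file(ckpt, device=device)
```

An explicit switch avoids relying on a bare try/except to choose the path. The trade-off is that the non-mmap path holds the whole file in RAM while it is parsed, so it may not suit very large checkpoints on memory-constrained nodes.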