Safetensors loading uses mmap with multiple processes sharing the same fd, causing slow GCSFuse performance #10280

### Describe the bug

When I use `StableDiffusionPipeline.from_single_file` to load a safetensors model, loading is extremely slow if the file is served through GCSFuse (https://cloud.google.com/storage/docs/cloud-storage-fuse/overview).

The reason is that the loader creates multiple processes that all share the same fd and its file handle. Because each process reads a different offset of the file, GCSFuse performs very badly: from its point of view, the reads on that single handle look like random reads jumping between offsets. For example:

```
connection.go:420] <- ReadFile (inode 2, PID 77, handle 1, offset 529453056, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 78, handle 1, offset 531812352, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 79, handle 1, offset 534171648, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 50, handle 1, offset 527351808, 4096 bytes)
```
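To make the access pattern in this excerpt concrete, here is a minimal sketch that parses the four log lines above and counts how many reads on a handle do not start where the previous read on that handle ended. The helper names (`parse_reads`, `count_non_sequential`) are made up for illustration, and the regular expression assumes the log format is exactly as shown in the excerpt.

```python
import re

# The four lines from the excerpt above, copied verbatim.
LOG = """\
connection.go:420] <- ReadFile (inode 2, PID 77, handle 1, offset 529453056, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 78, handle 1, offset 531812352, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 79, handle 1, offset 534171648, 262144 bytes)
connection.go:420] <- ReadFile (inode 2, PID 50, handle 1, offset 527351808, 4096 bytes)
"""

# Assumed log format, based only on the lines above.
READ_RE = re.compile(
    r"ReadFile \(inode (\d+), PID (\d+), handle (\d+), offset (\d+), (\d+) bytes\)"
)


def parse_reads(log_text):
    """Return one dict per ReadFile line, in log order."""
    reads = []
    for match in READ_RE.finditer(log_text):
        inode, pid, handle, offset, size = map(int, match.groups())
        reads.append(
            {"inode": inode, "pid": pid, "handle": handle, "offset": offset, "size": size}
        )
    return reads


def count_non_sequential(reads):
    """Count reads that do not begin where the previous read on the same handle ended."""
    last_end = {}  # handle -> offset just past the previous read
    jumps = 0
    for read in reads:
        previous_end = last_end.get(read["handle"])
        if previous_end is not None and read["offset"] != previous_end:
            jumps += 1
        last_end[read["handle"]] = read["offset"] + read["size"]
    return jumps


reads = parse_reads(LOG)
handles = sorted({r["handle"] for r in reads})
pids = sorted({r["pid"] for r in reads})
jumps = count_non_sequential(reads)
print(f"handles={handles} pids={pids} non-sequential reads={jumps} of {len(reads) - 1}")
```

In this excerpt, four different PIDs issue reads on the same handle, and no read continues from the end of the previous one, which is what a random-read workload looks like to the filesystem.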

My question is: why do the loading processes share the same fd in the first place? Since `mmap` is already used, even if the processes don't share the same fd, the kernel will still map each process's virtual memory back to the same page cache, so there is no need to share the fd across processes.

If they don't share the fd, GCSFuse will perform much better. Therefore, can we disable the fd sharing?
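The page-cache argument can be illustrated with a small sketch using only the Python standard library. It opens the same file through two independent descriptors, maps each one, and checks that a write through one mapping is visible through the other. It assumes a local Unix-like filesystem with shared file mappings, runs in a single process for simplicity (the page cache is system-wide, so the same should hold across processes), and says nothing about how safetensors or diffusers actually open the file. `map_with_own_fd` is a hypothetical helper name.

```python
import mmap
import os
import tempfile


def map_with_own_fd(path, writable=False):
    """Open `path` with a fresh descriptor and mmap the whole file."""
    fd = os.open(path, os.O_RDWR if writable else os.O_RDONLY)
    try:
        access = mmap.ACCESS_WRITE if writable else mmap.ACCESS_READ
        return mmap.mmap(fd, 0, access=access)
    finally:
        # mmap keeps its own duplicate of the descriptor, so ours can be closed.
        os.close(fd)


# Create a small scratch file (two 4 KiB pages of zeros).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 8192)
    path = tmp.name

writer = map_with_own_fd(path, writable=True)  # first descriptor
reader = map_with_own_fd(path)                 # second, independent descriptor

# Write through one mapping, then read through the other.
writer[4096:4100] = b"ABCD"
seen = bytes(reader[4096:4100])

writer.close()
reader.close()
os.remove(path)

print(seen)
```

If both mappings are backed by the same cached pages, the bytes written through `writer` are what `reader` returns, even though the two mappings were created from different descriptors.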

### Reproduction

Simply use GCSFuse to serve a safetensors file to `StableDiffusionPipeline.from_single_file`.
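A minimal timing sketch for this reproduction might look like the following. The mount path in the usage comment is hypothetical, and `load_pipeline` and `time_call` are illustrative helper names that are not part of diffusers; only `StableDiffusionPipeline.from_single_file` comes from the report.

```python
import time


def load_pipeline(path):
    """Load a single-file checkpoint with diffusers."""
    # Imported lazily so the timing helper can be used without diffusers installed.
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_single_file(path)


def time_call(fn, *args):
    """Call fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Usage (hypothetical GCSFuse mount point; adjust to your bucket mount):
# pipe, seconds = time_call(load_pipeline, "/mnt/gcs/model.safetensors")
# print(f"from_single_file took {seconds:.1f}s")
```

Running the same call against a copy of the file on local disk gives a baseline to compare the GCSFuse timing against.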

### Logs

_No response_

### System Info

N/A

### Who can help?

@yiyixuxu  @asomoza 
