Skip to content

Conversation

davidkoski
Copy link
Collaborator

@davidkoski davidkoski commented May 1, 2025

CI failing on main but not on branch merged. Also doesn't fail locally. Debugging

OK, clearly I am missing something with CI -- it does show as a failure when I go back to look at it:

image

The problem seems to be compiler/tools side -- I investigated the output of the saveToData and it was the correct format and worked when I injected it on my machine. It seemed to be something in the compile of this code:

  uint64_t jsonHeaderLength = 0;
  // This is the same limit as in the original Rust Safetensors code.
  constexpr uint64_t kMaxJsonHeaderLength = 100000000;
  in_stream->read(reinterpret_cast<char*>(&jsonHeaderLength), 8);
  if (jsonHeaderLength <= 0 || jsonHeaderLength >= kMaxJsonHeaderLength) {
    throw std::runtime_error(
        "[load_safetensors] Invalid json header length " + in_stream->label());
  }

anyway, a newer Xcode/compiler fixed it, so unless it is needed on older systems it is now marked as only available in a newer compiler.

@davidkoski davidkoski requested a review from awni May 1, 2025 17:40
@awni
Copy link
Member

awni commented May 1, 2025

That's an odd bug. Do you expect it's related to Swift? Otherwise we should investigate it on the C++ side 🤔

@davidkoski
Copy link
Collaborator Author

That's an odd bug. Do you expect it's related to Swift? Otherwise we should investigate it on the C++ side 🤔

It looks like it is entirely on the C++ side, but it is a bit hard to say for sure debugging through CI runs. Curiously it doesn't happen when loading files from disk (and that seems to go through the same path).

I printed the base64 of the file contents and manually examined and loaded them -- they were fine!

@davidkoski
Copy link
Collaborator Author

I think it would require an older os/xcode to really debug it. Per circleCI here is what runs:

Xcode 15.3 (15E204a)
macOS 14.3.1

The former is probably the key part

@davidkoski
Copy link
Collaborator Author

Switching to Xcode 16.0 "fixes" the issue. Same base64 data, but with that it doesn't hit an error -- that is why I suspect something in the toolchain.

@@ -219,7 +219,7 @@ private func new_mlx_io_vtable_dataIO() -> mlx_io_vtable {
let state = Unmanaged<IOState>.fromOpaque(ptr!).takeUnretainedValue()

if n + state.offset <= state.data.count {
let _ = state.data.withContiguousStorageIfAvailable { buffer in
_ = state.data.withUnsafeBytes { buffer in
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it comes down to this -- for whatever reason the other method doesn't enter the closure with the older Xcode/swift.

There are no other uses of withContiguousStorageIfAvailable

Copy link
Member

@awni awni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!!

@davidkoski davidkoski merged commit bef30f0 into main May 1, 2025
1 check passed
@davidkoski davidkoski deleted the save-data branch May 1, 2025 20:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants