ggml backends interface v1 #547
I will do a sync of |
@slaren Let's rebase on |
Doing something similar to the comment in ggml-cuda.h for all other backends should be the goal, IMO.
Introducing ggml to existing codebases would be easiest from the inside out as a plugin: first doing some leaves of the computation, then more and more towards the root.
Need to log off earlier today - will review tomorrow.
The branch should be stable, and I think this is already good enough for a v1. It should be easier to implement than the last time. There is a pointer to the buffer in the tensors, and that should help with backends that cannot abstract a buffer+offset into a pointer, like Metal. Views inherit this buffer from the parent automatically.

But there are very likely going to be some rough edges. So I would say to not do it yet, unless you are willing to look into the internals and change anything that may be required.

v2 will introduce partial offloading. Fallback to CPU will come at the same time, or a little later. After fallback to CPU is implemented, I will implement support for the OpenCL backend. I think that Metal is closer to OpenCL than to CUDA, so it may be in a better state to implement Metal support after that point.
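To make the buffer-pointer idea concrete, here is a minimal sketch of a tensor that records its owning buffer and a byte offset, with views copying the parent's buffer. All names (`sketch_buffer`, `sketch_tensor`, `sketch_view`, and so on) are hypothetical stand-ins for illustration, not the actual ggml structs or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; these are NOT the ggml types. */
struct sketch_buffer {
    void  *base; /* backend-specific handle; need not be a host pointer */
    size_t size; /* total size of the buffer in bytes */
};

struct sketch_tensor {
    struct sketch_buffer *buffer;   /* buffer that owns this tensor's data */
    size_t                offset;   /* byte offset of the data inside buffer */
    size_t                nbytes;   /* size of the tensor data in bytes */
    struct sketch_tensor *view_src; /* parent tensor, or NULL if not a view */
};

/* Place a tensor at a given offset inside a buffer. */
static struct sketch_tensor sketch_new_tensor(struct sketch_buffer *buf,
                                              size_t offset, size_t nbytes) {
    assert(buf != NULL && offset + nbytes <= buf->size);
    struct sketch_tensor t = { buf, offset, nbytes, NULL };
    return t;
}

/* Create a view: it inherits the parent's buffer and only adjusts the offset,
 * so a backend never has to turn buffer+offset into a raw pointer. */
static struct sketch_tensor sketch_view(struct sketch_tensor *parent,
                                        size_t rel_offset, size_t nbytes) {
    assert(parent != NULL && rel_offset + nbytes <= parent->nbytes);
    struct sketch_tensor v = { parent->buffer, parent->offset + rel_offset,
                               nbytes, parent };
    return v;
}
```

With this layout, a backend such as Metal could bind `buffer` and pass `offset` separately, instead of requiring a single device pointer.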
* ggml-backend : code style suggestions
* ggml-backend : move ggml_backend and ggml_backend_buffer in the source file
* ggml-backend : move structs back to header + rename type
* ggml-backend : remove obsolete comment
* fix leak in ggml_backend_buffer_free
* ggml-backend : re-introduce typedefs as a declaration of intent
---------
Co-authored-by: slaren <slarengh@gmail.com>
* ggml-backend : metal (WIP)
* ggml-backend : metal (adapt CPU backend)
* ggml-backend : working metal
* ggml-backend : clean-up metal implementation
* ggml-backend : add ggml_backend_is_metal()
Let's merge this - I think it is a good addition to the framework
…ions correctly with `vocab_only` setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (ggml-org#547)
Initial version of the common backends interface.
Supports CPU and CUDA with full offloading, but not partial offloading or fallback to the CPU backend for unimplemented ops.
Modifies the gpt-2 example to use the backends interface. If built with CUDA support and executed with `-ngl 1`, the CUDA backend is used.
Support for Metal added in #552
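As a rough illustration of the selection rule described above, the following sketch picks a backend from the `-ngl` value and a build-time CUDA flag. The names `sketch_backend`, `sketch_parse_ngl`, and `sketch_pick_backend` are hypothetical and only model the behavior stated in the description; they are not the code in the gpt-2 example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names for illustration; not part of ggml. */
enum sketch_backend { SKETCH_BACKEND_CPU, SKETCH_BACKEND_CUDA };

/* Return the integer following "-ngl" on the command line, or 0 if absent. */
static int sketch_parse_ngl(int argc, char **argv) {
    assert(argv != NULL);
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-ngl") == 0) {
            return atoi(argv[i + 1]);
        }
    }
    return 0;
}

/* v1 only supports full offloading, so any positive -ngl value selects CUDA
 * when the build has CUDA support; otherwise everything runs on the CPU. */
static enum sketch_backend sketch_pick_backend(int n_gpu_layers,
                                               bool built_with_cuda) {
    if (built_with_cuda && n_gpu_layers > 0) {
        return SKETCH_BACKEND_CUDA;
    }
    return SKETCH_BACKEND_CPU;
}
```

Because there is no partial offloading in this version, the exact value passed to `-ngl` does not matter in the sketch as long as it is greater than zero.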