Conversation

@slaren (Member) commented Oct 3, 2023

Initial version of the common backends interface.

Supports CPU and CUDA with full offloading, but not partial offloading or fallback to the CPU backend for unimplemented ops.

Modifies the gpt-2 example to use the backends interface. If built with CUDA support and executed with -ngl 1, the CUDA backend is used.
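The core idea of a common backends interface can be sketched as a per-backend table of function pointers that user code dispatches through, so the same graph code runs on CPU or CUDA depending on a runtime flag like `-ngl`. Below is a minimal, self-contained C sketch of that pattern with hypothetical names (`backend_iface`, `cpu_backend`, `run_demo`) — it is not the actual ggml-backend API, just an illustration of the dispatch mechanism:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical backend vtable: each backend (CPU, CUDA, ...) fills in its
 * own implementations, and user code only calls through this interface. */
typedef struct backend_iface {
    const char *name;
    void *(*alloc_buffer)(size_t size);
    void  (*free_buffer)(void *buf);
    /* stand-in for graph execution: scale a vector in place */
    void  (*compute)(float *data, size_t n, float scale);
} backend_iface;

/* CPU implementation of the interface */
static void *cpu_alloc(size_t size) { return malloc(size); }
static void  cpu_free(void *buf)    { free(buf); }
static void  cpu_compute(float *d, size_t n, float s) {
    for (size_t i = 0; i < n; i++) d[i] *= s;
}

static const backend_iface cpu_backend = {
    "CPU", cpu_alloc, cpu_free, cpu_compute
};

/* User code is backend-agnostic: it would pick cpu_backend or a
 * cuda_backend at runtime (e.g. based on an -ngl style flag). */
float run_demo(const backend_iface *be) {
    size_t n = 4;
    float *data = be->alloc_buffer(n * sizeof(float));
    for (size_t i = 0; i < n; i++) data[i] = (float)i;
    be->compute(data, n, 2.0f);   /* data becomes {0, 2, 4, 6} */
    float last = data[n - 1];
    be->free_buffer(data);
    return last;
}
```

A CUDA variant of the same struct would supply device allocations and kernel launches behind identical function signatures, which is what lets the gpt-2 example stay unchanged between backends.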

Support for Metal added in #552

@ggerganov (Member)

Many of the changes in this PR are due to a sync with llama.cpp.

I will do a sync of ggml with llama.cpp later tonight so we can focus on the relevant changes

@ggerganov (Member)

@slaren Let's rebase on master so we can see the relevant diff

@iboB (Contributor) left a comment

Doing something similar to the comment in ggml-cuda.h for all the other backends should be the goal, IMO.

Introducing ggml into existing codebases would be easiest from the inside out, as a plugin: first handling some leaves of the computation, then more and more towards the root.

@slaren slaren marked this pull request as ready for review October 4, 2023 13:26
@ggerganov (Member)

Need to log off earlier today - will review tomorrow.
Is this branch in a state where I can attempt to add a Metal backend?

@slaren (Member, Author) commented Oct 4, 2023

The branch should be stable, I think this is already good enough for a v1.

It should be easier to implement than last time. There is a pointer to the buffer in the tensors, which should help with backends that cannot abstract a buffer + offset into a single pointer, like Metal. Views inherit this buffer from their parent automatically. But there are very likely going to be some rough edges, so I would say don't do it yet, unless you are willing to look into the internals and change whatever may be required.
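The buffer-inheritance rule described above can be sketched like this. The struct names (`buffer`, `tensor`, `make_view`) are hypothetical, not the real ggml_tensor layout; the point is that each tensor records which backend buffer it lives in, and a view copies the parent's buffer pointer, so a backend like Metal can always resolve data as a (buffer, offset) pair rather than a raw pointer:

```c
#include <stddef.h>

/* Hypothetical sketch: a tensor keeps a pointer to the backend buffer it
 * was allocated in, rather than only a raw data pointer. */
typedef struct buffer { void *base; size_t size; } buffer;

typedef struct tensor {
    buffer *buf;             /* backend buffer this tensor lives in */
    size_t  offset;          /* offset of the tensor's data within buf */
    struct tensor *view_src; /* parent tensor if this is a view */
} tensor;

tensor make_tensor(buffer *buf, size_t offset) {
    tensor t = { buf, offset, NULL };
    return t;
}

/* A view inherits the parent's buffer automatically; only the offset
 * changes. A backend that cannot collapse buffer + offset into a single
 * pointer (like Metal) can still locate the view's data. */
tensor make_view(tensor *parent, size_t rel_offset) {
    tensor v = { parent->buf, parent->offset + rel_offset, parent };
    return v;
}
```

Because views never allocate, chains of views (a view of a view) all bottom out in the same underlying buffer, which is what makes the inheritance automatic.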

v2 will introduce partial offloading. Fallback to CPU will come at the same time, or a little later. Once CPU fallback is in place, I will add support for the OpenCL backend. I think Metal is closer to OpenCL than to CUDA, so it may be in a better state for adding Metal support after that point.

ggerganov and others added 3 commits October 5, 2023 15:50
* ggml-backend : code style suggestions

* ggml-backend : move ggml_backend and ggml_backend_buffer in the source file

* ggml-backend : move structs back to header + rename type

* ggml-backend : remove obsolete comment

* fix leak in ggml_backend_buffer_free

* ggml-backend : re-introduce typedefs as a declaration of intent

---------

Co-authored-by: slaren <slarengh@gmail.com>
* ggml-backend : metal (WIP)

* ggml-backend : metal (adapt CPU backend)

* ggml-backend : working metal

* ggml-backend : clean-up metal implementation

* ggml-backend : add ggml_backend_is_metal()
@ggerganov (Member) left a comment

Let's merge this - I think it is a good addition to the framework

@slaren slaren merged commit fc9e955 into master Oct 6, 2023
@slaren slaren deleted the ggml-backend branch October 6, 2023 16:51
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023
…ions correctly with `vocab_only` setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (ggml-org#547)