Conversation

@slaren (Member) commented Oct 3, 2023

Initial version of the common backends interface.

Supports CPU and CUDA with full offloading, but not partial offloading or fallback to the CPU backend for unimplemented ops.

Modifies the gpt-2 example to use the backends interface. If built with CUDA support and executed with -ngl 1, the CUDA backend is used.
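The core idea of a common backends interface can be sketched as a per-backend table of function pointers that user code dispatches through, so the same graph code runs on CPU or CUDA depending on a runtime flag like `-ngl`. Below is a minimal, self-contained C sketch of that pattern with hypothetical names (`backend_iface`, `cpu_backend`, `run_demo`) — it is not the actual ggml-backend API, just an illustration of the dispatch mechanism:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical backend vtable: each backend (CPU, CUDA, ...) fills in its
 * own implementations, and user code only calls through this interface. */
typedef struct backend_iface {
    const char *name;
    void *(*alloc_buffer)(size_t size);
    void  (*free_buffer)(void *buf);
    /* stand-in for graph execution: scale a vector in place */
    void  (*compute)(float *data, size_t n, float scale);
} backend_iface;

/* CPU implementation of the interface */
static void *cpu_alloc(size_t size) { return malloc(size); }
static void  cpu_free(void *buf)    { free(buf); }
static void  cpu_compute(float *d, size_t n, float s) {
    for (size_t i = 0; i < n; i++) d[i] *= s;
}

static const backend_iface cpu_backend = {
    "CPU", cpu_alloc, cpu_free, cpu_compute
};

/* User code is backend-agnostic: it would pick cpu_backend or a
 * cuda_backend at runtime (e.g. based on an -ngl style flag). */
float run_demo(const backend_iface *be) {
    size_t n = 4;
    float *data = be->alloc_buffer(n * sizeof(float));
    for (size_t i = 0; i < n; i++) data[i] = (float)i;
    be->compute(data, n, 2.0f);   /* data becomes {0, 2, 4, 6} */
    float last = data[n - 1];
    be->free_buffer(data);
    return last;
}
```

A CUDA variant of the same struct would supply device allocations and kernel launches behind identical function signatures, which is what lets the gpt-2 example stay unchanged between backends.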

Support for Metal added in #552

@ggerganov (Member)

Many of the changes in this PR are due to a sync with llama.cpp.

I will do a sync of ggml with llama.cpp later tonight so we can focus on the relevant changes

@ggerganov (Member)

@slaren Let's rebase on master so we can see the relevant diff

@iboB (Contributor) left a comment

Doing something similar to the comment in ggml-cuda.h for all the other backends should be the goal, IMO.

Introducing ggml into existing codebases would be easiest from the inside out, as a plugin: first handling some leaves of the computation, then more and more towards the root.

@slaren slaren marked this pull request as ready for review October 4, 2023 13:26
@ggerganov (Member)

Need to log off earlier today - will review tomorrow.
Is this branch in a state where I can attempt to add a Metal backend?

@slaren (Member, Author) commented Oct 4, 2023

The branch should be stable, I think this is already good enough for a v1.

It should be easier to implement than last time. There is a pointer to the buffer in the tensors, which should help with backends that cannot abstract a buffer + offset into a single pointer, like Metal. Views inherit this buffer from their parent automatically. But there are very likely going to be some rough edges, so I would say don't do it yet, unless you are willing to look into the internals and change whatever may be required.
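The buffer-inheritance rule described above can be sketched like this. The struct names (`buffer`, `tensor`, `make_view`) are hypothetical, not the real ggml_tensor layout; the point is that each tensor records which backend buffer it lives in, and a view copies the parent's buffer pointer, so a backend like Metal can always resolve data as a (buffer, offset) pair rather than a raw pointer:

```c
#include <stddef.h>

/* Hypothetical sketch: a tensor keeps a pointer to the backend buffer it
 * was allocated in, rather than only a raw data pointer. */
typedef struct buffer { void *base; size_t size; } buffer;

typedef struct tensor {
    buffer *buf;             /* backend buffer this tensor lives in */
    size_t  offset;          /* offset of the tensor's data within buf */
    struct tensor *view_src; /* parent tensor if this is a view */
} tensor;

tensor make_tensor(buffer *buf, size_t offset) {
    tensor t = { buf, offset, NULL };
    return t;
}

/* A view inherits the parent's buffer automatically; only the offset
 * changes. A backend that cannot collapse buffer + offset into a single
 * pointer (like Metal) can still locate the view's data. */
tensor make_view(tensor *parent, size_t rel_offset) {
    tensor v = { parent->buf, parent->offset + rel_offset, parent };
    return v;
}
```

Because views never allocate, chains of views (a view of a view) all bottom out in the same underlying buffer, which is what makes the inheritance automatic.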

v2 will introduce partial offloading. Fallback to CPU will come at the same time, or a little later. Once CPU fallback is in place, I will add support for the OpenCL backend. I think Metal is closer to OpenCL than to CUDA, so it may be in a better state for adding Metal support after that point.

ggerganov and others added 3 commits October 5, 2023 15:50
* ggml-backend : code style suggestions

* ggml-backend : move ggml_backend and ggml_backend_buffer in the source file

* ggml-backend : move structs back to header + rename type

* ggml-backend : remove obsolete comment

* fix leak in ggml_backend_buffer_free

* ggml-backend : re-introduce typedefs as a declaration of intent

---------

Co-authored-by: slaren <slarengh@gmail.com>
* ggml-backend : metal (WIP)

* ggml-backend : metal (adapt CPU backend)

* ggml-backend : working metal

* ggml-backend : clean-up metal implementation

* ggml-backend : add ggml_backend_is_metal()
@ggerganov (Member) left a comment

Let's merge this - I think it is a good addition to the framework

@slaren slaren merged commit fc9e955 into master Oct 6, 2023
@slaren slaren deleted the ggml-backend branch October 6, 2023 16:51
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this pull request Dec 18, 2023
…ions correctly with `vocab_only` setting. Also confirmed that the code works as expected after running with reduced memory usage due to deletion of no-longer-needed variable. (ggml-org#547)