Conversation

@ggerganov ggerganov commented Nov 4, 2024

TODO:

  • fix build and tests

ggerganov and others added 18 commits November 4, 2024 10:50
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
… MobileVLM model. (llama/9763)

* ggml: Add POOL2D OP for GPU acceleration to the Vulkan backend.

- The MobileVLM model now supports GPU-accelerated inference via the Vulkan backend.
- A GGML_OP_POOL_2D (pooling) shader has been added.
- The encoding performance of the CLIP model improved from 2.8s on the CPU to 0.7s on the GPU.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

* [fix] Correct the order of the parameters.

Fix casting to int.

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>

---------

Signed-off-by: Changyeon Kim <cyzero.kim@samsung.com>
* ggml : RISC-V vector gemv for q4_0_8x8

* ggml : Added WIP rvv q4_0_8x8 gemm

* ggml : Added initial implementation of rvv gemm

* ggml : optimize gemm to avoid register spillover

* ggml : Fix GCC rvv load alignment issue

* ggml : Format gemm rvv code

* ggml : Fix a typo in RVV q4_0_8_8 GEMM
* ggml : fix gguf string leak when reading kv pairs fails

* ggml : avoid crashing with GGML_ABORT when the KV has an invalid type

* ggml : avoid crashing on failed memory allocations when loading a gguf file
Bring the backend in line with the others by supporting the newer
backend/device registry interfaces.

Signed-off-by: Sergio Lopez <slp@redhat.com>
This is a more or less direct translation from the Metal implementation
to GLSL.

Signed-off-by: Sergio Lopez <slp@redhat.com>
* llama : fix buffer checks for mamba and rwkv

* llama : fix missing worst case flag during reserve

* cuda : fix supports_op for norm

* disable sched SET_CAUSE
* llama : add simple-chat example

---------

Co-authored-by: Xuan Son Nguyen <thichthat@gmail.com>
* metal : minor fixup in FA kernel

ggml-ci

* metal : use the unrolled loop variable

* metal : remove unused var
@ggerganov ggerganov changed the title from sync : llam.cpp to sync : llama.cpp on Nov 4, 2024
slaren commented Nov 4, 2024

test-opt should just be disabled until it is updated in #988; since the opt interface has been removed, it cannot be updated in its current form.

Looks like other tests are failing too, I will update them.

slaren commented Nov 4, 2024

I disabled all tests and examples that depend on ggml_opt. They should be re-enabled or removed in #988.

@ggerganov ggerganov marked this pull request as ready for review November 4, 2024 17:37
@ggerganov ggerganov merged commit f3c1e6a into master Nov 4, 2024
4 checks passed
@ggerganov ggerganov deleted the sync branch November 4, 2024 17:42