CLBlast: Add outer loops over src0 for broadcasting in mulmat #3669

shibe2 · 2023-10-18T19:36:16Z

When broadcasting, each 2D plane of src0 is matched with multiple 2D planes of src1. A plane of src0 needs to be copied and/or de-quantized only once for the whole group of GEMM operations that use it. A more natural way to handle this is to create an outer loop over src0 and do the copying and de-quantization there.

Previously, de-quantization was performed before each GEMM. Now it is moved to an outer loop.
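To illustrate the difference, here is a minimal sketch that only counts operations for broadcasting over dimension 2. It is not the code from this PR: `OpCounts`, `count_before` and `count_after` are hypothetical names, and `ne02`, `ne12` and `r2` (number of src0 planes, number of src1 planes, and their ratio) are assumed here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tallies used only for this illustration; not part of ggml.
struct OpCounts {
    int64_t dequant = 0; // src0 planes copied and/or de-quantized
    int64_t gemm    = 0; // GEMM calls issued
};

// Before: iterate over src1 planes and prepare the matching src0 plane
// ahead of every GEMM.
OpCounts count_before(int64_t ne02, int64_t ne12) {
    assert(ne02 > 0 && ne12 % ne02 == 0);
    const int64_t r2 = ne12 / ne02; // src1 planes per src0 plane
    OpCounts c;
    for (int64_t i12 = 0; i12 < ne12; i12++) {
        const int64_t i02 = i12 / r2; // src0 plane matched with this src1 plane
        (void) i02;
        c.dequant++; // the same src0 plane is prepared again for each i12
        c.gemm++;
    }
    return c;
}

// After: outer loop over src0 planes; each plane is prepared once and then
// reused for all r2 src1 planes that broadcast against it.
OpCounts count_after(int64_t ne02, int64_t ne12) {
    assert(ne02 > 0 && ne12 % ne02 == 0);
    const int64_t r2 = ne12 / ne02;
    OpCounts c;
    for (int64_t i02 = 0; i02 < ne02; i02++) {
        c.dequant++; // once per src0 plane
        for (int64_t i12 = i02 * r2; i12 < (i02 + 1) * r2; i12++) {
            c.gemm++;
        }
    }
    return c;
}
```

With, say, 8 src0 planes and 32 src1 planes, both variants issue 32 GEMM calls, but the second one prepares src0 only 8 times instead of 32.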

There is still duplication when broadcasting over dimension 3. Handling that properly would require a more substantial change, namely storing 3D instead of 2D slices of src0 in VRAM. I don't know of a case where broadcasting over dimension 3 is currently used, so I leave it for the future. Nevertheless, the results are correct even in this case.
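A rough way to see the remaining duplication, as a sketch only: assume that a single 2D plane of src0 is staged at a time, so every src1 slice along dimension 3 triggers a fresh pass over the src0 planes. The function names and the parameters `ne02`, `ne03`, `ne13` and `r3` are assumptions made for this illustration, not identifiers taken from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Number of src0 plane preparations when only 2D planes are staged:
// each src1 slice along dimension 3 repeats the pass over src0 planes.
int64_t src0_preps_2d_staging(int64_t ne02, int64_t ne03, int64_t ne13) {
    assert(ne03 > 0 && ne13 % ne03 == 0);
    const int64_t r3 = ne13 / ne03; // src1 slices per src0 slice in dimension 3
    int64_t preps = 0;
    for (int64_t i03 = 0; i03 < ne03; i03++) {
        for (int64_t i13 = i03 * r3; i13 < (i03 + 1) * r3; i13++) {
            for (int64_t i02 = 0; i02 < ne02; i02++) {
                preps++; // plane (i02, i03) is prepared again for each i13
            }
        }
    }
    return preps;
}

// Number of preparations if whole 3D slices of src0 were kept resident,
// so every plane is prepared exactly once.
int64_t src0_preps_3d_staging(int64_t ne02, int64_t ne03) {
    int64_t preps = 0;
    for (int64_t i03 = 0; i03 < ne03; i03++) {
        for (int64_t i02 = 0; i02 < ne02; i02++) {
            preps++;
        }
    }
    return preps;
}
```

When there is no broadcasting over dimension 3 (`r3 == 1`), both counts are equal, which matches the observation that the duplication only matters in a case that may not be used at present.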

In the case of matrix-vector multiplication, de-quantization is still done repeatedly, because the de-quantized data is not stored in RAM.

Tested in isolation and with models that use GQA.

Reduce repeated dequantization of the same data.

CLBlast: Add outer loops over src0 for broadcasting in mulmat

4a95d91

ggerganov approved these changes Oct 20, 2023

shibe2 merged commit 465219b into ggml-org:master Oct 20, 2023
