Currently, we implement `ggml_conv_1d` and `ggml_conv_2d` as a sequence of 2 internal ops: `stage_0` and `stage_1`.

For more context, see the discussion: https://github.com/ggerganov/ggml/pull/483

We should instead introduce `ggml_im2col` and reuse the `ggml_mul_mat` implementation, both on the CPU and the GPU.
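
To make the proposal concrete, here is a minimal plain-C sketch of the idea for the 1D case: unroll the input into a patch matrix (im2col), then compute the convolution as a single matrix multiplication. It is only an illustration, not ggml code. The names `conv_out_len`, `im2col_1d`, `mul_mat_rows` and `conv_1d_via_im2col` are hypothetical, the row-major layouts are assumptions, and the naive `mul_mat_rows` merely stands in for whatever `ggml_mul_mat` does on each backend. The eventual `ggml_im2col` may use a different layout and signature.

```c
#include <assert.h>

/* Output length along one axis; the same formula applies per axis in 2D. */
static int conv_out_len(int il, int k, int stride, int pad, int dil) {
    return (il + 2 * pad - dil * (k - 1) - 1) / stride + 1;
}

/* im2col, 1D case (assumed layout, not ggml's):
 *   x   : [ic][il]     input, row-major
 *   col : [ol][ic * k] one row per output position, holding the patch
 * Positions that fall into the padding are written as 0. */
static void im2col_1d(const float *x, float *col,
                      int ic, int il, int k, int stride, int pad, int dil) {
    const int ol = conv_out_len(il, k, stride, pad, dil);
    assert(ol > 0);
    for (int o = 0; o < ol; o++) {
        for (int c = 0; c < ic; c++) {
            for (int j = 0; j < k; j++) {
                const int i = o * stride + j * dil - pad;
                col[(o * ic + c) * k + j] =
                    (i >= 0 && i < il) ? x[c * il + i] : 0.0f;
            }
        }
    }
}

/* Naive stand-in for a matmul kernel:
 *   y[m][n] = dot(row m of a, row n of b), both rows of length kk. */
static void mul_mat_rows(const float *a, const float *b, float *y,
                         int m, int n, int kk) {
    for (int r = 0; r < m; r++) {
        for (int s = 0; s < n; s++) {
            float acc = 0.0f;
            for (int t = 0; t < kk; t++) {
                acc += a[r * kk + t] * b[s * kk + t];
            }
            y[r * n + s] = acc;
        }
    }
}

/* conv_1d = im2col followed by one matmul.
 *   w   : [oc][ic * k] kernel, row-major
 *   col : scratch buffer of ol * ic * k floats
 *   y   : [oc][ol]     output, row-major */
static void conv_1d_via_im2col(const float *w, const float *x,
                               float *col, float *y,
                               int oc, int ic, int il, int k,
                               int stride, int pad, int dil) {
    const int ol = conv_out_len(il, k, stride, pad, dil);
    im2col_1d(x, col, ic, il, k, stride, pad, dil);
    mul_mat_rows(w, col, y, oc, ol, ic * k);
}
```

With this split, the only convolution-specific kernel is the im2col gather. All the arithmetic goes through the matmul path, so any backend that already has an optimized `ggml_mul_mat` would need just the extra gather step. The 2D case follows the same pattern, with each patch row holding `ic * kh * kw` values.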