
llamacpp_convert_to_gguf does not support 4-bit quantized models (KeyError: 'U8') #579

Description

@tiran

I have GPU-accelerated training with CUDA / ROCm and BitsAndBytes 4-bit quantization working; see #520 (comment) for more information. However, lab train fails to convert the trained model, because the output contains different tensors and llamacpp_convert_to_gguf does not support that kind of model.

First of all, the conversion fails because SAFETENSORS_DATA_TYPES has no entry for dtype U8. When I skip tensors with dtype U8, the code then fails in pick_output_type(): the function expects gguf.TENSOR_NAMES[gguf.MODEL_TENSOR.ATTN_Q].format(bid=0) + ".weight" (blk.0.attn_q.weight), but the model only has ATTN_NORM (blk.0.attn_norm.weight). The converter also complains about many unknown tensors.
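For context, here is a minimal sketch of the first failure, assuming SAFETENSORS_DATA_TYPES is a plain dict keyed by the dtype string from the safetensors header. The names and entries below are illustrative, not the converter's exact table:

```python
# Illustrative sketch of the failing dtype lookup (not the converter's exact code).
import numpy as np

# Assumed shape of the table: dtype string from the safetensors header -> numpy dtype.
SAFETENSORS_DATA_TYPES = {
    "F32": np.float32,
    "F16": np.float16,
    "I32": np.int32,
    # no "U8" entry, so the packed nf4 tensors written by BitsAndBytes cannot be mapped
}

def lookup(safetensors_dtype: str):
    # This lookup is where the reported KeyError: 'U8' comes from.
    return SAFETENSORS_DATA_TYPES[safetensors_dtype]

print(lookup("F32"))       # works
try:
    lookup("U8")
except KeyError as exc:    # same failure mode as the converter
    print("KeyError:", exc)
```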

I'm not familiar with the gguf format and don't know how to address the problem myself. To reproduce the problem, use my PR #520 and run training with lab train --device cuda --4-bit-quant. You need a supported GPU with at least 11 GB of memory, a PyTorch build with CUDA or ROCm bindings, and BitsAndBytes with CUDA or ROCm support.

If it's of any help, here is a list of tensors from the input model with their dtype and shape:

model.layers.0.input_layernorm.weight, dtype: F32, shape: [4096]
model.layers.0.mlp.down_proj.weight.nested_absmax, dtype: F32, shape: [3584]
model.layers.0.mlp.down_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.mlp.down_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.mlp.gate_proj.weight.nested_absmax, dtype: F32, shape: [3584]
model.layers.0.mlp.gate_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.mlp.gate_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.mlp.up_proj.weight.nested_absmax, dtype: F32, shape: [3584]
model.layers.0.mlp.up_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.mlp.up_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.post_attention_layernorm.weight, dtype: F32, shape: [4096]
model.layers.0.self_attn.k_proj.weight.nested_absmax, dtype: F32, shape: [256]
model.layers.0.self_attn.k_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.k_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.self_attn.o_proj.weight.nested_absmax, dtype: F32, shape: [1024]
model.layers.0.self_attn.o_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.o_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.self_attn.q_proj.weight.nested_absmax, dtype: F32, shape: [1024]
model.layers.0.self_attn.q_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.q_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.self_attn.v_proj.weight.nested_absmax, dtype: F32, shape: [256]
model.layers.0.self_attn.v_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.v_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.mlp.down_proj.weight, dtype: U8, shape: [29360128, 1]
model.layers.0.mlp.down_proj.weight.absmax, dtype: U8, shape: [917504]
model.layers.0.mlp.down_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [174]
model.layers.0.mlp.gate_proj.weight, dtype: U8, shape: [29360128, 1]
model.layers.0.mlp.gate_proj.weight.absmax, dtype: U8, shape: [917504]
model.layers.0.mlp.gate_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [174]
model.layers.0.mlp.up_proj.weight, dtype: U8, shape: [29360128, 1]
model.layers.0.mlp.up_proj.weight.absmax, dtype: U8, shape: [917504]
model.layers.0.mlp.up_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [173]
model.layers.0.self_attn.k_proj.weight, dtype: U8, shape: [2097152, 1]
model.layers.0.self_attn.k_proj.weight.absmax, dtype: U8, shape: [65536]
model.layers.0.self_attn.k_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
model.layers.0.self_attn.o_proj.weight, dtype: U8, shape: [8388608, 1]
model.layers.0.self_attn.o_proj.weight.absmax, dtype: U8, shape: [262144]
model.layers.0.self_attn.o_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
model.layers.0.self_attn.q_proj.weight, dtype: U8, shape: [8388608, 1]
model.layers.0.self_attn.q_proj.weight.absmax, dtype: U8, shape: [262144]
model.layers.0.self_attn.q_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
model.layers.0.self_attn.v_proj.weight, dtype: U8, shape: [2097152, 1]
model.layers.0.self_attn.v_proj.weight.absmax, dtype: U8, shape: [65536]
model.layers.0.self_attn.v_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
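
For reference, a listing like the one above can be produced by reading the safetensors header directly (an 8-byte little-endian header length followed by a JSON table of tensors). This is a self-contained sketch, not part of the converter, and the file name is a placeholder for the trained model's .safetensors file:

```python
# Dump tensor name, dtype and shape from a .safetensors file by parsing its header.
import json
import struct

def list_tensors(path: str) -> None:
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    for name, info in sorted(header.items()):
        if name == "__metadata__":
            continue  # file-level metadata, not a tensor
        print(f"{name}, dtype: {info['dtype']}, shape: {info['shape']}")

list_tensors("model.safetensors")  # placeholder path to the trained model
```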

Labels

enhancement (New feature or request), linux (Something Linux-specific), llama-cpp (LLaMA.cpp specific issues), stale
