I have GPU-accelerated training with CUDA / ROCm and BitsAndBytes 4-bit quantization working. See #520 (comment) for more information. However, `lab train` fails to convert the trained model, because the output has different tensors and `llamacpp_convert_to_gguf` does not support that kind of model.
First of all, it fails because `SAFETENSORS_DATA_TYPES` does not have the dtype `U8`. When I ignore tensors with dtype `U8`, the code then fails in `pick_output_type()`. The function expects `gguf.TENSOR_NAMES[gguf.MODEL_TENSOR.ATTN_Q].format(bid=0) + ".weight"` (`blk.0.attn_q.weight`), but the model has `ATTN_NORM` (`blk.0.attn_norm.weight`). The converter also complains about a lot of unknown tensors.
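
For background, the extra tensors come from how BitsAndBytes serializes NF4 weights: each quantized weight is stored as packed `U8` data together with `absmax` / `nested_absmax` / `quant_map` / `quant_state` companion tensors. In case it's a useful direction, here is a rough, untested sketch of dequantizing the weights back to fp16 before running the converter (the checkpoint path and the loading details are assumptions on my part):

```python
import torch
import bitsandbytes as bnb
from transformers import AutoModelForCausalLM

# Rough, untested sketch: reload the 4-bit checkpoint and dequantize every
# Linear4bit weight back to fp16 so the GGUF converter only sees plain tensors.
# "training_results/final" is a placeholder for the trained model directory.
model = AutoModelForCausalLM.from_pretrained(
    "training_results/final",
    device_map="auto",  # dequantize_4bit needs the weights on the GPU
)

state_dict = {}
for name, module in model.named_modules():
    if isinstance(module, bnb.nn.Linear4bit):
        weight = bnb.functional.dequantize_4bit(
            module.weight.data, module.weight.quant_state
        )
        state_dict[f"{name}.weight"] = weight.to(torch.float16).cpu()
# Non-quantized tensors (layer norms, embeddings, ...) would be copied as-is.
```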
I'm not familiar with the GGUF format and don't know how to address the problem myself. To reproduce the problem, use my PR #520 and run training with `lab train --device cuda --4-bit-quant`. You need a supported GPU with at least 11 GB of memory, a PyTorch build with CUDA or ROCm bindings, and a BitsAndBytes build with CUDA or ROCm support.
If it's of any help, here is a list of tensors from the input model with their dtype and shape:

```text
model.layers.0.input_layernorm.weight, dtype: F32, shape: [4096]
model.layers.0.mlp.down_proj.weight.nested_absmax, dtype: F32, shape: [3584]
model.layers.0.mlp.down_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.mlp.down_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.mlp.gate_proj.weight.nested_absmax, dtype: F32, shape: [3584]
model.layers.0.mlp.gate_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.mlp.gate_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.mlp.up_proj.weight.nested_absmax, dtype: F32, shape: [3584]
model.layers.0.mlp.up_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.mlp.up_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.post_attention_layernorm.weight, dtype: F32, shape: [4096]
model.layers.0.self_attn.k_proj.weight.nested_absmax, dtype: F32, shape: [256]
model.layers.0.self_attn.k_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.k_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.self_attn.o_proj.weight.nested_absmax, dtype: F32, shape: [1024]
model.layers.0.self_attn.o_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.o_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.self_attn.q_proj.weight.nested_absmax, dtype: F32, shape: [1024]
model.layers.0.self_attn.q_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.q_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.self_attn.v_proj.weight.nested_absmax, dtype: F32, shape: [256]
model.layers.0.self_attn.v_proj.weight.nested_quant_map, dtype: F32, shape: [256]
model.layers.0.self_attn.v_proj.weight.quant_map, dtype: F32, shape: [16]
model.layers.0.mlp.down_proj.weight, dtype: U8, shape: [29360128, 1]
model.layers.0.mlp.down_proj.weight.absmax, dtype: U8, shape: [917504]
model.layers.0.mlp.down_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [174]
model.layers.0.mlp.gate_proj.weight, dtype: U8, shape: [29360128, 1]
model.layers.0.mlp.gate_proj.weight.absmax, dtype: U8, shape: [917504]
model.layers.0.mlp.gate_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [174]
model.layers.0.mlp.up_proj.weight, dtype: U8, shape: [29360128, 1]
model.layers.0.mlp.up_proj.weight.absmax, dtype: U8, shape: [917504]
model.layers.0.mlp.up_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [173]
model.layers.0.self_attn.k_proj.weight, dtype: U8, shape: [2097152, 1]
model.layers.0.self_attn.k_proj.weight.absmax, dtype: U8, shape: [65536]
model.layers.0.self_attn.k_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
model.layers.0.self_attn.o_proj.weight, dtype: U8, shape: [8388608, 1]
model.layers.0.self_attn.o_proj.weight.absmax, dtype: U8, shape: [262144]
model.layers.0.self_attn.o_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
model.layers.0.self_attn.q_proj.weight, dtype: U8, shape: [8388608, 1]
model.layers.0.self_attn.q_proj.weight.absmax, dtype: U8, shape: [262144]
model.layers.0.self_attn.q_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
model.layers.0.self_attn.v_proj.weight, dtype: U8, shape: [2097152, 1]
model.layers.0.self_attn.v_proj.weight.absmax, dtype: U8, shape: [65536]
model.layers.0.self_attn.v_proj.weight.quant_state.bitsandbytes__nf4, dtype: U8, shape: [172]
```
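
A list like the one above can be produced by reading the safetensors header directly, for example along these lines (the file path is a placeholder):

```python
import json
import struct

def list_tensors(path: str) -> None:
    """Print name, dtype, and shape for every tensor in a .safetensors file."""
    with open(path, "rb") as f:
        # A safetensors file starts with the header length as a little-endian
        # uint64, followed by a JSON header describing every tensor.
        header_len = struct.unpack("<Q", f.read(8))[0]
        header = json.loads(f.read(header_len))
    for name, meta in header.items():
        if name == "__metadata__":
            continue
        print(f"{name}, dtype: {meta['dtype']}, shape: {meta['shape']}")

list_tensors("training_results/final/model.safetensors")  # placeholder path
```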