cdx1-mini #2148
Conversation
contrib/fine-tuning/fine-tune-mlx.sh (Outdated)
```diff
@@ -13,7 +16,7 @@ case "$TOOL_BASE_MODEL" in
 esac
 BASE_MODEL_MLX=${BASE_MODEL}-${TUNING_TOOL}
 HF_ORG=CycloneDX
-NUM_LAYERS=16
+NUM_LAYERS=32
```
While working on the cdx1-mini model, I noticed a bug in our LoRA fine-tuning: `NUM_LAYERS` was set to 16, so only 16 layers were being fine-tuned instead of 32. The performance of cdx1 and cdx1-pro can be improved slightly in a future iteration.
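One way to avoid hard-coding this value in the future would be to read it from the model config. A minimal sketch, assuming the base model has already been downloaded to a local HF-style directory containing a `config.json` with `num_hidden_layers` (the `BASE_MODEL_DIR` path below is hypothetical):

```bash
# Sketch: derive NUM_LAYERS from the downloaded model's config.json
# instead of hard-coding it (path is an assumption, not part of the script).
BASE_MODEL_DIR="./models/${BASE_MODEL}"
NUM_LAYERS=$(python3 -c "import json; print(json.load(open('${BASE_MODEL_DIR}/config.json'))['num_hidden_layers'])")
echo "Fine-tuning ${NUM_LAYERS} layers of ${BASE_MODEL}"
```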
contrib/fine-tuning/fine-tune-mlx.sh (Outdated)
```bash
mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type lora --batch-size 1 --num-layers ${NUM_LAYERS} --iters 2000 --grad-checkpoint --max-seq-length 16000 --learning-rate "3e-5" --optimizer adam
if [ "$TOOL_BASE_MODEL" = "cdx1-mini" ]; then
  echo "Full fine-tune with cdx-docs dataset. This might take a while ..."
  mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type full --batch-size 2 --iters 1500 --grad-checkpoint --max-seq-length 8192 --learning-rate "1e-5" --optimizer adamw --num-layers 32 --seed 42
```
For mini, we take a small 4B model and perform full fine-tuning rather than LoRA/QLoRA.
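In `mlx_lm.lora` terms, the difference comes down to the `--fine-tune-type` flag and the hyperparameters around it. A condensed sketch of the two invocations used in this script (not an exact copy; `--iters`, `--learning-rate`, and other flags are omitted for brevity):

```bash
# Full fine-tune (cdx1-mini, 4B base): every weight is updated.
mlx_lm.lora --model "${BASE_MODEL}" --train --data "${DATASET_PATH}" \
  --adapter-path "${ADAPTERS_PATH}" --fine-tune-type full \
  --batch-size 2 --optimizer adamw --grad-checkpoint

# LoRA (cdx1, cdx1-pro): only low-rank adapters on ${NUM_LAYERS} layers are trained.
mlx_lm.lora --model "${BASE_MODEL}" --train --data "${DATASET_PATH}" \
  --adapter-path "${ADAPTERS_PATH}" --fine-tune-type lora \
  --num-layers "${NUM_LAYERS}" --batch-size 1 --optimizer adamw --grad-checkpoint
```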
force-pushed from b9de9e9 to 2c4f58c
force-pushed from 94ad088 to 4a34e36
```bash
  mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type full --batch-size 2 --num-layers ${NUM_LAYERS} --iters ${ITERS} --grad-checkpoint --max-seq-length ${MAX_SEQ} --learning-rate "1e-5" --optimizer adamw
elif [ "$TOOL_BASE_MODEL" = "cdx1" ]; then
  echo "Low-Rank Adaptation (LoRA) fine-tuning ${BASE_MODEL} with cdx-docs dataset. This might take a while ..."
  mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type lora --batch-size 1 --num-layers ${NUM_LAYERS} --iters ${ITERS} --grad-checkpoint --max-seq-length ${MAX_SEQ} --learning-rate "1e-4" --optimizer adamw
```
I have started a new fine-tuning job for cdx1. Will create a separate PR tomorrow once it is ready for re-evaluation.
cdx1-mini (base: unsloth/Qwen3-4B-Instruct-2507) is now available for beta testing.
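To try the beta locally with mlx-lm, something like the following should work (the Hugging Face repo id below is a placeholder, not necessarily the published name):

```bash
# Quick smoke test of the beta model (repo id is an assumption).
pip install -U mlx-lm
mlx_lm.generate --model CycloneDX/cdx1-mini-mlx-8bit \
  --prompt "Generate a CycloneDX SBOM for a simple npm project." \
  --max-tokens 512
```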
**Logic Category Comparison**
Beats o4-mini-high!

**Spec Category Comparison**
Matches cdx1-pro even though this is just a 4B model 😎

**Other categories**
Due to the small size, it has limited knowledge in these topics.

**Known bugs**