
Conversation

@prabhu (Collaborator) commented on Aug 9, 2025

cdx1-mini (base: unsloth/Qwen3-4B-Instruct-2507) is now available for beta testing.
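
For beta testers, here is a minimal sketch of how the 8-bit MLX build could be tried with mlx-lm. The Hugging Face path CycloneDX/cdx1-mini-mlx-8bit and the sample prompt are assumptions for illustration, not confirmed release details.

```bash
# Minimal sketch for local beta testing; the model path and prompt are assumptions.
pip install mlx-lm
mlx_lm.generate --model CycloneDX/cdx1-mini-mlx-8bit \
  --prompt "Explain the purl format for an npm package." \
  --max-tokens 256
```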

Logic Category Comparison

Beats o4-mini-high!

| Model | Accuracy (%) |
| --- | --- |
| gemini-2.5-pro | 93.60 |
| deepthink-r1 | 89.63 |
| gpt-5 | 83.23 |
| deepseek-r1 | 82.92 |
| gpt-oss-120b | 80.49 |
| gpt-oss-20b | 79.27 |
| cdx1-pro-mlx-8bit | 73.17 |
| cdx1-mini-mlx-8bit | 68.29 |
| o4-mini-high | 67.99 |
| qwen3-coder-480B | 48.48 |
| cdx1-mlx-8bit | 46.04 |

Spec Category Comparison

Matches cdx1-pro even though this is just a 4B model 😎

| Model | Accuracy (%) |
| --- | --- |
| gemini-2.5-pro | 100.00 |
| deepseek-r1 | 98.58 |
| cdx1-pro-mlx-8bit | 98.30 |
| cdx1-mini-mlx-8bit | 97.16 |
| gpt-5 | 95.17 |
| qwen3-coder-480B | 90.34 |
| gpt-oss-120b | 89.20 |
| cdx1-mlx-8bit | 83.52 |
| deepthink-r1 | 12.36 |
| gpt-oss-20b | 9.09 |
| o4-mini-high | 0.00 |

Other categories

Due to its small size, cdx1-mini has limited knowledge in these categories.

| Category | cdx1-mlx-8bit | cdx1-pro-mlx-8bit | cdx1-mini-mlx-8bit |
| --- | --- | --- | --- |
| devops | 87.46% | 96.1% | 43.73% |
| docker | 89.08% | TBD | 84.87% |
| linux | 90.6% | 95.8% | 87.43% |

Known bugs

  • An erroneous tool_call block appears in responses. If this is not fixed, we will need to work around it in code (a possible stripping approach is sketched below).
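
A possible post-processing workaround, as a minimal shell sketch: it assumes the stray block is wrapped in <tool_call> and </tool_call> tags on their own lines in the generated text (the exact delimiter is an assumption, not confirmed in this PR).

```bash
# Strip any stray tool_call block from a saved model response.
# Assumption: the block is delimited by <tool_call> ... </tool_call> lines.
sed '/<tool_call>/,/<\/tool_call>/d' response.txt > response.clean.txt
```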

@prabhu added the ml label on Aug 9, 2025
```diff
@@ -13,7 +16,7 @@ case "$TOOL_BASE_MODEL" in
 esac
 BASE_MODEL_MLX=${BASE_MODEL}-${TUNING_TOOL}
 HF_ORG=CycloneDX
-NUM_LAYERS=16
+NUM_LAYERS=32
```
@prabhu (Collaborator, Author) commented:

While working on the cdx1-mini model, I noticed a bug in our LoRA fine-tuning (NUM_LAYERS was set to 16 instead of 32). The performance of cdx1 and cdx1-pro can be improved slightly in a future iteration.

```bash
mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type lora --batch-size 1 --num-layers ${NUM_LAYERS} --iters 2000 --grad-checkpoint --max-seq-length 16000 --learning-rate "3e-5" --optimizer adam
if [ "$TOOL_BASE_MODEL" = "cdx1-mini" ]; then
  echo "Full fine-tune with cdx-docs dataset. This might take a while ..."
  mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type full --batch-size 2 --iters 1500 --grad-checkpoint --max-seq-length 8192 --learning-rate "1e-5" --optimizer adamw --num-layers 32 --seed 42
```
@prabhu (Collaborator, Author) commented:

For cdx1-mini, we take a small 4B model and perform a full fine-tune rather than LoRA/QLoRA.

@prabhu force-pushed the feature/cdx1-mini branch 2 times, most recently from b9de9e9 to 2c4f58c on August 9, 2025 at 17:40
@prabhu marked this pull request as ready for review on August 9, 2025 at 21:46
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
@prabhu force-pushed the feature/cdx1-mini branch from 94ad088 to 4a34e36 on August 9, 2025 at 21:51
```bash
  mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type full --batch-size 2 --num-layers ${NUM_LAYERS} --iters ${ITERS} --grad-checkpoint --max-seq-length ${MAX_SEQ} --learning-rate "1e-5" --optimizer adamw
elif [ "$TOOL_BASE_MODEL" = "cdx1" ]; then
  echo "Low-Rank Adaptation (LoRA) fine-tuning ${BASE_MODEL} with cdx-docs dataset. This might take a while ..."
  mlx_lm.lora --model ${BASE_MODEL} --train --data ${DATASET_PATH} --adapter-path ${ADAPTERS_PATH} --mask-prompt --fine-tune-type lora --batch-size 1 --num-layers ${NUM_LAYERS} --iters ${ITERS} --grad-checkpoint --max-seq-length ${MAX_SEQ} --learning-rate "1e-4" --optimizer adamw
```
@prabhu (Collaborator, Author) commented:

I have started a new fine-tuning job for cdx1. Will create a separate PR tomorrow once it is ready for reevaluation.

@prabhu merged commit 6e58ed3 into master on Aug 9, 2025 (3 checks passed)
@prabhu deleted the feature/cdx1-mini branch on August 9, 2025 at 21:57