@prabhu prabhu commented Aug 10, 2025

cdx1, being our first fine-tune, fell victim to poorly configured fine-tune settings, which negatively affected its benchmark results. This is entirely my fault.

For the second attempt, I used Qwen3-coder-480B and gave it the model card of the base model, the unsloth template fixes, and relevant code snippets from mlx-lm's lora.py and fuse.py to arrive at a suggested LoRA fine-tuning configuration. It happily suggested a full fine-tune of the base 14B model. Unfortunately, our Mac Minis have only 64GB of RAM, making a full fine-tune impossible. However, I was able to tune all 48 layers, up from the original 16.
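For reference, a LoRA run over all 48 layers with mlx-lm can be driven by a YAML config passed via `mlx_lm.lora -c config.yaml`. The values below are an illustrative sketch under that assumption, not the exact settings used; the model path, data directory, and most hyperparameters here are placeholders:

```yaml
# Illustrative mlx-lm LoRA config (mlx_lm.lora -c config.yaml).
# Paths and hyperparameters are assumptions, not the actual run settings.
model: "unsloth/Qwen2.5-Coder-14B-Instruct"   # hypothetical base model path
train: true
data: "./dataset"            # directory containing train.jsonl / valid.jsonl
num_layers: 48               # tune all 48 layers, up from the original 16
iters: 2000
max_seq_length: 32768
batch_size: 1                # small batch to fit in 64GB of unified memory
learning_rate: 1e-5
```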

The last question was identifying the right parameters for inference. Unlike the Qwen3 models, Qwen2.5 surprisingly has no recommended values for these parameters anywhere. Their benchmarks cleverly use a temperature of 0.

I wrote a script to repeatedly generate text across a range of temperature, top-p, and top-k values and arrived at a temperature of 0.55 that consistently gave good-quality answers most of the time. However, users are reminded to run their own tests in their environment to identify these values, since nothing in the LLM world is deterministic! (Perhaps we should just stick to CLI tools and not bother with all this AI.)
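The selection step after such a sweep can be sketched in Python: collect a quality score per generation for each temperature, then pick the temperature with the best average. The scores below are hypothetical and this is an illustration of the approach, not the actual harness used:

```python
from statistics import mean

def best_temperature(scores_by_temp: dict[float, list[float]]) -> float:
    """Return the temperature whose generations scored best on average.

    scores_by_temp maps a temperature to the quality scores (e.g. 0-1)
    assigned to each generation produced at that temperature.
    """
    return max(scores_by_temp, key=lambda t: mean(scores_by_temp[t]))

# Hypothetical scores from manually grading a few generations per setting.
scores = {
    0.30: [0.6, 0.7, 0.5],
    0.55: [0.8, 0.9, 0.8],
    0.80: [0.7, 0.5, 0.6],
}
print(best_temperature(scores))  # 0.55 has the highest mean score
```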

Re-running the logic tests

During our last test, cdx1 scored only 46.04 in the logic tests. The new, improved model scores 70.12, although with a number of caveats. Due to the limited context size of Qwen2.5 (32768 tokens), it could take on only 10 to 20 questions at a time, thus requiring a lot of babysitting. Unfortunately, there is no 14B model for qwen3-coder yet, so we need to live with this previous-generation model for a bit.
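Batching questions to fit a 32768-token window can be sketched as follows. The four-characters-per-token estimate and the reserved output budget are assumptions for illustration; a real harness would use the model's tokenizer:

```python
def batch_questions(questions, context_tokens=32768, reserve_for_output=8192,
                    chars_per_token=4):
    """Greedily group questions into batches that fit the context window.

    Uses a rough chars-per-token heuristic; an actual tokenizer count
    would be more exact.
    """
    budget = (context_tokens - reserve_for_output) * chars_per_token
    batches, current, used = [], [], 0
    for q in questions:
        # Flush the current batch when adding this question would overflow it.
        if current and used + len(q) > budget:
            batches.append(current)
            current, used = [], 0
        current.append(q)
        used += len(q)
    if current:
        batches.append(current)
    return batches

# Example: 50 long questions split into context-sized batches.
qs = ["Q%d: " % i + "x" * 5000 for i in range(50)]
print([len(b) for b in batch_questions(qs)])
```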

Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
ITERS=2500
BASE_MODEL="unsloth/Qwen3-Coder-30B-A3B-Instruct"
;;
*)
TEMP=0.55
This value was arrived at by painstakingly testing the various quantized models with a range of questions and identifying the parameters that offered decent results. Mileage will vary.

#!/usr/bin/env bash
set -euo pipefail
prompt="$1"

# Pick a top_p value based on temperature: tighter nucleus at low temperatures.
get_top_p() {
    local temp=$1
    if (( $(echo "$temp < 0.3" | bc -l) )); then
        echo "0.7"
    else
        echo "0.9"
    fi
}

# Pick a top_k value based on temperature: wider candidate pool at high temperatures.
get_top_k() {
    local temp=$1
    if (( $(echo "$temp <= 0.8" | bc -l) )); then
        echo "20"
    else
        echo "40"
    fi
}

# Sweep temperature from 0 to 1.0 in increments of 0.05.
for temp in $(seq 0 0.05 1.0); do
    top_p=$(get_top_p "$temp")
    top_k=$(get_top_k "$temp")

    echo "Generating with temperature: $temp, top_p: $top_p, top_k: $top_k"
    # top_k is logged for reference; pass it with --top-k as well if your
    # mlx-lm version supports that flag.
    mlx_lm.generate --model ./CycloneDX/cdx1-mlx-8bit --prompt "$prompt" \
        --temp "$temp" --top-p "$top_p" --max-tokens 16000
    echo "----------------------------------------"
done

ITERS=2000
MAX_SEQ=32768

We had previously fine-tuned 16 layers with 16K tokens, which confused it to the core.

@@ -1,6 +1,6 @@
 FROM ./cdx1-mini-4B-q8_0.gguf

-PARAMETER num_ctx 16000
+PARAMETER num_ctx 262144
Mini can now support 256K context!

@@ -20,12 +20,12 @@ hf upload --quiet --repo-type dataset CycloneDX/cdx-docs ./guides guides
 hf upload --quiet --repo-type dataset CycloneDX/cdx-docs ./semantics semantics

 echo "Uploading models. Please wait ..."
-hf upload --quiet --repo-type model ${QUANT_MODEL_8BIT} ./${QUANT_MODEL_8BIT} .
 hf upload --quiet --repo-type model ${QUANT_MODEL_6BIT} ./${QUANT_MODEL_6BIT} .
+hf upload --quiet --exclude "**/README.md" --repo-type model ${QUANT_MODEL_8BIT} ./${QUANT_MODEL_8BIT} .
Our fine-tuning operation was creating a default README page, which was overwriting the full README with the various technical reports on Hugging Face. I have to edit the README page on HF every time there is a new upload 😒. Perhaps I need to find a place to store the HF README pages.

@prabhu prabhu merged commit a189254 into master Aug 10, 2025
3 checks passed
@prabhu prabhu deleted the feature/cdx1-retune branch August 10, 2025 14:28