cdx1 retune #2149
Conversation
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
ITERS=2500
BASE_MODEL="unsloth/Qwen3-Coder-30B-A3B-Instruct"
;;
*)
TEMP=0.55
This value was arrived at by painstakingly testing the various quantized models with a range of questions and identifying the parameters that offered decent results. Mileage will vary.
#!/usr/bin/env bash
set -e

PROMPT="$1"
echo "Prompt: $PROMPT"

# Pick a top_p value for a given temperature: sample from a tighter
# nucleus when the temperature is low.
get_top_p() {
  local temp=$1
  if (( $(echo "$temp < 0.3" | bc -l) )); then
    echo "0.7"
  else
    echo "0.9"
  fi
}

# Pick a top_k value for a given temperature: widen the candidate pool
# once the temperature exceeds 0.8.
get_top_k() {
  local temp=$1
  if (( $(echo "$temp <= 0.8" | bc -l) )); then
    echo "20"
  else
    echo "40"
  fi
}

# Loop from 0.05 to 1.0 in increments of 0.05
for temp in $(seq 0.05 0.05 1.0); do
  # Get the appropriate top_p and top_k for this temperature.
  # Note: top_k is logged below but not passed to mlx_lm.generate.
  top_p=$(get_top_p "$temp")
  top_k=$(get_top_k "$temp")
  echo "Generating with temperature: $temp, top_p: $top_p, top_k: $top_k"
  mlx_lm.generate --model ./CycloneDX/cdx1-mlx-8bit --prompt "$PROMPT" --temp "$temp" --top-p "$top_p" --max-tokens 16000
  echo "----------------------------------------"
done
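For quick sanity-checking, the two sampling-parameter helpers can be mirrored in Python (a minimal re-implementation of the bash logic above, not part of the PR):

```python
def get_top_p(temp: float) -> float:
    """Mirror of the bash helper: tighter nucleus sampling at low temperature."""
    return 0.7 if temp < 0.3 else 0.9

def get_top_k(temp: float) -> int:
    """Mirror of the bash helper: smaller candidate pool up to temp 0.8."""
    return 20 if temp <= 0.8 else 40

# Sweep 0.05 .. 1.0 in steps of 0.05, as the bash loop does
for i in range(1, 21):
    temp = round(i * 0.05, 2)
    print(f"temperature: {temp}, top_p: {get_top_p(temp)}, top_k: {get_top_k(temp)}")
```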
ITERS=2000
MAX_SEQ=32768
We had previously fine-tuned 16 layers with 16K tokens, which confused it to the core.
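For reference, a hedged sketch of what the corresponding mlx-lm LoRA invocation might look like with these settings (flag names as in recent mlx-lm releases — verify with `mlx_lm.lora --help`; the base-model and data paths are placeholders, not from this PR):

```
mlx_lm.lora \
  --model <base-model> \
  --train \
  --data <training-data-dir> \
  --iters 2000 \
  --num-layers 48 \
  --max-seq-length 32768
```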
@@ -1,6 +1,6 @@
 FROM ./cdx1-mini-4B-q8_0.gguf

-PARAMETER num_ctx 16000
+PARAMETER num_ctx 262144
Mini can now support 256K context!
@@ -20,12 +20,12 @@
 hf upload --quiet --repo-type dataset CycloneDX/cdx-docs ./guides guides
 hf upload --quiet --repo-type dataset CycloneDX/cdx-docs ./semantics semantics

 echo "Uploading models. Please wait ..."
-hf upload --quiet --repo-type model ${QUANT_MODEL_8BIT} ./${QUANT_MODEL_8BIT} .
 hf upload --quiet --repo-type model ${QUANT_MODEL_6BIT} ./${QUANT_MODEL_6BIT} .
+hf upload --quiet --exclude "**/README.md" --repo-type model ${QUANT_MODEL_8BIT} ./${QUANT_MODEL_8BIT} .
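The effect of the `--exclude` glob can be approximated with Python's `fnmatch`, which `huggingface_hub` uses for pattern filtering (the file names below are hypothetical; note that with plain `fnmatch` a top-level `README.md` contains no slash and so is not matched by `**/README.md`):

```python
from fnmatch import fnmatch

# Hypothetical repo paths: a top-level README, a nested README, and a weight file
paths = [
    "README.md",
    "cdx1-mlx-8bit/README.md",
    "cdx1-mlx-8bit/model.safetensors",
]

PATTERN = "**/README.md"

# Files the --exclude pattern would skip vs. keep (fnmatch approximation)
excluded = [p for p in paths if fnmatch(p, PATTERN)]
kept = [p for p in paths if not fnmatch(p, PATTERN)]
print("excluded:", excluded)
print("kept:", kept)
```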
Our fine-tuning operation was creating a default README page, which kept overwriting the full README with the various technical reports on Hugging Face. I have to edit the README page on HF every time there is a new upload 😒. Perhaps I need to find a place to store the HF README pages.
cdx1, being our first fine-tune, fell victim to poorly configured fine-tune settings, which negatively affected its benchmark results. This is entirely my fault.
For the second attempt, I used Qwen3-Coder-480B and gave it the model card of the base model, the unsloth template fixes, and relevant code snippets from mlx-lm's lora.py and fuse.py to arrive at a suggested LoRA fine-tuning configuration. It happily suggested a full fine-tune of the base 14B model. Unfortunately, our Mac Minis have only 64GB of RAM, making a full fine-tune impossible. However, I was able to tune all 48 layers, up from the original 16.
The last question was identifying the parameters for inference. Unlike the Qwen3 models, Qwen2.5 surprisingly has no recommended values for these parameters anywhere. Their benchmarks cleverly use a temperature of 0.
I wrote a script to repeatedly generate text across various temperature, top-p, and top-k values, and arrived at a value of 0.55 that mostly gave consistent, good-quality answers. However, users are reminded to run their own tests in their environment to identify these values, since nothing in the LLM world is deterministic! (Perhaps we should just stick to CLI tools and not bother with all this AI.)

Re-running the logic tests
During our last test, cdx1 scored only 46.04 in the logic tests. The new, improved model scores 70.12, although with a number of caveats. Due to Qwen2.5's limited context size (32768 tokens), it could take on only 10 to 20 questions at a time, requiring a lot of babysitting. Unfortunately, there is no 14B qwen3-coder model yet, so we need to live with this previous-generation model for a bit.
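The 10-to-20-question batching constraint can be sanity-checked with a rough token-budget calculation (the per-question and answer-headroom figures below are hypothetical estimates, not from this PR):

```python
CONTEXT_WINDOW = 32768       # Qwen2.5 context size mentioned above
RESERVED_FOR_ANSWERS = 8192  # hypothetical headroom for generated output
TOKENS_PER_QUESTION = 1200   # hypothetical average prompt size per question

# Remaining prompt budget after reserving space for the model's answers
budget = CONTEXT_WINDOW - RESERVED_FOR_ANSWERS
questions_per_batch = budget // TOKENS_PER_QUESTION
print(questions_per_batch)  # lands in the 10-20 range reported above
```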