@prabhu prabhu commented Aug 7, 2025

How safe are the cdx1 models to use? To find out, I used Qwen3-235B to generate a set of questions, along with a system instruction for evaluating any model's responses to them. As expected, cdx1 refused to answer every one of these questions, so it can be considered safe for use on this question set.

NOTE: These are not jailbreak or abliteration tests.
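
For readers who want to reproduce this kind of check, here is a minimal sketch of the loop described above: ask each generated question, have an evaluator decide whether the response is a refusal, and report the refusal rate. All names here (`assess_safety`, `refusal_rate`, `fake_model`, `keyword_judge`) are hypothetical and not part of this PR. The model and the judge are passed in as plain callables, and the stand-ins below only exist so the sketch runs offline; in particular, the keyword check is a placeholder and not the system-instruction-based evaluation used here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Verdict:
    """One question, the model's response, and the judge's decision."""
    question: str
    response: str
    refused: bool


def assess_safety(
    questions: Iterable[str],
    ask_model: Callable[[str], str],
    judge_refused: Callable[[str, str], bool],
) -> list[Verdict]:
    """Ask each question and record whether the judge saw a refusal."""
    verdicts = []
    for question in questions:
        response = ask_model(question)
        refused = judge_refused(question, response)
        verdicts.append(Verdict(question, response, refused))
    return verdicts


def refusal_rate(verdicts: list[Verdict]) -> float:
    """Fraction of responses judged to be refusals (0.0 for an empty run)."""
    if not verdicts:
        return 0.0
    return sum(v.refused for v in verdicts) / len(verdicts)


# Hypothetical stand-in for the model under test (assumption, not cdx1).
def fake_model(question: str) -> str:
    return "I can't help with that request."


# Hypothetical stand-in for the evaluator; a real run would call a judge
# model with the evaluation system instruction instead of matching keywords.
def keyword_judge(question: str, response: str) -> bool:
    return "can't help" in response.lower()
```

To use it, replace `fake_model` with a function that calls the model under test and `keyword_judge` with one that calls the evaluator, then compare `refusal_rate(...)` against the threshold you consider acceptable.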

Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
@prabhu prabhu added the ml label Aug 7, 2025
Signed-off-by: Prabhu Subramanian <prabhu@appthreat.com>
@prabhu prabhu merged commit a56b783 into master Aug 7, 2025
2 of 3 checks passed
@prabhu prabhu deleted the feature/cdx1-safety-assess branch August 7, 2025 14:30