Qwen3-0.6B-Code-Expert (Abliterated)

This project performs full fine-tuning of the Qwen3-0.6B language model to enhance its code reasoning and generation capabilities. Training was conducted exclusively on the nvidia/OpenCodeReasoning dataset, with the model trained in the bfloat16 (bf16) data type. The model has also been abliterated to reduce its tendency to refuse requests (censorship).

Training Procedure

  1. Dataset Preparation

    • The nvidia/OpenCodeReasoning dataset was used (a loading sketch follows this list).
    • Each example consists of a coding problem paired with detailed step-by-step reasoning in Chain-of-Thought (CoT) style.
  2. Model Loading and Configuration

    • Qwen3-0.6B base model weights were loaded via the unsloth library in bf16 precision (see the model-loading sketch below).
    • Full fine-tuning (full_finetuning=True) was applied to all layers, so the entire model adapts to the code-reasoning domain rather than only an adapter.
  3. Supervised Fine-Tuning

    • Training used the Hugging Face TRL library with the supervised fine-tuning (SFT) approach (see the training sketch below).
    • The model was trained to generate correct code solutions along with the corresponding reasoning chains.
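
A minimal sketch of the dataset-loading step (1), assuming the Hugging Face `datasets` library. The `"split_0"` config/split names and the field names are illustrative assumptions; check the dataset card for the exact values.

```python
from datasets import load_dataset

# Load OpenCodeReasoning from the Hugging Face Hub. The config and split
# names below are assumptions; the dataset card lists the real ones.
dataset = load_dataset("nvidia/OpenCodeReasoning", "split_0", split="split_0")

# Each example pairs a coding problem with a CoT-style reasoning trace and
# a final solution. The field names here are assumptions for illustration.
example = dataset[0]
print(example["input"][:200])   # problem statement
print(example["output"][:200])  # reasoning chain + code solution
```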
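A sketch of the model-loading step (2), using the unsloth call named in this card (`full_finetuning=True`); `max_seq_length` is an illustrative value, not necessarily the one used in training.

```python
import torch
from unsloth import FastLanguageModel

# Load the Qwen3-0.6B base weights in bf16 and enable full fine-tuning of
# all layers instead of attaching a parameter-efficient adapter.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-0.6B",
    max_seq_length=4096,       # illustrative; choose to fit your hardware
    dtype=torch.bfloat16,      # bf16 precision, as used for this model
    full_finetuning=True,      # full fine-tuning of all layers
)
```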
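And a sketch of the SFT step (3) with TRL's `SFTTrainer`. The hyperparameters are placeholders, not the values used to train this model, and the sketch assumes `dataset` has already been formatted into a text field that `SFTTrainer` can consume.

```python
from trl import SFTConfig, SFTTrainer

# Supervised fine-tuning on the prepared dataset; all hyperparameters here
# are illustrative placeholders.
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,  # assumed to be preformatted for SFT
    args=SFTConfig(
        output_dir="qwen3-0.6b-code-expert",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,          # keep training in bf16
        logging_steps=10,
    ),
)
trainer.train()
```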

Purpose and Outcome

  • The model’s capacity for understanding, reasoning about, and generating code was significantly improved through specialized, single-dataset training in bf16 precision.
  • Outputs include both intermediate reasoning steps and final code solutions, enabling transparent and interpretable code generation (see the inference sketch below).
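
To see the reasoning-then-code behavior, here is a hedged inference sketch with `transformers`; the `enable_thinking` flag follows the Qwen3 chat template, and the prompt and decoding settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lunahr/Qwen3-0.6B-Code-Expert-abliterated"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Qwen3 template flag: emit a <think> reasoning block
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                       skip_special_tokens=True))
```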

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Support

Buy Me A Coffee
