Model Description

bananafish-0517 is a proof-of-concept fine-tuned checkpoint built on the Qwen3 0.6B base model. It represents an early stage of fine-tuning, trained for only 0.25 epochs. The main motivation behind this model is to explore an alternative instruction-tuning approach using the ChatML format, departing from the Alpaca-style prompts commonly used when fine-tuning Qwen.

Unlike the official Qwen3 instruction-tuned models, which are heavily aligned toward STEM tasks, bananafish-0517 aims to preserve a more natural, less technical writing style with fewer "GPT-like" artifacts. This makes it a promising base for future creative or general-purpose instruction tuning.

Intended Use

  • Experimental use to evaluate early-stage fine-tuning on Qwen3 0.6B.
  • Testing alternative prompt formats (ChatML) for conversational generation.
  • Proof of concept for instruction tuning less focused on STEM-heavy alignment.
  • Starting point for further fine-tuning iterations to improve versatility and creativity.

Training Details

  • Base model: Qwen3 0.6B
  • Fine-tuning epochs: 0.25 (a partial epoch)
  • Training method: LoRA fine-tuning (rank 16, alpha 32)
  • LoRA dropout: 0.05
  • rsLoRA: enabled (Unsloth implementation)
  • Optimizer: AdamW with weight decay 0.0001
  • Learning rate: 3e-6
  • LR scheduler: Cosine
  • Warmup ratio: 0.03
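
For reference, the sketch below shows how these hyperparameters map onto an Unsloth-style training setup. It is a minimal reconstruction, not the actual training script: the base repo id, sequence length, batch size, target modules, and dataset are assumptions; only the values listed above come from the run.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Assumed base checkpoint and sequence length (not confirmed by this card).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-0.6B-Base",
    max_seq_length=4096,
)

# LoRA settings from the list above; target_modules is an assumption.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,  # rank-stabilized LoRA via Unsloth
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical local dataset of ChatML-formatted examples.
dataset = load_dataset("json", data_files="chatml_train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes one pre-formatted text column
    args=TrainingArguments(
        num_train_epochs=0.25,          # partial epoch, as listed above
        learning_rate=3e-6,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        optim="adamw_torch",            # AdamW
        weight_decay=0.0001,
        per_device_train_batch_size=2,  # assumption
        output_dir="outputs",
    ),
)
trainer.train()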

Prompt Format

This checkpoint uses the ChatML-style prompt format:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant

This differs from the Alpaca-style format and matches the <|im_start|>/<|im_end|> special tokens the Qwen tokenizer already defines, encouraging a more natural dialogue flow.
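
If the bundled tokenizer ships a ChatML chat template (an assumption worth checking via tokenizer.chat_template), the same prompt can also be built with the standard transformers helper instead of by hand, using the tokenizer loaded as in the example below:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
# add_generation_prompt=True appends the trailing "<|im_start|>assistant" turn.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)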

Example Usage

import threading

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# The repo id below is a placeholder; replace it with the full Hub id
# of this checkpoint.
model_id = "bananafish-0517"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def create_chatml_prompt(user_message):
    # Build the ChatML prompt by hand; it ends right after the assistant
    # tag so the model continues from there.
    return (
        "<|im_start|>system\n"
        "You are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

user_input = "Who are you?"
prompt = create_chatml_prompt(user_input)

inputs = tokenizer([prompt], return_tensors="pt", padding=True).to("cuda")

# Stream tokens as they are generated instead of waiting for the full reply.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

generation_kwargs = dict(
    **inputs,
    max_new_tokens=2048,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    streamer=streamer,
)

# Run generation on a background thread so the main thread can consume
# the stream as tokens arrive.
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()

Reproducibility

A Colab notebook will be provided for reproducibility and testing. Feel free to open a discussion for collaboration or questions.

Why This Model Exists

Many users reported difficulties when fine-tuning Qwen base models, especially with Alpaca-style prompts. This checkpoint tests:

  • A different, cleaner prompt style (ChatML) instead of the Alpaca format used in the stock Unsloth notebook.
  • Minimal training, to observe the impact of the prompt format and LoRA fine-tuning in isolation.
  • Moving away from the heavy STEM alignment of the official Qwen instruction models toward a freer, more natural writing style.

Limitations

  • Trained for only a fraction of an epoch, so performance and stability are preliminary.
  • The model is expected to improve significantly with further training.
  • Currently optimized for inference with LoRA adapters (see the sketch after this list) and may require additional tuning for production use.
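
If this checkpoint is consumed as a LoRA adapter on top of the base model (an assumption; skip this if the published weights are already merged), PEFT can load and merge it for standalone deployment. Both repo ids below are placeholders:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder ids: swap in the actual base and adapter repos.
base_id = "Qwen/Qwen3-0.6B-Base"
adapter_id = "bananafish-0517"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Merge the LoRA weights into the base model for standalone deployment.
merged = model.merge_and_unload()
merged.save_pretrained("bananafish-0517-merged")
AutoTokenizer.from_pretrained(adapter_id).save_pretrained("bananafish-0517-merged")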

Acknowledgments

  • Thanks to the Unsloth team!
  • Inspired by the Qwen team's open-source base model and instruction tuning efforts.

Stay tuned for further updates and improvements! 😉 (Will do full models tomorrow; it's currently 6:19 as I write this and I haven't gotten any sleep.)
