## Model Description

bananafish-0517 is a proof-of-concept fine-tuned checkpoint built on the Qwen 0.6B base model. This checkpoint represents an early stage in the fine-tuning process, trained for only 0.25 epochs. The main motivation behind this model is to explore an alternative instruction-tuning approach using the ChatML format, departing from the Alpaca-style prompts commonly used when fine-tuning Qwen base models.

Unlike the official Qwen3 instruction-tuned models, which are heavily aligned toward STEM tasks, bananafish-0517 aims to preserve a more natural, less technical writing style with fewer "GPT-like" artifacts. This makes it a promising base for future creative or general-purpose instruction tuning.
## Intended Use
- Experimental use to evaluate early-stage fine-tuning on Qwen 0.6B.
- Testing alternative prompt formats (ChatML) for conversational generation.
- Proof of concept for instruction tuning less focused on STEM-heavy alignment.
- Starting point for further fine-tuning iterations to improve versatility and creativity.
## Training Details

- Base model: Qwen 0.6B
- Fine-tuning epochs: 0.25 (a partial epoch)
- Training method: LoRA fine-tuning (rank 16, alpha 32)
- LoRA dropout: 0.05
- rsLoRA: enabled (using the Unsloth implementation)
- Optimizer: AdamW with weight decay 0.0001
- Learning rate: 3e-6
- LR scheduler: cosine
- Warmup ratio: 0.03
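
The exact training script was not published, but a minimal sketch of this configuration with Unsloth's `FastLanguageModel` and TRL's `SFTTrainer` might look as follows. The base repo id, target modules, sequence length, batch size, and dataset below are assumptions for illustration only:

```python
# Hypothetical reconstruction of the setup above; values marked "assumed"
# are not stated in this card.
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-0.6B-Base",  # assumed base checkpoint
    max_seq_length=2048,                # assumed
)

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # LoRA rank 16
    lora_alpha=32,       # alpha 32
    lora_dropout=0.05,   # dropout 0.05
    use_rslora=True,     # rank-stabilized LoRA via Unsloth
    target_modules=[     # assumed; the usual attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Placeholder dataset; the real ChatML-formatted training data was not published.
dataset = Dataset.from_list([{
    "text": "<|im_start|>user\nHi<|im_end|>\n"
            "<|im_start|>assistant\nHello!<|im_end|>\n",
}])

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        num_train_epochs=0.25,           # partial epoch, as listed above
        learning_rate=3e-6,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        weight_decay=0.0001,
        optim="adamw_torch",             # AdamW
        per_device_train_batch_size=2,   # assumed
        output_dir="outputs",
    ),
)
trainer.train()
```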
## Prompt Format

This checkpoint uses the ChatML-style prompt format:

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
```

This differs from the Alpaca-style format and matches the template Qwen's own chat models are trained on, encouraging a more natural dialogue flow.
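
Assuming the checkpoint's tokenizer ships a ChatML chat template (worth verifying in `tokenizer_config.json`), the same prompt can also be built with `apply_chat_template` rather than by hand; the repo id below is a placeholder:

```python
from transformers import AutoTokenizer

# Placeholder id; substitute the full Hub repo id of this checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bananafish-0517")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]

# add_generation_prompt=True leaves an open "<|im_start|>assistant" turn
# for the model to complete.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```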
## Example Usage

```python
import threading

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# Placeholder id; substitute the full Hub repo id of this checkpoint.
model_id = "bananafish-0517"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")


def create_chatml_prompt(user_message):
    # Build a ChatML prompt that ends with an open assistant turn.
    return f"""<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
"""


user_input = "Who are you?"
prompt = create_chatml_prompt(user_input)
inputs = tokenizer([prompt], return_tensors="pt", padding=True).to("cuda")

# Stream tokens as they are generated, skipping the prompt and special tokens.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
generation_kwargs = dict(
    **inputs,
    max_new_tokens=2048,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    streamer=streamer,
)

# Run generation in a background thread so the stream can be consumed here.
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

for new_text in streamer:
    print(new_text, end="", flush=True)
```
## Reproducibility

A Colab notebook will be provided for reproducibility and testing. Feel free to open a discussion for collaboration or questions.
## Why This Model Exists

Many users reported difficulties when fine-tuning Qwen base models, especially with Alpaca-style prompts. This checkpoint tests:

- A different, cleaner prompt style (ChatML), in contrast to the Alpaca format used in the stock Unsloth notebook.
- Minimal training, to observe the impact of the prompt format and LoRA fine-tuning in isolation.
- Moving away from the heavy STEM alignment of the official Qwen instruction models toward a freer, more natural writing style.
## Limitations

- Trained for only a fraction of an epoch, so performance and stability are preliminary.
- The model is expected to improve significantly with further training.
- Currently intended for inference with LoRA adapters and may require additional tuning for production use.
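
If the checkpoint is distributed as a LoRA adapter rather than merged weights (an assumption; check the repo files), it can be attached to the base model with PEFT and optionally merged for deployment:

```python
# Hypothetical sketch: attach the LoRA adapter to the base model with PEFT,
# then merge for standalone inference. Repo ids are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B-Base",  # assumed base repo id
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base, "bananafish-0517")  # adapter repo id (placeholder)

# Fold the adapter into the base weights so PEFT is not needed at inference time.
merged = model.merge_and_unload()
merged.save_pretrained("bananafish-0517-merged")
```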
## Acknowledgments

- Thanks to the Unsloth team!
- Inspired by the Qwen team's open-source base model and instruction-tuning efforts.

Stay tuned for further updates and improvements! (Full models coming tomorrow; it's currently 6:19 as I write this and I haven't gotten any sleep.)