Model Description

bananafish-0517 is a proof-of-concept fine-tuned checkpoint built on the Qwen3 0.6B base model. It represents an early stage of fine-tuning, trained for only 0.25 epochs. The main motivation behind this model is to explore an alternative instruction-tuning approach using the ChatML format, departing from the Alpaca-style prompts commonly used when fine-tuning Qwen.

Unlike the official Qwen3 instruction-tuned models, which are heavily aligned toward STEM tasks, bananafish-0517 aims to preserve a more natural, less technical writing style with fewer "GPT-like" artifacts. This makes it a promising base for future creative or general-purpose instruction tuning.

Intended Use

  • Experimental use to evaluate early-stage fine-tuning on Qwen3 0.6B.
  • Testing alternative prompt formats (ChatML) for conversational generation.
  • Proof of concept for instruction tuning less focused on STEM-heavy alignment.
  • Starting point for further fine-tuning iterations to improve versatility and creativity.

Training Details

  • Base model: Qwen3 0.6B
  • Fine-tuning epochs: 0.25 (a partial epoch)
  • Training method: LoRA fine-tuning (rank 16, alpha 32)
  • LoRA dropout: 0.05
  • rsLoRA: enabled (Unsloth implementation)
  • Optimizer: AdamW with weight decay 0.0001
  • Learning rate: 3e-6
  • LR scheduler: Cosine
  • Warmup ratio: 0.03
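
For reference, the sketch below shows how these hyperparameters map onto an Unsloth-style training setup. It is a minimal reconstruction, not the actual training script: the base repo id, sequence length, batch size, target modules, and dataset are assumptions; only the values listed above come from the run.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Assumed base checkpoint and sequence length (not confirmed by this card).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-0.6B-Base",
    max_seq_length=4096,
)

# LoRA settings from the list above; target_modules is an assumption.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    use_rslora=True,  # rank-stabilized LoRA via Unsloth
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical local dataset of ChatML-formatted examples.
dataset = load_dataset("json", data_files="chatml_train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes one pre-formatted text column
    args=TrainingArguments(
        num_train_epochs=0.25,          # partial epoch, as listed above
        learning_rate=3e-6,
        lr_scheduler_type="cosine",
        warmup_ratio=0.03,
        optim="adamw_torch",            # AdamW
        weight_decay=0.0001,
        per_device_train_batch_size=2,  # assumption
        output_dir="outputs",
    ),
)
trainer.train()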

Prompt Format

This checkpoint uses the ChatML-style prompt format:

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant

This differs from the Alpaca-style format and matches the <|im_start|>/<|im_end|> special tokens the Qwen tokenizer already defines, encouraging a more natural dialogue flow.
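
If the bundled tokenizer ships a ChatML chat template (an assumption worth checking via tokenizer.chat_template), the same prompt can also be built with the standard transformers helper instead of by hand, using the tokenizer loaded as in the example below:

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
# add_generation_prompt=True appends the trailing "<|im_start|>assistant" turn.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)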

Example Usage

import threading

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

# The repo id below is a placeholder; replace it with the full Hub id
# of this checkpoint.
model_id = "bananafish-0517"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto").to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_id)

def create_chatml_prompt(user_message):
    # Build the ChatML prompt by hand; it ends right after the assistant
    # tag so the model continues from there.
    return (
        "<|im_start|>system\n"
        "You are a helpful assistant.<|im_end|>\n"
        "<|im_start|>user\n"
        f"{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

user_input = "Who are you?"
prompt = create_chatml_prompt(user_input)

inputs = tokenizer([prompt], return_tensors="pt", padding=True).to("cuda")

# Stream tokens as they are generated instead of waiting for the full reply.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

generation_kwargs = dict(
    **inputs,
    max_new_tokens=2048,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    streamer=streamer,
)

# Run generation on a background thread so the main thread can consume
# the stream as tokens arrive.
thread = threading.Thread(target=model.generate, kwargs=generation_kwargs)
thread.start()

for new_text in streamer:
    print(new_text, end="", flush=True)
thread.join()

Reproducibility

A Colab notebook will be provided for reproducibility and testing. Feel free to open a discussion for collaboration or questions.

Why This Model Exists

Many users reported difficulties when fine-tuning Qwen base models, especially with Alpaca-style prompts. This checkpoint tests:

  • A different, cleaner prompt style (ChatML) instead of the Alpaca format used in the stock Unsloth notebook.
  • Minimal training, to observe the impact of the prompt format and LoRA fine-tuning in isolation.
  • Moving away from the heavy STEM alignment of the official Qwen instruction models toward a freer, more natural writing style.

Limitations

  • Trained for only a fraction of an epoch, so performance and stability are preliminary.
  • The model is expected to improve significantly with further training.
  • Currently optimized for inference with LoRA adapters (see the sketch after this list) and may require additional tuning for production use.
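
If this checkpoint is consumed as a LoRA adapter on top of the base model (an assumption; skip this if the published weights are already merged), PEFT can load and merge it for standalone deployment. Both repo ids below are placeholders:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder ids: swap in the actual base and adapter repos.
base_id = "Qwen/Qwen3-0.6B-Base"
adapter_id = "bananafish-0517"

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
model = PeftModel.from_pretrained(base, adapter_id)

# Merge the LoRA weights into the base model for standalone deployment.
merged = model.merge_and_unload()
merged.save_pretrained("bananafish-0517-merged")
AutoTokenizer.from_pretrained(adapter_id).save_pretrained("bananafish-0517-merged")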

Acknowledgments

  • Thanks to the Unsloth team!
  • Inspired by the Qwen team's open-source base model and instruction tuning efforts.

Stay tuned for further updates and improvements! 😉 (Will do full models tomorrow; it's currently 6:19 as I write this and I haven't gotten any sleep.)
