---
base_model:
  - unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit
tags:
  - transformers
  - unsloth
  - qwen2
  - trl
  - text-generation-inference
  - reasoning
  - emoji
license: apache-2.0
language:
  - en
library_name: peft
pipeline_tag: text-generation
datasets:
  - openai/gsm8k
---

# Reasoning with Emoji

## Why?

Good question. I could carry on about advancing the frontiers of ML, but let's face it: I did it for the lulz. I was just curious what would happen. Now I know. OK, I had some questions:

  1. Can an LLM reason with emoji?
  2. Would it be hilarious?

## Is it good?

No. I believe my rewards were penalising reasoning length too heavily. It's also possible that reasoning with emojis is just a dumb idea. More research is needed.
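The length-penalty hypothesis can be illustrated with a toy sketch. Both reward functions below are hypothetical stand-ins, not the ones used in training: a reward that subtracts a flat amount per token makes empty reasoning optimal, while one that only penalises deviation from a target length leaves room for a reasoning phase to survive.

```python
# Hypothetical illustration of how a length penalty can collapse reasoning.
# Neither function is the actual reward used to train this model.

def harsh_length_reward(reasoning: str) -> float:
    """Flat penalty per token: the optimum is an empty reasoning block."""
    return 1.0 - 0.1 * len(reasoning.split())

def gentler_length_reward(reasoning: str, target: int = 30) -> float:
    """Penalise only deviation from a target length, so reasoning survives."""
    return 1.0 - 0.01 * abs(len(reasoning.split()) - target)

short = "🍓"                  # 1 "token" of reasoning
long = "🍓 🔤 🔍 " * 10       # 30 "tokens" of reasoning

# The harsh reward strictly prefers the near-empty reasoning trace,
# while the gentler reward prefers the target-length trace.
print(harsh_length_reward(short) > harsh_length_reward(long))      # True
print(gentler_length_reward(long) > gentler_length_reward(short))  # True
```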

## Is it interesting?

Sure! It may lend evidence to, though it doesn't prove, the idea that model reasoning and CoT are actually doing what they appear to be doing, and that the words the model chooses are semantically relevant.

## Future Directions

- Further finetuning with rewards that encourage a longer reasoning phase
- Different datasets: emoji might be unsuitable for mathematical reasoning

## Usage

### Use with transformers

```python
from transformers import pipeline

pipe = pipeline("text-generation", "nomadicsynth/Qwen2.5-3B-Instruct-emoji-reasoning-gsm8k-lora")

SYSTEM_PROMPT = """
Respond in the following format:
<💭>
[emojis]
</💭>
<🎯>
[...]
</🎯>
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How many r's in Strawberry?"},
]

response = pipe(messages)
print(response[0]["generated_text"][-1]["content"])
```
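Assuming the model follows the system prompt's tag format, the emoji reasoning and the final answer can be separated with a small regex helper. The sample string below is illustrative, not actual model output:

```python
import re

# Hypothetical response following the model card's tag format (not real output)
sample = "<💭>\n🍓🔤🔍👀✖️3️⃣\n</💭>\n<🎯>\n3\n</🎯>"

def extract_sections(text: str):
    """Return (reasoning, answer) from a <💭>/<🎯>-tagged response, or None for missing parts."""
    reasoning = re.search(r"<💭>\s*(.*?)\s*</💭>", text, re.DOTALL)
    answer = re.search(r"<🎯>\s*(.*?)\s*</🎯>", text, re.DOTALL)
    return (
        reasoning.group(1) if reasoning else None,
        answer.group(1) if answer else None,
    )

reasoning, answer = extract_sections(sample)
print(answer)  # → 3
```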

## Development

This qwen2 model was trained 2x faster with Unsloth and Hugging Face's TRL library.