Usage
This model uses 4-bit quantization, so you need to install bitsandbytes to use it.
pip install bitsandbytes
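If you want to be explicit about the quantization settings when loading, you can pass a BitsAndBytesConfig yourself. The checkpoint normally ships its own quantization config, so the settings below (nf4, bfloat16 compute) are only an illustrative assumption, not the card's prescribed configuration:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Hypothetical explicit 4-bit config; the checkpoint's embedded quantization_config
# is normally picked up automatically, so this is only an illustrative override.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Arthur-LAGACHERIE/Reflection-Gemma-2-2b",
    quantization_config=bnb_config,
    device_map="auto",
)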
For inference (streaming):
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
import torch
from threading import Thread

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model_id = "Arthur-LAGACHERIE/Reflection-Gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = """
### System
You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
Try an answer and see if it's correct before generating the output.
But don't forget to think very carefully.

### Question
The question here.
"""

chat = [
    {"role": "user", "content": prompt},
]

# Apply the chat template, then tokenize the resulting string.
question = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
question = tokenizer(question, return_tensors="pt").to(device)

# Stream tokens as they are produced, running generation in a background thread.
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
generation_kwargs = dict(question, streamer=streamer, max_new_tokens=4000)
thread = Thread(target=model.generate, kwargs=generation_kwargs)

# generate
thread.start()
for new_text in streamer:
    print(new_text, end="")
thread.join()
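The prompt asks the model to wrap its reasoning in <thinking>/<reflection> tags and its final answer in <output> tags, so after generation you may want to keep only the final answer. Here is a minimal sketch; the extract_output helper is an assumption for illustration, not part of the model card:

import re

def extract_output(text: str) -> str:
    # Hypothetical helper: return the content of the last <output>...</output> block,
    # falling back to the full text if the model did not emit the tags.
    matches = re.findall(r"<output>(.*?)</output>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else text.strip()

# Usage: accumulate the streamed chunks into one string instead of printing them,
# then parse out the final answer.
# full_text = "".join(chunk for chunk in streamer)
# print(extract_output(full_text))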
Some info
If you want to know how I fine-tuned it, which datasets I used, and the training code, see here.
Model Trained Using AutoTrain
This model was trained using AutoTrain. For more information, please visit AutoTrain.
Model tree for Arthur-LAGACHERIE/Reflection-Gemma-2-2b
Base model: Arthur-LAGACHERIE/Gemma-2-2b-4bit