Usage
This model uses 4-bit quantization, so you need to install bitsandbytes to use it.
pip install bitsandbytes
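Loading a bitsandbytes-quantized checkpoint through transformers generally also needs accelerate, so if you are starting from a clean environment the following should cover the dependencies (the exact package list is an assumption, not stated in the original card):
pip install transformers accelerate bitsandbytes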
For inference (streaming):
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
import torch
from threading import Thread

device = 'cuda' if torch.cuda.is_available() else 'cpu'

model_id = "Arthur-LAGACHERIE/Reflection-Gemma-2-2b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# the checkpoint ships a bitsandbytes 4-bit quantization config; device_map="auto" places it on the GPU
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
prompt = """
### System
You are a world-class AI system, capable of complex reasoning and reflection.
Reason through the query inside <thinking> tags, and then provide your final response inside <output> tags.
If you detect that you made a mistake in your reasoning at any point, correct yourself inside <reflection> tags.
Try an answer and see if it's correct before generating the output.
But don't forget to think very carefully.
### Question
The question here.
"""
chat = [
    {"role": "user", "content": prompt},
]
question = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
question = tokenizer(question, return_tensors="pt").to(device)
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)
generation_kwargs = dict(question, streamer=streamer, max_new_tokens=4000)
thread = Thread(target=model.generate, kwargs=generation_kwargs)
# generate
thread.start()
for new_text in streamer:
    print(new_text, end="")
Some info
If you want to know how I fine-tuned it, what datasets I used, and the training code, see here.
Model Trained Using AutoTrain
This model was trained using AutoTrain. For more information, please visit AutoTrain.
Model tree for Arthur-LAGACHERIE/Reflection-Gemma-2-2b
Base model: Arthur-LAGACHERIE/Gemma-2-2b-4bit