---
library_name: transformers
tags:
- text-generation
- conversational
- instruction-tuned
- 4-bit precision
- bitsandbytes
---
|
|
|
# Rishi-2-2B-IT |
|
|
|
**Model ID:** `korarishi/rishi-2-2b-it` |
|
|
|
## Model Information

Rishi-2-2B-IT takes natural-language text as input (such as a question, a prompt, or a document to be summarized) and generates English text in response.
|
|
|
### Description

Rishi-2-2B-IT is a text-to-text, decoder-only large language model, available in English, with open weights for both pre-trained and instruction-tuned variants. It is well suited to a variety of text generation tasks, including question answering, summarization, and reasoning. Its compact size allows deployment in resource-limited environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI models.
|
|
|
## Running with the pipeline API |
|
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="korarishi/rishi-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

# The chat pipeline returns the whole conversation; the last message is the reply.
outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```
|
|
|
## Running on single / multi GPU |
|
```bash
# pip install accelerate
```
|
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",  # place the model across available devices automatically
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
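

## Running in 4-bit via bitsandbytes

The tags above advertise 4-bit precision through bitsandbytes, so the model can also be loaded quantized. A minimal loading sketch, assuming `bitsandbytes` is installed; the NF4 quantization type and bfloat16 compute dtype below are illustrative choices, not settings confirmed for this model:

```bash
# pip install bitsandbytes accelerate
```

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit config: NF4-quantized weights with bfloat16 compute
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    quantization_config=quantization_config,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```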
|
|
|
## Chat template usage |
|
The instruction-tuned model expects its chat template to be applied to conversational prompts. Reusing the `tokenizer` and `model` loaded above:

```python
messages = [
    {"role": "user", "content": "Write me a poem about Cars."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model replies
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
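
To print only the newly generated reply rather than the prompt plus completion, one option is to slice off the prompt tokens before decoding; a minimal sketch, continuing from the snippet above:

```python
# outputs[0] holds prompt + completion; drop the prompt tokens before decoding
prompt_length = input_ids["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```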
|
|
|
**Developed by:** [korarishi](https://huggingface.co/korarishi)