---
library_name: transformers
tags:
  - text-generation
  - conversational
  - instruction-tuned
  - 4-bit precision
  - bitsandbytes
---

# Rishi-2-2B-IT

**Model ID:** `korarishi/rishi-2-2b-it`

## Model Information
Summary description and brief definition of inputs and outputs.

## Description
Rishi-2-2B-IT is a text-to-text, decoder-only large language model, available in English, with open weights for both pre-trained and instruction-tuned variants. It is suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Its compact size allows deployment in resource-limited environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI models.

## Running with the pipeline API
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="korarishi/rishi-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```

## Running on single / multi GPU
```bash
# pip install accelerate
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
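
## Running with 4-bit quantization (bitsandbytes)
The tags above list 4-bit precision and bitsandbytes, so the model can also be loaded quantized to reduce memory use on resource-limited hardware. The sketch below assumes NF4 quantization with bfloat16 compute via `BitsAndBytesConfig`; these settings are illustrative defaults, not values published with this checkpoint.
```bash
# pip install bitsandbytes accelerate
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Illustrative 4-bit settings; adjust for your hardware and quality needs.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    quantization_config=quantization_config,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
Loading in 4-bit substantially reduces the memory footprint compared to bfloat16, at some cost in output quality.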

## Chat template usage
```python
messages = [
    {"role": "user", "content": "Write me a poem about Cars."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    return_dict=True,
    add_generation_prompt=True,  # append the assistant turn prefix so the model generates a reply
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```

**Developed by:** [korarishi](https://huggingface.co/korarishi)