---
library_name: transformers
tags:
- text-generation
- conversational
- instruction-tuned
- 4-bit precision
- bitsandbytes
---

# Rishi-2-2B-IT

**Model ID:** `korarishi/rishi-2-2b-it`

## Model Information

Summary description and brief definition of inputs and outputs.

## Description

Rishi-2-2B-IT is a text-to-text, decoder-only large language model, available in English, with open weights for both pre-trained and instruction-tuned variants. It is suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Its compact size allows deployment in limited-resource environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI models.

## Running with the pipeline API

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="korarishi/rishi-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```

## Running on single / multi GPU

```bash
# pip install accelerate
```

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```

## Chat template usage

```python
messages = [
    {"role": "user", "content": "Write me a poem about Cars."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker before generating
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```

Developed by: [korarishi](https://huggingface.co/korarishi)
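
## Running with 4-bit quantization (bitsandbytes)

The model card's tags list 4-bit precision and bitsandbytes, so the model can also be loaded with on-the-fly 4-bit quantization to reduce memory use. The snippet below is a minimal sketch using `BitsAndBytesConfig` from Transformers; it assumes the `bitsandbytes` package is installed and a CUDA GPU is available, and the specific quantization settings shown are illustrative rather than a configuration published with this repository.

```python
# pip install bitsandbytes accelerate
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit settings; adjust to your hardware and accuracy needs.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    quantization_config=quantization_config,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```

Quantized weights are created at load time here, so the first load takes longer; memory savings come at a small cost in generation quality compared with bfloat16.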