---
library_name: transformers
tags:
- text-generation
- conversational
- instruction-tuned
- 4-bit precision
- bitsandbytes
---
# Rishi-2-2B-IT
**Model ID:** `korarishi/rishi-2-2b-it`
## Model Information
Rishi-2-2B-IT takes text as input (a question, a prompt, or a document to be processed) and generates text as output (an answer, a summary, or a continuation).
## Description
Rishi-2-2B-IT is a text-to-text, decoder-only large language model, available in English, with open weights for both pre-trained and instruction-tuned variants. It is suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Its compact size allows deployment in resource-limited environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI models.
## Running with the pipeline API
```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="korarishi/rishi-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```
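When given a list of chat messages, the pipeline returns the conversation with the model's reply appended: `outputs[0]["generated_text"]` is the full message list, and the assistant's answer is its last entry, which is why the snippet indexes `[-1]["content"]`.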
## Running on single / multi GPU
```bash
# pip install accelerate
```
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",  # let accelerate place layers across the available GPUs
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
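## Running in 4-bit via bitsandbytes
The tags above list 4-bit precision and bitsandbytes, so the model can also be loaded quantized to reduce memory use. Below is a minimal sketch, assuming `bitsandbytes` is installed and a CUDA GPU is available; the specific `BitsAndBytesConfig` values are illustrative, not prescribed by this card.
```bash
# pip install bitsandbytes accelerate
```
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Illustrative 4-bit setup: weights stored in 4-bit, compute done in bfloat16.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    quantization_config=quantization_config,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```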
## Chat template usage
```python
# Reuses the tokenizer and model loaded in the previous snippet.
messages = [
    {"role": "user", "content": "Write me a poem about Cars."},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model replies
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
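`apply_chat_template` renders the messages with the model's own turn markers, so there is no need to hand-build the prompt string; `add_generation_prompt=True` leaves the template open at the assistant turn so generation continues from there.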
**Developed by:** [korarishi](https://huggingface.co/korarishi)