Rishi-2-2B-IT

Model ID: korarishi/rishi-2-2b-it

Model Information

Summary description and brief definition of inputs and outputs.

Description

Rishi-2-2B-IT is a text-to-text, decoder-only large language model, available in English, with open weights for both pre-trained and instruction-tuned variants. It is well suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Its compact size allows deployment in resource-limited environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI models.

Running with the pipeline API

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="korarishi/rishi-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
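
For less deterministic output, the same pipeline call accepts the standard transformers generation arguments. A minimal sketch with illustrative sampling values (not tuned for this model):

outputs = pipe(
    messages,
    max_new_tokens=256,
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.7,  # illustrative value, adjust for your use case
    top_p=0.95,       # nucleus sampling cutoff
)
print(outputs[0]["generated_text"][-1]["content"].strip())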

Running on single / multi GPU

# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
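
To fit the model on smaller GPUs, it can also be loaded with 4-bit quantization through bitsandbytes. A hedged sketch, assuming the bitsandbytes package is installed; memory savings and quality trade-offs have not been verified for this model:

# pip install bitsandbytes accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import torch

quantization_config = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    quantization_config=quantization_config,  # weights stored in 4-bit precision
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))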

Chat template usage

messages = [
    {"role": "user", "content": "Write me a poem about Cars."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker so the model replies
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
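
The decoded string above includes the prompt tokens. A small sketch for printing only the newly generated part, slicing at the prompt length:

prompt_len = input_ids["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response)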

Developed by: korarishi

Model size: 1.63B params (Safetensors)
Tensor types: F32, FP16, U8