---
library_name: transformers
tags:
  - text-generation
  - conversational
  - instruction-tuned
  - 4-bit precision
  - bitsandbytes
---

# Rishi-2-2B-IT

**Model ID:** korarishi/rishi-2-2b-it

## Model Information

A summary of the model and a brief definition of its inputs and outputs.

### Description

Rishi-2-2B-IT is a text-to-text, decoder-only large language model, available in English, with open weights for both pre-trained and instruction-tuned variants. It is well suited to a variety of text generation tasks, including question answering, summarization, and reasoning. Its compact size allows deployment in resource-constrained environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI models.

### Running with the pipeline API

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="korarishi/rishi-2-2b-it",
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",  # replace with "mps" to run on a Mac device
)

messages = [
    {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
]

outputs = pipe(messages, max_new_tokens=256)
assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
print(assistant_response)
```
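
When given a list of chat messages, the pipeline returns the full message history in `outputs[0]["generated_text"]`, so a multi-turn conversation can be continued by appending to it. A minimal sketch (the follow-up prompt is illustrative):

```python
# Reuse the returned history, which now ends with the assistant's reply
messages = outputs[0]["generated_text"]
messages.append({"role": "user", "content": "Now answer in plain English."})

outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1]["content"].strip())
```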

### Running on single / multi GPU

```python
# pip install accelerate
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```
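
The tags above advertise 4-bit precision via bitsandbytes. Below is a minimal sketch of loading the model quantized to 4 bits; the compute dtype is an illustrative choice, not a setting published for this model:

```python
# pip install accelerate bitsandbytes
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # illustrative; pick what your hardware supports
)

tokenizer = AutoTokenizer.from_pretrained("korarishi/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi/rishi-2-2b-it",
    device_map="auto",
    quantization_config=quantization_config,
)
```

Quantizing the weights to 4 bits roughly quarters their memory footprint compared to 16-bit loading, which is what makes laptop-class deployment practical.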

### Chat template usage

```python
messages = [
    {"role": "user", "content": "Write me a poem about Cars."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn header so the model replies
    return_tensors="pt",
    return_dict=True,
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```
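
`generate` returns the prompt tokens together with the completion, so you may want to decode only the newly generated part. A small sketch, reusing the `input_ids` dict from above:

```python
# Slice off the prompt, then decode just the completion
prompt_length = input_ids["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```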

**Developed by:** korarishi