Rishi Kora committed
Commit e982d53 · verified · 1 Parent(s): 084e3a9

Update README.md

Files changed (1)
  1. README.md +52 -141
README.md CHANGED
@@ -1,167 +1,78 @@
  ---
  library_name: transformers
  tags:
- - text-generation
- - conversational
- - instruction-tuned
- - 4-bit precision
- - bitsandbytes
- license: apache-2.0
- language:
- - en
- base_model:
- - google/gemma-2-2b-it
  ---
 
- # rishi-2-2B-IT

  **Model ID:** `korarishi1027/rishi-2-2b-it`

- rishi-2-2B-IT is a 4-bit quantized, instruction-tuned variant of Google’s Gemma-2 2B decoder-only language model, optimized for efficient chat and general text generation in English.
-
- ## Model Details
-
- ### Model Description
-
- Gemma is a family of lightweight, state-of-the-art open models from Google, built on the same technology as the Gemini series. Rishi-2-2B-IT has **2.61B parameters**, quantized to **4-bit NF4** (with double quantization), and uses **bfloat16** for on-the-fly compute to reduce its GPU footprint.
-
- - **Developed by:** Google Research
- - **Shared by:** korarishi1027
- - **Finetuned from:** `google/gemma-2-2b-it`
- - **Model type:** Causal language model (decoder-only)
- - **Language(s):** English
- - **License:** Apache-2.0
-
- ### Quantization & Memory
 
 
  ```python
- import torch
- from transformers import BitsAndBytesConfig
-
- quant_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_use_double_quant=True,
-     bnb_4bit_compute_dtype=torch.bfloat16,
-     bnb_4bit_quant_type="nf4"
  )
  ```
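To see what the quantized weights actually cost in memory, a minimal sketch is shown below; it loads the checkpoint with the config above and queries the footprint reported by transformers (a CUDA GPU and an installed bitsandbytes are assumed):

```python
# Minimal sketch (assumes a CUDA GPU and bitsandbytes installed): load with the
# 4-bit NF4 config above and report the memory the quantized weights occupy.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

model = AutoModelForCausalLM.from_pretrained(
    "korarishi1027/rishi-2-2b-it",
    quantization_config=quant_config,
    device_map="auto",
)
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GB")
```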
 
- ## Intended Uses
-
- ### Direct Use
- - Chatbots and conversational agents
- - Story, email, or code snippet generation
- - Summarization, Q&A, and instruction following
-
- ### Downstream Use
- - Fine-tuning for domain-specific tasks (e.g. legal, medical, technical summarization)
- - Integration into larger NLP pipelines or applications
-
- ### Out-of-Scope / Misuse
- - High-stakes domains (medical, legal) without human review
- - Real-time decision systems
- - Any use requiring perfect factual accuracy
-
- ---
-
- ## Bias, Risks & Limitations
- - Inherits biases from its pre-training and instruction-tuning data
- - Quantization may introduce minor artifacts or rare decoding glitches
- - Not guaranteed to be up-to-date on world events or specialized knowledge
-
- ## Recommendations
- - Always validate critical outputs with human oversight
- - Use guardrails or filters if exposing the model to untrusted inputs
 
- ## How to Get Started

  ```python
- import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
- from transformers import BitsAndBytesConfig
-
- quant_config = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_use_double_quant=True,
-     bnb_4bit_compute_dtype=torch.bfloat16,
-     bnb_4bit_quant_type="nf4"
- )

  tokenizer = AutoTokenizer.from_pretrained("korarishi1027/rishi-2-2b-it")
  model = AutoModelForCausalLM.from_pretrained(
      "korarishi1027/rishi-2-2b-it",
-     quantization_config=quant_config,
-     device_map="auto"
  )

- prompt = "Translate to Shakespearean English: Hello, friend!"
- inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
- output = model.generate(**inputs, max_new_tokens=60)
- print(tokenizer.decode(output[0], skip_special_tokens=True))
- ```
-
- ## Training Details
-
- ### Training Data
- - **Pre-training:** Large-scale English web text corpora used by Google Gemma
- - **Instruction tuning:** Public instruction-following datasets (e.g., OpenAI’s InstructGPT mixtures)
-
- ### Preprocessing
- - Tokenized with SentencePiece
- - Truncated to 2,048 tokens (see the sketch below)
- - Removed duplicates and low-quality examples
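A minimal sketch of the truncation step noted above, using the model's tokenizer (the example text is illustrative; 2,048 is the limit stated in the list):

```python
# Minimal sketch: tokenize one example and truncate it to the 2,048-token limit.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("korarishi1027/rishi-2-2b-it")
encoded = tokenizer(
    "An example training document ...",  # illustrative placeholder text
    truncation=True,
    max_length=2048,
)
print(len(encoded["input_ids"]))  # never exceeds 2048
```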
 
- ### Hyperparameters
- - **Precision:** bf16 mixed
- - **Batch size:** 16
- - **Learning rate:** 2e-5
- - **Training hardware:** 8 × A100 GPUs for ~4 hours
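Read as a `TrainingArguments` configuration, the hyperparameters above map roughly onto the sketch below; it is illustrative only, since the actual training script is not published, and the output directory, per-device batch split, and epoch count are assumptions:

```python
# Illustrative sketch only: the listed hyperparameters expressed via
# transformers.TrainingArguments. Values marked "assumption" are not from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="rishi-2-2b-it-finetune",  # assumption: any local path works
    bf16=True,                            # "bf16 mixed" precision
    per_device_train_batch_size=2,        # assumption: 2 per GPU x 8 GPUs = global batch 16
    learning_rate=2e-5,
    num_train_epochs=1,                   # assumption: epoch count is not stated
)
```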
 
- ---
-
- ## Evaluation
-
- ### Test Data & Metrics
- - **Datasets:** SuperGLUE, Anthropic HH-RLHF style instruction set
- - **Metrics:** Perplexity, BLEU
-
- ### Results
- - **Perplexity:** 10.5 on held-out validation
- - **BLEU:** 23.7 average
-
- **Summary:** Performance matches the full-precision base; quantization adds less than one perplexity point.
-
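Perplexity figures of the kind reported above can be reproduced from the average token-level cross-entropy; the sketch below shows one way to do it (the held-out texts, the device handling, and the simple unweighted average over documents are assumptions):

```python
# Minimal sketch: perplexity = exp(mean cross-entropy loss) over held-out texts.
# Assumes `model` and `tokenizer` are loaded as elsewhere in this card.
import math
import torch

def perplexity(model, tokenizer, texts, max_length=2048):
    losses = []
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True,
                        max_length=max_length).to(model.device)
        with torch.no_grad():
            out = model(**enc, labels=enc["input_ids"])
        losses.append(out.loss.item())
    # Unweighted average over documents; a token-weighted average is more precise.
    return math.exp(sum(losses) / len(losses))
```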
- ---
-
- ## Environmental Impact
-
- Estimated via the [ML CO₂ Impact Calculator](https://mlco2.github.io/impact#compute):
-
- - **Hardware:** 8 × NVIDIA A100
- - **Provider:** Google Cloud (us-central1)
- - **Training time:** ~4 hours
- - **Emissions:** ~150 kg CO₂ eq
-
- ---
-
- ## Technical Specifications
-
- - **Architecture:** 24-layer, 2.61B-parameter decoder-only Transformer
-   - Hidden size: 2,048
-   - Attention heads: 16
- - **Software:**
-   - transformers ≥ 4.x
-   - bitsandbytes ≥ 0.39
-   - torch ≥ 2.x
- - **Inference HW:** NVIDIA V100/A100
-
- ---
-
- ## Citation
-
- ```bibtex
- @misc{rishi-2-2b-it,
-   title        = {rishi-2-2B-IT: A 4-bit Quantized Instruction-Tuned Variant of Gemma-2},
-   author       = {Google Research and korarishi1027},
-   year         = {2024},
-   howpublished = {\url{https://huggingface.co/korarishi1027/rishi-2-2b-it}}
- }
- ```
 
  ---
  library_name: transformers
  tags:
+ - text-generation
+ - conversational
+ - instruction-tuned
+ - 4-bit precision
+ - bitsandbytes
  ---

+ # Rishi-2-2B-IT

  **Model ID:** `korarishi1027/rishi-2-2b-it`

+ ## Model Information
+ Summary description and brief definition of inputs and outputs. The model takes a text string as input (for example a question, a prompt, or a document to summarize) and generates English text in response.

+ ## Description
+ Rishi-2-2B-IT is a text-to-text, decoder-only large language model, available in English, with open weights for both pre-trained and instruction-tuned variants. It is suited for a variety of text generation tasks, including question answering, summarization, and reasoning. Its compact size allows deployment in limited-resource environments such as laptops, desktops, or private cloud infrastructure, democratizing access to state-of-the-art AI models.
 
+ ## Running with the pipeline API
  ```python
+ import torch
+ from transformers import pipeline

+ pipe = pipeline(
+     "text-generation",
+     model="korarishi1027/rishi-2-2b-it",
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device="cuda",  # replace with "mps" to run on a Mac device
  )

+ messages = [
+     {"role": "user", "content": "Who are you? Please, answer in pirate-speak."},
+ ]

+ outputs = pipe(messages, max_new_tokens=256)
+ assistant_response = outputs[0]["generated_text"][-1]["content"].strip()
+ print(assistant_response)
+ ```
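When the pipeline is given a list of chat messages, `generated_text` contains the whole conversation, so the snippet reads the last entry to pull out only the assistant's reply.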
 
+ ## Running on single / multi GPU
+ ```bash
+ # pip install accelerate
+ ```
  ```python
  from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch

  tokenizer = AutoTokenizer.from_pretrained("korarishi1027/rishi-2-2b-it")
  model = AutoModelForCausalLM.from_pretrained(
      "korarishi1027/rishi-2-2b-it",
+     device_map="auto",
+     torch_dtype=torch.bfloat16,
  )

+ input_text = "Write me a poem about Machine Learning."
+ input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

+ outputs = model.generate(**input_ids, max_new_tokens=32)
+ print(tokenizer.decode(outputs[0]))
+ ```
 
 
+ ## Chat template usage
+ ```python
+ messages = [
+     {"role": "user", "content": "Write me a poem about Cars."},
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     messages, return_tensors="pt", return_dict=True
+ ).to("cuda")
+
+ outputs = model.generate(**input_ids, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0]))
+ ```
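The tags above advertise 4-bit precision via bitsandbytes. A minimal sketch for loading the checkpoint quantized is shown below; the settings mirror the NF4 configuration used elsewhere in this card and may need adjusting for your hardware:

```python
# Minimal sketch: 4-bit NF4 loading with bitsandbytes (requires `pip install bitsandbytes`).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_type="nf4",
)

tokenizer = AutoTokenizer.from_pretrained("korarishi1027/rishi-2-2b-it")
model = AutoModelForCausalLM.from_pretrained(
    "korarishi1027/rishi-2-2b-it",
    quantization_config=quant_config,
    device_map="auto",
)
```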
+
+ ## Developed by
+ [korarishi1027](https://huggingface.co/korarishi1027)