upperwal committed on
Commit 56a19e2 · verified · 1 Parent(s): c22af26

Update README.md

Files changed (1)
  1. README.md +52 -136
README.md CHANGED
@@ -39,6 +39,7 @@ inference:
39
 
40
 
41
  ## Architecture Overview
 
42
  Pragna-1B is a decoder-only transformer model inspired by TinyLlama, featuring the following specifications:
43
 
44
  Layers: 22
@@ -57,43 +58,9 @@ Pragna-1B is trained on our proprietary platform, GenAI Studio, a modular AI Dev
57
 
58
 
59
 
60
- - **Developed by:** [More Information Needed]
61
- - **Funded by [optional]:** [More Information Needed]
62
- - **Shared by [optional]:** [More Information Needed]
63
- - **Model type:** [More Information Needed]
64
- - **Language(s) (NLP):** [More Information Needed]
65
- - **License:** [More Information Needed]
66
- - **Finetuned from model [optional]:** [More Information Needed]
67
-
68
- ### Model Sources [optional]
69
-
70
- <!-- Provide the basic links for the model. -->
71
-
72
- - **Repository:** [More Information Needed]
73
- - **Paper [optional]:** [More Information Needed]
74
- - **Demo [optional]:** [More Information Needed]
75
-
76
- ## Uses
77
-
78
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
79
-
80
- ### Direct Use
81
-
82
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
83
-
84
- [More Information Needed]
85
-
86
- ### Downstream Use [optional]
87
-
88
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
89
-
90
- [More Information Needed]
91
-
92
- ### Out-of-Scope Use
93
-
94
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
95
-
96
- [More Information Needed]
97
 
98
  ## Bias, Risks, and Limitations
99
 
@@ -101,12 +68,6 @@ Pragna-1B is trained on our proprietary platform, GenAI Studio, a modular AI Dev
101
 
102
  [More Information Needed]
103
 
104
- ### Recommendations
105
-
106
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
107
-
108
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
109
-
110
  ## How to Get Started with the Model
111
 
112
  Use the code below to get started with the model.
@@ -117,123 +78,78 @@ Use the code below to get started with the model.
117
 
118
  ### Training Data
119
 
120
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
121
-
122
- [More Information Needed]
123
 
124
  ### Training Procedure
125
 
126
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
127
-
128
- #### Preprocessing [optional]
129
-
130
- [More Information Needed]
131
 
132
 
133
  #### Training Hyperparameters
134
 
135
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
136
-
137
- #### Speeds, Sizes, Times [optional]
138
-
139
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
140
-
141
- [More Information Needed]
142
 
143
  ## Evaluation
144
 
145
- <!-- This section describes the evaluation protocols and provides the results. -->
146
 
147
- ### Testing Data, Factors & Metrics
148
 
149
- #### Testing Data
150
 
151
- <!-- This should link to a Dataset Card if possible. -->
152
 
153
- [More Information Needed]
154
-
155
- #### Factors
156
-
157
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
158
-
159
- [More Information Needed]
160
 
161
- #### Metrics
162
-
163
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
164
-
165
- [More Information Needed]
166
 
167
  ### Results
168
 
169
- [More Information Needed]
170
 
171
- #### Summary
172
-
173
-
174
-
175
- ## Model Examination [optional]
176
-
177
- <!-- Relevant interpretability work for the model goes here -->
178
-
179
- [More Information Needed]
180
-
181
- ## Environmental Impact
182
-
183
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
184
-
185
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
186
-
187
- - **Hardware Type:** [More Information Needed]
188
- - **Hours used:** [More Information Needed]
189
- - **Cloud Provider:** [More Information Needed]
190
- - **Compute Region:** [More Information Needed]
191
- - **Carbon Emitted:** [More Information Needed]
192
-
193
- ## Technical Specifications [optional]
194
-
195
- ### Model Architecture and Objective
196
-
197
- [More Information Needed]
198
-
199
- ### Compute Infrastructure
200
-
201
- [More Information Needed]
202
-
203
- #### Hardware
204
-
205
- [More Information Needed]
206
-
207
- #### Software
208
-
209
- [More Information Needed]
210
 
211
  ## Citation [optional]
212
 
213
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
214
 
215
- **BibTeX:**
216
-
217
- [More Information Needed]
218
-
219
- **APA:**
220
-
221
- [More Information Needed]
222
-
223
- ## Glossary [optional]
224
-
225
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
226
-
227
- [More Information Needed]
228
-
229
- ## More Information [optional]
230
-
231
- [More Information Needed]
232
-
233
- ## Model Card Authors [optional]
234
-
235
- [More Information Needed]
236
 
237
  ## Model Card Contact
238
 
239
- [More Information Needed]
 
39
 
40
 
41
  ## Architecture Overview
42
+
43
  Pragna-1B is a decoder-only transformer model inspired by TinyLlama, featuring the following specifications:
44
 
45
  Layers: 22
 
58
 
59
 
60
 
61
+ - **Developed by:** [Soket AI Labs](http://soket.ai)
62
+ - **Language(s) (NLP):** Hindi, Bangla, Gujarati and English
63
+ - **License:** Apache 2.0
64
 
65
  ## Bias, Risks, and Limitations
66
 
 
68
 
69
  [More Information Needed]
70
 
71
  ## How to Get Started with the Model
72
 
73
  Use the code below to get started with the model.
 
78
 
79
  ### Training Data
80
 
81
+ 1. [Bhasha-wiki](https://soket.ai/blogs/bhasha_wiki_dataset)
82
+ 2. [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B)
83
+ 3. [Sangraha-Verified](https://huggingface.co/datasets/ai4bharat/sangraha)
84
 
85
  ### Training Procedure
86
 
87
+ [To be added]
88
 
89
 
90
  #### Training Hyperparameters
91
 
92
+ - **Precision:** BFloat16
93
+ - **Batch Size:** 2k - 2.5k
94
+ - **Context Length:** 2,048
95
+ - **Learning Rate:** 3e-5
96
+ - **Optimizer:** AdamW
97
+ - **LR Scheduler:** Cosine
98
+ - **Mixed Precision Training**
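
As a rough illustration of the cosine scheduler listed above, the sketch below decays the learning rate from the stated peak of 3e-5. The total step count and the final learning rate of 0 are assumptions made for illustration; the model card does not state them, and the actual training schedule (including any warmup) may differ.

```python
import math

PEAK_LR = 3e-5  # peak learning rate taken from the hyperparameter list


def cosine_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR, min_lr: float = 0.0) -> float:
    """Return the learning rate at `step` under plain cosine decay.

    `total_steps` and `min_lr` are hypothetical; the model card does not
    specify them.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp the step so values outside [0, total_steps] stay well defined.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical run length, chosen only to show the shape of the curve.
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```

The rate starts at the peak, reaches half of it at the midpoint, and falls to `min_lr` at the final step.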
99
 
100
  ## Evaluation
101
 
102
+ ### Hindi
103
+ | | Arc-Easy | Arc-Challenge | Hellaswag | Average |
104
+ |--------------|----------|---------------|-----------|---------|
105
+ | pragna-1b | 0.33 | 0.22 | 0.35 | 0.30 |
106
+ | sarvamai/OpenHathi-7B-Hi-v0.1-Base | 0.3582 | 0.2645 | 0.4315 | 0.35 |
107
+ | meta-llama/Llama-2-7b-hf | 0.295 | 0.2406 | 0.3789 | 0.30 |
108
+ | google/gemma-7b | <b>0.5926</b> | <b>0.4258</b> | <b>0.6341</b> | <b>0.55</b> |
109
+ | meta-llama/Meta-Llama-3-8B | 0.5354 | 0.3541 | 0.6072 | 0.50 |
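
The Average column appears to be the unweighted mean of the three task scores. The model card does not say so explicitly, so treat this as an assumption. A minimal sketch that recomputes it from the Hindi rows above:

```python
# Hindi scores copied from the table above: (Arc-Easy, Arc-Challenge, Hellaswag).
HINDI_SCORES = {
    "pragna-1b": (0.33, 0.22, 0.35),
    "sarvamai/OpenHathi-7B-Hi-v0.1-Base": (0.3582, 0.2645, 0.4315),
    "meta-llama/Llama-2-7b-hf": (0.295, 0.2406, 0.3789),
    "google/gemma-7b": (0.5926, 0.4258, 0.6341),
    "meta-llama/Meta-Llama-3-8B": (0.5354, 0.3541, 0.6072),
}


def average_score(scores):
    """Unweighted mean of the per-task scores (assumed aggregation)."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    for model, scores in HINDI_SCORES.items():
        print(f"{model}: {average_score(scores):.2f}")
```

Rounded to two decimals, these means agree with the Average values reported for the Hindi rows.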
110
 
111
+ ### Gujarati
112
+ | | Arc-Easy | Arc-Challenge | Hellaswag | Average |
113
+ |--------------|----------|---------------|-----------|---------|
114
+ | pragna-1b | 0.32 | 0.22 | 0.37 | 0.30 |
115
+ | google/gemma-7b | <b>0.4954</b> | <b>0.3208</b> | <b>0.5673</b> | <b>0.46</b> |
116
 
 
117
 
118
+ ### English
119
 
120
+ | | Arc | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | Average |
121
+ |----------------|-----|-----------|------|------------|------------|-------|---------|
122
+ | pragna-1b | 0.3 | 0.51 | 0.27 | 0.38 | 0.56 | 0 | 0.34 |
123
+ | TinyLlama/TinyLlama-1.1B-Chat-v1.0 | 0.36 | 0.61 | 0.25 | 0.37 | 0.61 | 0.02 | 0.37 |
124
+ | meta-llama/Meta-Llama-3-8B-Instruct | <b>0.6</b> | <b>0.82</b> | <b>0.67</b> | <b>0.51</b> | <b>0.77</b> | <b>0.68</b> | <b>0.66</b> |
125
+ | meta-llama/Llama-2-7b-hf | 0.53 | 0.78 | 0.46 | 0.39 | 0.74 | 0.14 | 0.51 |
 
126
 
127
 
128
  ### Results
129
 
130
+ | Context | Tokens Predicted |
131
+ |---------|------------------|
132
+ | India is one of the most | India is one of the most exciting places to travel to in Asia. The country hosts a wide variety of cultures and offers a plethora of attractions across the country. There's a lot you can do here and the tourists are always on the lookout for the new things to do and see. If you're looking to travel to India for your next adventure, you'll be pleased to know that you can find a ton of things to do in the country. From the temples to the beaches, there's a lot you can see and do here. One of the best things for visitors to do in India is to visit Jaipur. The capital city of Rajputana, Jaipur is a beautiful city that's known as the "City of Lives". |
133
+ | नरेंद्र मोदी का जन्म | नरेंद्र मोदी का जन्मदिन आज मनाया जा रहा है। इस मौके पर प्रधानमंत्री नरेंद्र मोदी ने ट्वीट कर लिखा कि 'आज देश को एक नया संसद भवन मिला है, जिसे आप लोगों ने बधाई दी है। भारत और भारतवासियों को यह संसद बहुत ही महत्त्वपूर्ण है और इसका उपयोग देश के विकास में बहुत अधिक योगदान देगा। आज हमारे देश में एक संसद बनी हुई है जो हमारे देश की लोकतांत्रिक प्रक्रियाओं को मजबूत बनाती है। |
134
+ | Prime minister Modi | Prime minister Modi's visit to China: PM Modi is scheduled to arrive in Beijing on Thursday. PM Narendra Modi will be the first Indian prime minister to visit China since former President Mao Zedong, who visited India in 1959 and was assassinated by Chinese soldiers a year later. The two leaders are expected to discuss regional security issues such as terrorism, trade and energy cooperation during their talks. |
135
+ | महात्मा गांधी | महात्मा गांधी एक स्वतंत्र व्यक्ति, एक आत्मनिर्भर और एक अत्यंत लोकप्रिय नेता थे। उन्होंने अपने जीवनकाल में देश को एक नई दिशा देने के लिए बहुत कुछ किया। समाज के विभिन्न क्षेत्रों में उनके योगदान के लिए उन्हें सम्मानित किया गया था। |
136
+ | ब्रिटिश राजशाही | ब्रिटिश राजशाही के विघटन के बाद, इंग्लैंड और वेल्स में ब्रिटेन की संसद का गठन हुआ। संसद ने 1807 में अपने पहले अधिनियम, "द क्राउन की विधेयक" (एक्ट ऑफ द डेमोक्रैटिक) को मंजूरी दी, जिसमें प्रत्येक राज्य से एक निर्वाचित सदस्य चुना गया। |
137
+ | মহাত্মা গান্ধী | মহাত্মা গান্ধী ছিলেন আন্তর্জাতিক সমাজতান্ত্রিক আন্দোলনের অন্যতম প্রধান নেতা। তিনি ভারতের স্বাধীনতা সংগ্রামে তাঁর ভূমিকা অপরিসীম। ১৯৪৭ সালের ১৫ আগস্ট ভারতে বিপ্লব ঘটে। ১৯৪৮ সালে মহাত্মা গান্ধীর মৃত্যু হয়। এরপর থেকে মহাত্মা আর দেখা যায় না। |
138
+ | নরেন্দ্র মোদীর জন্ম | নরেন্দ্র মোদীর জন্মদিনের শুভেচ্ছা জানিয়েছেন অমিতাভ বচ্চন। প্রসঙ্গত, গোটা দেশ জুড়ে চলছে নববর্ষ উৎসব। আর এই উৎসবে নিজেকে সামিল করতে চান না কেউই। এমনটাই জানালেন বলিউড অভিনেত্রী সারা আলি খান। |
139
+ | એક વખત | ક વખતે, તેમણે તેમના મિત્ર અને સહાયક વિલિયમ હેનરી મેકડોનાલ્ડ સાથે મળીને "ધ ઓરિજિનલ મેન ઓફ ધ ડેડ" નામના પુસ્તકનું નિર્માણ કર્યું હતું; તે 1961માં પ્રકાશિત થયું હતું. 1958માં, તેમણે ધ ન્યૂ યોર્કર માટે લેખક અને સંપાદક પિતા તરીકે કામ કર્યું હતું. |
140
+ | એક સમયે એક રાજા જંગલમાં રહેતો હતો. તેમણે | એક સમયે એક રાજા જંગલમાં રહેતો હતો. તેમણે, એક દિવસ રાજાને કહ્યું, "હું તમને મારી જાતને એક સ્ત્રી તરીકે આપું છું." રાજા ખુશ થયો. તેમણે તેને તેના ઘરમાં લઈ ગયા. તે દિવસે રાણી પોતાના પતિ સાથે ઘરે આવી અને કહ્યું કે તે તેની પાસે છે. રાજાએ તેને પૂછ્યું કે શું તેણે તેની સાથે લગ્ન કર્યા છે ? રાજા અને રાણીએ પૂછ્યું - હા. ત્યારે તેમણે કહ્યું કે, "તમે મને તમારી સાથે લગ્ન કરવાનું વચન આપ્યું છે. હવે મારે તમારા પર વિશ્વાસ રાખવો જોઈએ નહીં". |
141
 
142
 
143
  ## Citation [optional]
144
 
145
  <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
146
 
147
+ @ONLINE{bhasha-wiki,
148
+ author = "Soket Labs Technology and Research Private Limited",
149
+ title = "pragna-1b",
150
+ url = "https://soket.ai"
151
+ }
152
 
153
  ## Model Card Contact
154
 
155