Update README.md
README.md (CHANGED)
@@ -5,11 +5,101 @@ tags:
- pytorch_model_hub_mixin
---

Removed:

# GPT2 Nepali 124M

This model

- Library: https://huggingface.co/Aananda-giri/GPT2-Nepali/
- Docs: [More Information Needed]
Added:

# GPT2 Nepali 124M Base Model

Welcome to the **GPT2-Nepali** repository! This project features a GPT-2 model trained from scratch on a 12 GB Nepali text dataset derived from the [NepBERTa project](https://nepberta.github.io). The model is tailored specifically to the Nepali language and includes a user-friendly chat interface hosted on Hugging Face Spaces.
---

## Project Highlights

- **Chat Interface:** [Hugging Face Space](https://huggingface.co/spaces/Aananda-giri/gpt2-nepali)
- **Training Code:** [GitHub Repository](https://github.com/Aananda-giri/GPT2-Nepali)
- **Dataset:** 12 GB of Nepali text extracted from the [NepBERTa project](https://nepberta.github.io)

---

## Overview

**GPT2-Nepali** adapts the GPT-2 training process (inspired by [Build a Large Language Model (From Scratch)](https://www.manning.com/books/build-a-large-language-model-from-scratch)) to the Nepali language. Key modifications include a dedicated BPE tokenizer for Nepali and a dataloader adjusted to handle pre-tokenized datasets.
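
For illustration, here is a minimal sketch of what a dataloader over a pre-tokenized corpus can look like: it slices a flat array of token IDs into fixed-length input/target windows. The class name, file name, and context length below are hypothetical placeholders, not the repository's actual training code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PreTokenizedDataset(Dataset):
    """Serves fixed-length (input, target) windows from an already-tokenized corpus."""

    def __init__(self, token_ids, context_length=1024, stride=1024):
        self.token_ids = torch.tensor(token_ids, dtype=torch.long)
        self.context_length = context_length
        self.stride = stride

    def __len__(self):
        # Number of full windows that fit (targets are shifted by one token)
        return max(0, (len(self.token_ids) - self.context_length - 1) // self.stride + 1)

    def __getitem__(self, idx):
        start = idx * self.stride
        x = self.token_ids[start : start + self.context_length]          # model input
        y = self.token_ids[start + 1 : start + self.context_length + 1]  # next-token targets
        return x, y

# Hypothetical usage, assuming token IDs were saved to disk by a tokenization step:
# token_ids = torch.load("nepali_corpus_tokens.pt")
# loader = DataLoader(PreTokenizedDataset(token_ids), batch_size=8, shuffle=True)
```
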
---

## Installation

* Clone the repository and install the required dependencies:

```bash
git clone https://github.com/Aananda-giri/GPT2-Nepali.git
cd GPT2-Nepali
pip install -r requirements.txt
```

* Download `gpt_model_code.py`, which defines the model class and generation helpers used below:

```python
import requests

# Fetch gpt_model_code.py (model + generation helpers) from the training repository
url = "https://raw.githubusercontent.com/Aananda-giri/GPT2-Nepali/main/3.%20GPT2-Nepali/2_inference/gpt_model_code.py"
res = requests.get(url)
res.raise_for_status()

# Save it next to your script so it can be imported later
with open("gpt_model_code.py", "w", encoding="utf-8") as f:
    f.write(res.text)
```
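
As a quick sanity check after the download, the module should import from the working directory; the Quick Start below uses the same `GPTModel`, `GPT_CONFIG_124M`, and `generate` names. The print here is purely illustrative:

```python
# Confirm gpt_model_code.py is importable and inspect the default 124M configuration
from gpt_model_code import GPT_CONFIG_124M

print(GPT_CONFIG_124M)
```
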
---

## Quick Start

Below is a sample script to load the model and generate text:

```python
import torch
from transformers import PreTrainedTokenizerFast

# GPT_CONFIG_124M is also available here if you want to instantiate an untrained model
from gpt_model_code import GPTModel, GPT_CONFIG_124M, generate

# Select the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained model from the Hugging Face Hub
model = GPTModel.from_pretrained("Aananda-giri/GPT2-Nepali")
model.to(device)
model.eval()

# Load the tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/GPT2-Nepali")

# Generate sample text
prompt = "रामले भात"
generated_text = generate(
    model,
    prompt,
    tokenizer,
    max_new_tokens=100,
    temperature=0.7,
    top_k=50,
    top_p=None,             # set to a float (e.g. 0.9) to use nucleus sampling
    eos_id=None,
    repetition_penalty=1.2,
    penalize_len_below=50
)

print(generated_text)
```
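
Because the tokenizer is a standard `PreTrainedTokenizerFast`, you can also inspect how the Nepali BPE vocabulary splits a prompt. This snippet reuses the `tokenizer` loaded above; the exact subword pieces it prints depend on the trained vocabulary:

```python
# Inspect the Nepali BPE tokenizer: encode a prompt, look at the pieces, decode back
text = "रामले भात"
ids = tokenizer.encode(text)

print(ids)                                   # token IDs
print(tokenizer.convert_ids_to_tokens(ids))  # subword pieces
print(tokenizer.decode(ids))                 # decoded text
```
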
---

## Acknowledgments

A special thank you to [@rasbt](https://twitter.com/rasbt) for the inspiration and for authoring *Build a Large Language Model (From Scratch)*, one of the best resources on LLMs available!

---

Happy coding!

---