Update README.md
README.md (CHANGED)
@@ -5,11 +5,101 @@ tags:
- pytorch_model_hub_mixin
---

Removed:

# GPT2 Nepali 124M

This model

- Library: https://huggingface.co/Aananda-giri/GPT2-Nepali/
- Docs: [More Information Needed]
Added:

# GPT2 Nepali 124M Base Model

Welcome to the **GPT2-Nepali** repository! This project features a GPT-2 model trained from scratch on a 12 GB Nepali text dataset derived from the [NepBERTa project](https://nepberta.github.io). The model is tailored specifically to the Nepali language and includes a user-friendly chat interface hosted on Hugging Face Spaces.
---

## Project Highlights

- **Chat Interface:** [Hugging Face Space](https://huggingface.co/spaces/Aananda-giri/gpt2-nepali)
- **Training Code:** [GitHub Repository](https://github.com/Aananda-giri/GPT2-Nepali)
- **Dataset:** 12 GB of Nepali text extracted from the [NepBERTa project](https://nepberta.github.io)

---

## Overview

**GPT2-Nepali** adapts the GPT-2 training process (inspired by [Build a Large Language Model (From Scratch)](https://www.manning.com/books/build-a-large-language-model-from-scratch)) to the Nepali language. Key modifications include a dedicated BPE tokenizer for Nepali and a dataloader adjusted to handle pre-tokenized datasets.
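
For illustration, here is a minimal sketch of what a dataloader over a pre-tokenized corpus can look like: it slices a flat array of token IDs into fixed-length input/target windows. The class name, file name, and context length below are hypothetical placeholders, not the repository's actual training code.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PreTokenizedDataset(Dataset):
    """Serves fixed-length (input, target) windows from an already-tokenized corpus."""

    def __init__(self, token_ids, context_length=1024, stride=1024):
        self.token_ids = torch.tensor(token_ids, dtype=torch.long)
        self.context_length = context_length
        self.stride = stride

    def __len__(self):
        # Number of full windows that fit (targets are shifted by one token)
        return max(0, (len(self.token_ids) - self.context_length - 1) // self.stride + 1)

    def __getitem__(self, idx):
        start = idx * self.stride
        x = self.token_ids[start : start + self.context_length]          # model input
        y = self.token_ids[start + 1 : start + self.context_length + 1]  # next-token targets
        return x, y

# Hypothetical usage, assuming token IDs were saved to disk by a tokenization step:
# token_ids = torch.load("nepali_corpus_tokens.pt")
# loader = DataLoader(PreTokenizedDataset(token_ids), batch_size=8, shuffle=True)
```
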
---

## Installation

* Clone the repository and install the required dependencies:

```bash
git clone https://github.com/Aananda-giri/GPT2-Nepali.git
cd GPT2-Nepali
pip install -r requirements.txt
```

* Download `gpt_model_code.py`, which defines the model class and generation helpers used below:

```python
import requests

# Fetch gpt_model_code.py (model + generation helpers) from the training repository
url = "https://raw.githubusercontent.com/Aananda-giri/GPT2-Nepali/main/3.%20GPT2-Nepali/2_inference/gpt_model_code.py"
res = requests.get(url)
res.raise_for_status()

# Save it next to your script so it can be imported later
with open("gpt_model_code.py", "w", encoding="utf-8") as f:
    f.write(res.text)
```
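
As a quick sanity check after the download, the module should import from the working directory; the Quick Start below uses the same `GPTModel`, `GPT_CONFIG_124M`, and `generate` names. The print here is purely illustrative:

```python
# Confirm gpt_model_code.py is importable and inspect the default 124M configuration
from gpt_model_code import GPT_CONFIG_124M

print(GPT_CONFIG_124M)
```
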
---

## Quick Start

Below is a sample script to load the model and generate text:

```python
import torch
from transformers import PreTrainedTokenizerFast

# GPT_CONFIG_124M is also available here if you want to instantiate an untrained model
from gpt_model_code import GPTModel, GPT_CONFIG_124M, generate

# Select the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained model from the Hugging Face Hub
model = GPTModel.from_pretrained("Aananda-giri/GPT2-Nepali")
model.to(device)
model.eval()

# Load the tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/GPT2-Nepali")

# Generate sample text
prompt = "रामले भात"
generated_text = generate(
    model,
    prompt,
    tokenizer,
    max_new_tokens=100,
    temperature=0.7,
    top_k=50,
    top_p=None,             # set to a float (e.g. 0.9) to use nucleus sampling
    eos_id=None,
    repetition_penalty=1.2,
    penalize_len_below=50
)

print(generated_text)
```
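
Because the tokenizer is a standard `PreTrainedTokenizerFast`, you can also inspect how the Nepali BPE vocabulary splits a prompt. This snippet reuses the `tokenizer` loaded above; the exact subword pieces it prints depend on the trained vocabulary:

```python
# Inspect the Nepali BPE tokenizer: encode a prompt, look at the pieces, decode back
text = "रामले भात"
ids = tokenizer.encode(text)

print(ids)                                   # token IDs
print(tokenizer.convert_ids_to_tokens(ids))  # subword pieces
print(tokenizer.decode(ids))                 # decoded text
```
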
---

## Acknowledgments

A special thank you to [@rasbt](https://twitter.com/rasbt) for the inspiration and for authoring *Build a Large Language Model (From Scratch)*, one of the best resources on LLMs available!

---

Happy coding!

---