GPT2-Nepali 124M Base Model
Welcome to the GPT2-Nepali repository! This project features a GPT-2 model trained from scratch on a 12GB Nepali text dataset derived from the NepBERTa project. The model is specifically tailored for the Nepali language and includes a user-friendly chat interface hosted on Hugging Face Spaces.
Project Highlights
- Chat Interface: Hugging Face Space
- Training Code: GitHub Repository
- Dataset: 12GB Nepali text extracted from the NepBERTa project
Overview
GPT2-Nepali adapts the GPT-2 model training process (inspired by the resource Build a Large Language Model (From Scratch)) to address the nuances of the Nepali language. Key modifications include the development of a dedicated BPE tokenizer for Nepali and adjustments to the dataloader to better handle pre-tokenized datasets.
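To make the dataloader adjustment concrete, below is a minimal sketch of how a pre-tokenized corpus can be wrapped for next-token-prediction training. The class name `PreTokenizedDataset`, the 1024-token context length, and the batch size are illustrative assumptions, not the repository's exact implementation:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class PreTokenizedDataset(Dataset):
    """Sliding-window dataset over a flat sequence of pre-tokenized token IDs (illustrative sketch)."""
    def __init__(self, token_ids, context_length=1024, stride=1024):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - context_length, stride):
            chunk = token_ids[i:i + context_length + 1]
            self.inputs.append(torch.tensor(chunk[:-1]))   # input tokens
            self.targets.append(torch.tensor(chunk[1:]))   # next-token targets (shifted by one)

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

# Example usage with IDs already produced by the Nepali BPE tokenizer:
# token_ids = [...]
# loader = DataLoader(PreTokenizedDataset(token_ids), batch_size=8, shuffle=True)
```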
Installation
- Clone the repository and install the required dependencies:
```bash
git clone https://github.com/Aananda-giri/GPT2-Nepali.git
cd GPT2-Nepali
pip install -r requirements.txt
```
- Download `gpt_model_code.py`:
```python
import requests

# Fetch the standalone model code (defines GPTModel, GPT_CONFIG_124M, and generate)
res = requests.get(r"https://raw.githubusercontent.com/Aananda-giri/GPT2-Nepali/main/3.%20GPT2-Nepali/2_inference/gpt_model_code.py")
with open('gpt_model_code.py', 'w', encoding='utf-8') as f:
    f.write(res.text)
```
Quick Start
Below is a sample script to load the model and generate text:
```python
from transformers import PreTrainedTokenizerFast
import torch

# GPT_CONFIG_124M is available if you want to instantiate the architecture from scratch
from gpt_model_code import GPTModel, GPT_CONFIG_124M, generate

# Determine the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained model from Hugging Face
model = GPTModel.from_pretrained("Aananda-giri/GPT2-Nepali")
model.to(device)
model.eval()  # disable dropout for inference

# Load the tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/GPT2-Nepali")

# Generate sample text
prompt = "रामले भात"
generated_text = generate(
    model,
    prompt,
    tokenizer,
    max_new_tokens=100,
    temperature=0.7,
    top_k=50,
    top_p=None,               # set to a value in (0, 1] to use nucleus sampling
    eos_id=None,
    repetition_penalty=1.2,
    penalize_len_below=50
)
print(generated_text)
```
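The tokenizer can also be used on its own to inspect how Nepali text is split into BPE subwords. A minimal round-trip example, assuming only the standard `PreTrainedTokenizerFast` API (the sample sentence is illustrative):

```python
from transformers import PreTrainedTokenizerFast

tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/GPT2-Nepali")

# Encode a Nepali sentence into BPE token IDs and decode it back
ids = tokenizer.encode("रामले भात खायो")
print(ids)                                   # token IDs produced by the Nepali BPE tokenizer
print(tokenizer.convert_ids_to_tokens(ids))  # the corresponding subword pieces
print(tokenizer.decode(ids))                 # round-trips back to the original text
```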
Acknowledgments
A special thank you to @rasbt for the inspiration and for authoring *Build a Large Language Model (From Scratch)*—one of the best resources on LLMs available!
Happy coding!