GPT2-Nepali 124M Base Model

Welcome to the GPT2-Nepali repository! This project features a GPT-2 model trained from scratch on a 12GB Nepali text dataset derived from the NepBERTa project. The model is specifically tailored for the Nepali language and includes a user-friendly chat interface hosted on Hugging Face Spaces.


Overview

GPT2-Nepali adapts the GPT-2 training process from the book *Build a Large Language Model (From Scratch)* to the Nepali language. Key modifications include a dedicated BPE tokenizer for Nepali and dataloader adjustments for handling pre-tokenized datasets.
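As a quick illustration of the dedicated tokenizer, the sketch below loads the tokenizer published with the model on Hugging Face (the same one used in the Quick Start) and round-trips a short Nepali phrase; the exact token IDs depend on the trained vocabulary:

from transformers import PreTrainedTokenizerFast

# Load the Nepali BPE tokenizer shipped with the model
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/GPT2-Nepali")

text = "नेपाली भाषा"  # "Nepali language"
ids = tokenizer.encode(text)
print(ids)                    # token IDs from the Nepali BPE vocabulary
print(tokenizer.decode(ids))  # decodes back to the original text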


Installation

  • Clone the repository and install the required dependencies:

git clone https://github.com/Aananda-giri/GPT2-Nepali.git
cd GPT2-Nepali
pip install -r requirements.txt

  • Download gpt_model_code.py, which provides the model definition and the generate helper:

import requests

# Fetch the standalone model code from the repository
url = "https://raw.githubusercontent.com/Aananda-giri/GPT2-Nepali/main/3.%20GPT2-Nepali/2_inference/gpt_model_code.py"
res = requests.get(url)
with open("gpt_model_code.py", "w", encoding="utf-8") as f:
    f.write(res.text)
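As an optional sanity check, the file should now import cleanly; the names below are the same ones the Quick Start uses:

# Confirm gpt_model_code.py downloaded correctly by importing it
from gpt_model_code import GPTModel, GPT_CONFIG_124M, generate

print(GPT_CONFIG_124M)  # the 124M configuration the model is built from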

Quick Start

Below is a sample script to load the model and generate text:

from transformers import PreTrainedTokenizerFast
import torch
from gpt_model_code import GPTModel, generate

# Determine the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pre-trained model from Hugging Face and move it to the device
model = GPTModel.from_pretrained("Aananda-giri/GPT2-Nepali")
model.to(device)
model.eval()  # disable dropout for inference

# Load the tokenizer
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/GPT2-Nepali")

# Generate sample text; the prompt is an incomplete Nepali sentence ("Ram ... rice") for the model to continue
prompt = "रामले भात"
generated_text = generate(
    model,
    prompt,
    tokenizer,
    max_new_tokens=100,
    temperature=0.7,
    top_k=50,
    top_p=None,  # Use nucleus sampling if needed
    eos_id=None,
    repetition_penalty=1.2,
    penalize_len_below=50
)

print(generated_text)
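The sampling arguments above are starting points rather than fixed requirements. As a minimal sketch reusing the same generate() signature, lower temperature and top_k values yield more conservative continuations, while higher values produce more varied text:

# Compare a conservative and a more exploratory sampling configuration
for temperature, top_k in [(0.3, 10), (1.0, 100)]:
    sample = generate(
        model,
        prompt,
        tokenizer,
        max_new_tokens=50,
        temperature=temperature,
        top_k=top_k,
        top_p=None,
        eos_id=None,
        repetition_penalty=1.2,
        penalize_len_below=50,
    )
    print(f"temperature={temperature}, top_k={top_k}:")
    print(sample)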

Acknowledgments

A special thank you to @rasbt for the inspiration and for authoring *Build a Large Language Model (From Scratch)*—one of the best resources on LLMs available!



Happy coding!

