# LLAMA3.2 Nepali 318M Model

## Overview

This is a 318M-parameter LLAMA3.2 model fine-tuned on a Nepali text dataset. It is designed to generate coherent, contextually relevant Nepali text.

## Resources

- **Training Code:** [GitHub Repository](https://github.com/Aananda-giri/LLAMA3-Nepali)
- **Chat Interface:** [Hugging Face Space](https://huggingface.co/spaces/Aananda-giri/LLAMA3_Nepali_318M)
- **Datasets:** [IRIISNEPAL/Nepali-Text-Corpus](https://huggingface.co/datasets/IRIISNEPAL/Nepali-Text-Corpus) and [NepBERTa](https://nepberta.github.io/)
- **Reference Book:** *[Build a Large Language Model (From Scratch)](https://www.manning.com/books/build-a-large-language-model-from-scratch)* by Sebastian Raschka, PhD

## Installation

To install the required dependencies, run:

```sh
pip install datasets huggingface_hub matplotlib transformers torch --quiet
```

## Usage

### 1. Download Model Weights

```python
from huggingface_hub import hf_hub_download

# Download the pretrained checkpoint from the Hub into the current directory
weights_path = hf_hub_download(
    repo_id="Aananda-giri/LLAMA3-Nepali",
    filename="parameters_300m/model_pg_398000_steps.pth",
    local_dir="./",
)
```

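Optionally, verify the download before moving on; `hf_hub_download` mirrors the repo path under `local_dir`, so the checkpoint should now sit at the relative path that step 5 uses:

```python
import os

# The checkpoint path reused in step 5
checkpoint_path = "parameters_300m/model_pg_398000_steps.pth"
print(os.path.exists(checkpoint_path), f"{os.path.getsize(checkpoint_path) / 1e6:.0f} MB")
```
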
### 2. Load the Tokenizer

```python
from transformers import PreTrainedTokenizerFast

# Fetch the tokenizer from the Hub and save a local copy (used again in step 4)
tokenizer = PreTrainedTokenizerFast.from_pretrained("Aananda-giri/LLAMA3-Nepali")
tokenizer.save_pretrained("NepaliBPE")
```

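As an optional sanity check, you can round-trip a short Nepali sentence through the tokenizer; the decoded string should match the input (the sentence below is just an example):

```python
# Encode a sample sentence to token IDs, then decode back to text
ids = tokenizer.encode("रामले भात खायो।")
print(ids)
print(tokenizer.decode(ids))
```
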
### 3. Download Additional Scripts

```python
import requests

# Fetch the helper module that defines the model and generation utilities
url = "https://raw.githubusercontent.com/Aananda-giri/LLAMA3-Nepali/main/4.%20inference/2_inference/previous_chapters.py"
res = requests.get(url)
res.raise_for_status()

# Write with an explicit encoding so non-ASCII content is preserved
with open("previous_chapters.py", "w", encoding="utf-8") as f:
    f.write(res.text)
```

### 4. Load the Model

```python
import torch
from previous_chapters import Llama3Model, ChatFormat, Tokenizer, generate_and_print_sample

# Initialize the tokenizer wrapper around the tokenizer saved in step 2
_tokenizer = Tokenizer("NepaliBPE/tokenizer.json")
chat_tokenizer = ChatFormat(_tokenizer)

# Model configuration (318M-parameter variant)
LLAMA32_CONFIG = {
    "vocab_size": 50006,       # vocabulary size of the Nepali BPE tokenizer
    "context_length": 512,     # maximum context length used during training
    "emb_dim": 1320,           # embedding dimension
    "n_heads": 20,             # number of attention heads
    "n_layers": 10,            # number of transformer layers
    "hidden_dim": 5280,        # feed-forward hidden dimension
    "n_kv_groups": 5,          # key-value groups for grouped-query attention
    "rope_base": 500_000.0,    # RoPE theta base
    "dtype": torch.bfloat16,   # lower-precision dtype to reduce memory use
    "rope_freq": {             # RoPE frequency-scaling parameters
        "factor": 32.0,
        "low_freq_factor": 1.0,
        "high_freq_factor": 4.0,
        "original_context_length": 8192,
    },
}

# Rescale the RoPE base for the shorter 512-token context
old_context_length = 131_072
new_context_length = LLAMA32_CONFIG["context_length"]
LLAMA32_CONFIG["rope_base"] *= new_context_length / old_context_length

# Instantiate the model in evaluation mode
model = Llama3Model(LLAMA32_CONFIG)
model.eval()

# Compile the model on PyTorch 2.0+ for faster inference
if int(torch.__version__.split(".")[0]) >= 2:
    model = torch.compile(model)
```

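Optionally, you can confirm that the instantiated model matches the size in its name. This check is not part of the original scripts, and the exact total depends on details such as weight tying:

```python
# Total parameters; should land near the 318M in the model name
total_params = sum(p.numel() for p in model.parameters())
print(f"total parameters: {total_params:,}")
```
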
### 5. Load Model Weights

```python
# Move the model to GPU if one is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
print(f"device: {device}")

# Load the checkpoint downloaded in step 1 and restore the weights
latest_model_checkpoint = "parameters_300m/model_pg_398000_steps.pth"
checkpoint = torch.load(latest_model_checkpoint, map_location=device, weights_only=False)
model.load_state_dict(checkpoint["model_state_dict"])
```

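If you are unsure whether the checkpoint fits in memory, a rough weight-only estimate (excluding activations and the KV cache; not part of the original scripts) is:

```python
# Approximate memory held by the weights alone
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"approx. weight memory: {param_bytes / 1024**3:.2f} GiB")
```
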
### 6. Generate Text

```python
# Generate a short continuation for a Nepali prompt
generate_and_print_sample(
    PROMPT="रामले भात",
    tokenizer=_tokenizer,
    chat_tokenizer=chat_tokenizer,
    model=model,
    device=device,
    context_length=LLAMA32_CONFIG["context_length"],
)
```

#### Advanced Text Generation

```python
import time

from previous_chapters import generate_chat_optimized

start_time = time.time()
output_text = generate_chat_optimized(
    prompt="रामले भात",
    tokenizer=_tokenizer,        # the Tokenizer instance from step 4
    chat_tokenizer=chat_tokenizer,
    model=model,
    max_new_tokens=20,           # number of tokens to generate
    context_size=512,            # must not exceed the model's context_length
    device=device,
    temperature=0.3,             # lower values make sampling more deterministic
    top_k=5,                     # sample only from the 5 most likely tokens
    top_p=None,                  # nucleus sampling disabled
    eos_id=None,                 # no early stopping on an end-of-sequence token
    repetition_penalty=1.2,      # discourage repeated tokens
    penalize_len_below=10,       # penalize outputs shorter than 10 tokens
    batch_size=1,
)

print(f"time: {time.time() - start_time:.2f}s\noutput_text: {output_text}")
```

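The sampling knobs interact, and `temperature` usually has the largest effect. A minimal sweep (illustrative values, assuming `generate_chat_optimized` keeps the signature shown above) makes this easy to see:

```python
# Higher temperature flattens the distribution and increases output variety
for temp in (0.1, 0.5, 1.0):
    text = generate_chat_optimized(
        prompt="रामले भात",
        tokenizer=_tokenizer,
        chat_tokenizer=chat_tokenizer,
        model=model,
        max_new_tokens=20,
        context_size=512,
        device=device,
        temperature=temp,
        top_k=5,
        top_p=None,
        eos_id=None,
        repetition_penalty=1.2,
        penalize_len_below=10,
        batch_size=1,
    )
    print(f"temperature={temp}: {text}")
```
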
## Model Checkpoints

The best-performing checkpoint is **parameters_300m/model_pg_398000_steps.pth**. The other folders in the repository contain experimental checkpoints from earlier training runs.
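To see which checkpoints are actually published, you can enumerate the repository files with the standard `huggingface_hub` API; filtering on the `.pth` extension is an assumption about how the checkpoints are named:

```python
from huggingface_hub import list_repo_files

# List all files in the model repo and keep only PyTorch checkpoints
files = list_repo_files("Aananda-giri/LLAMA3-Nepali")
print("\n".join(f for f in files if f.endswith(".pth")))
```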