BharatVLM committed on
Commit
3f3f955
·
verified ·
1 Parent(s): 3df5c20

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -18,7 +18,7 @@ model-index:
 
 # Assamese GPT-2 Model
 
-This is a GPT-2 language model trained from scratch on Assamese monolingual text, using data from **IndicCorpV2** and **OSCAR**. The model is developed for **educational and research purposes** to support natural language understanding and generation tasks in Assamese — a low-resource language.
+This is a GPT-2 language model trained from scratch on Assamese monolingual text, using data from **IndicCorpV2** . The model is developed for **educational and research purposes** to support natural language understanding and generation tasks in Assamese — a low-resource language.
 
 ## 📖 Model Description
 
@@ -45,7 +45,6 @@ The Assamese GPT-2 model is based on the standard GPT-2 decoder-only transformer
 The model was trained using Assamese monolingual data collected from:
 
 - **IndicCorpV2**: A curated collection of web-crawled and processed data for Indic languages.
-- **OSCAR (Open Super-large Crawled ALMAnaCH coRpus)**: Filtered web-crawled corpus available through Hugging Face datasets.
 
 Data preprocessing included:
 - Unicode normalization