---
license: cc-by-nc-4.0
datasets:
- PapaRazi/id-tts-v2
language:
- id
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
tags:
- tts
- F5-TTS
- text-to-speech
---
# PapaRazi/Ijazah_Palsu_V2 · 🇮🇩 Indonesian TTS Model (F5-TTS)

**Ijazah_Palsu_V2** is an Indonesian speech synthesis model fine-tuned from [F5-TTS](https://github.com/SWivid/F5-TTS).  
It was trained on a custom-curated dataset, [`PapaRazi/id-tts-v2`](https://github.com/adigayung/whisper-tools), with a focus on natural, expressive Indonesian speech.

---

## 🧠 Model Details

- **Base Framework:** F5-TTS
- **Training Time:** ~3 days
- **Dataset Size:** ~70,000 samples (70 hours)
- **Languages:**  
  - **Bahasa Indonesia** (95%)  
  - **English** (5%) *(limited English quality due to small dataset size)*
- **License:** CC BY-NC 4.0 (non-commercial use only)
- **Author:** PapaRazi ([Hugging Face](https://huggingface.co/PapaRazi) / [GitHub](https://github.com/adigayung))

---

## 🛠 Training Configuration

```json
{
  "exp_name": "F5TTS_v1_Base",
  "learning_rate": 1e-05,
  "batch_size_per_gpu": 1700,
  "batch_size_type": "frame",
  "max_samples": 64,
  "grad_accumulation_steps": 1,
  "max_grad_norm": 1,
  "epochs": 34,
  "num_warmup_updates": 7000,
  "save_per_updates": 15000,
  "keep_last_n_checkpoints": 7,
  "last_per_updates": 15000,
  "finetune": true,
  "file_checkpoint_train": "",
  "tokenizer_type": "char",
  "tokenizer_file": "",
  "mixed_precision": "fp16",
  "logger": "tensorboard",
  "bnb_optimizer": false
}
```
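
Because `batch_size_type` is `"frame"`, `batch_size_per_gpu` counts mel-spectrogram frames rather than utterances. The sketch below converts 1700 frames to seconds of audio. It assumes F5-TTS's default mel settings (24 kHz sample rate, hop length 256); neither value appears in the config above, so adjust them if your setup differs.

```python
# Assumed F5-TTS mel defaults; these are NOT part of the config shown above.
SAMPLE_RATE = 24_000  # audio samples per second
HOP_LENGTH = 256      # audio samples per mel frame


def frames_to_seconds(frames: int,
                      sample_rate: int = SAMPLE_RATE,
                      hop_length: int = HOP_LENGTH) -> float:
    """Convert a number of mel frames to seconds of audio."""
    return frames * hop_length / sample_rate


# batch_size_per_gpu from the training configuration
batch_seconds = frames_to_seconds(1700)
print(f"~{batch_seconds:.1f} s of audio per GPU per batch")  # ~18.1 s
```

Under these assumptions, each batch holds roughly 18 seconds of audio per GPU, with `max_samples` capping it at 64 utterances.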

## 📦 Dataset

The model was trained on `PapaRazi/id-tts-v2`, a set of curated and cleaned audio-text pairs in Bahasa Indonesia.
All preprocessing, splitting, and cleaning were done with a custom tool I developed:
🔧 [whisper-tools](https://github.com/adigayung/whisper-tools)

The default dataset splitter from F5-TTS produced inconsistent results (clips that were too short or way too long), so I built a custom pipeline to ensure clean, consistent samples.
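
To illustrate the kind of check this implies, here is a minimal sketch of a duration filter. It is not the whisper-tools implementation: the `Clip` structure, the file names, and the 1 s / 20 s thresholds are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    path: str     # audio file path (hypothetical layout)
    start: float  # segment start, in seconds
    end: float    # segment end, in seconds
    text: str     # transcript of the segment

    @property
    def duration(self) -> float:
        return self.end - self.start


def filter_clips(clips, min_s: float = 1.0, max_s: float = 20.0):
    """Keep clips with a non-empty transcript and a duration in [min_s, max_s]."""
    return [c for c in clips
            if min_s <= c.duration <= max_s and c.text.strip()]


clips = [
    Clip("a.wav", 0.0, 0.4, "Ya."),                       # too short
    Clip("b.wav", 0.4, 6.9, "Selamat pagi, apa kabar?"),  # kept
    Clip("c.wav", 6.9, 48.0, "Kalimat yang sangat panjang."),  # too long
]
print([c.path for c in filter_clips(clips)])  # ['b.wav']
```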

## 🔊 Audio Samples

### 🗣 Natural Sentence
> *"Suatu hari nanti, suara ini mungkin tidak bisa dibedakan lagi dari suara manusia asli."*  
🎧 [Listen on vocaroo](https://voca.ro/18y7FTzxcbta)

---

### 🔢 Number Pronunciation (simple format)
> *"Serius?! Tiket konsernya habis dalam waktu 3 menit?!"*  
🎧 [Listen on vocaroo](https://voca.ro/19daRAoMs0oD)

---

### 💸 Number Hallucination (millions format – still imperfect)
> *"Masa cuma buat beli kursi kantor aja harus bayar Rp 2.500.000,-?! Gila sih itu!"*  
🎧 [Listen on vocaroo](https://voca.ro/1nbtoyUOGWJP)
> ⚠️ Reading large numbers (like millions) is still inaccurate due to limited examples in the training dataset.
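
One possible workaround is to spell numbers out before passing text to the model. The sketch below is not part of the model or of F5-TTS, and the function names `terbilang` and `normalize_numbers` are my own. It handles only non-negative integers with `.` as the thousands separator and an optional `Rp` prefix; decimals, years, and phone numbers would need extra rules.

```python
import re

UNITS = ["nol", "satu", "dua", "tiga", "empat",
         "lima", "enam", "tujuh", "delapan", "sembilan"]
SCALES = ((10**12, "triliun"), (10**9, "miliar"), (10**6, "juta"), (10**3, "ribu"))


def terbilang(n: int) -> str:
    """Spell out a non-negative integer in Indonesian."""
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        return UNITS[tens] + " puluh" + (" " + UNITS[rest] if rest else "")
    if n < 200:
        return "seratus" + (" " + terbilang(n - 100) if n > 100 else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return UNITS[hundreds] + " ratus" + (" " + terbilang(rest) if rest else "")
    if n < 2000:
        return "seribu" + (" " + terbilang(n - 1000) if n > 1000 else "")
    for value, name in SCALES:
        if n >= value:
            head, rest = divmod(n, value)
            return terbilang(head) + " " + name + (" " + terbilang(rest) if rest else "")


# "Rp 2.500.000,-" style amounts, then any remaining plain integers.
RUPIAH = re.compile(r"Rp\.?\s*(\d{1,3}(?:\.\d{3})+|\d+)(?:,-|,00)?")
NUMBER = re.compile(r"\d{1,3}(?:\.\d{3})+|\d+")


def normalize_numbers(text: str) -> str:
    """Replace rupiah amounts and integers with their spelled-out form."""
    text = RUPIAH.sub(
        lambda m: terbilang(int(m.group(1).replace(".", ""))) + " rupiah", text)
    return NUMBER.sub(lambda m: terbilang(int(m.group(0).replace(".", ""))), text)


print(normalize_numbers("Harus bayar Rp 2.500.000,-?! Habis dalam 3 menit?!"))
# Harus bayar dua juta lima ratus ribu rupiah?! Habis dalam tiga menit?!
```

Whether the spelled-out form actually sounds better with this model has to be checked by ear; the sketch only removes the digits from the input.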

## 🤝 License & Usage

This model is released for non-commercial use only.
Feel free to explore, fine-tune, or give feedback!