---
license: cc-by-nc-4.0
datasets:
- PapaRazi/id-tts-v2
language:
- id
base_model:
- SWivid/F5-TTS
pipeline_tag: text-to-speech
tags:
- tts
- F5-TTS
- text-to-speech
---

# PapaRazi/Ijazah_Palsu_V2 · 🇮🇩 Indonesian TTS Model (F5-TTS)

**Ijazah_Palsu_V2** is a fine-tuned Indonesian speech synthesis model based on [F5-TTS](https://github.com/F5soh/F5-TTS). It was trained on a custom-curated dataset, [`PapaRazi/id-tts-v2`](https://github.com/adigayung/whisper-tools), with a focus on natural and expressive Indonesian speech generation.

---

## 🧠 Model Details

- **Base Framework:** F5-TTS
- **Training Time:** ~3 days
- **Dataset Size:** ~70,000 samples (70 hours)
- **Languages:**
  - **Bahasa Indonesia** (95%)
  - **English** (5%) *(limited English quality due to small dataset size)*
- **License:** CC BY-NC 4.0 (non-commercial use only)
- **Author:** [PapaRazi](https://huggingface.co/PapaRazi) ([GitHub](https://github.com/adigayung))

---

## 🛠 Training Configuration

```json
{
  "exp_name": "F5TTS_v1_Base",
  "learning_rate": 1e-05,
  "batch_size_per_gpu": 1700,
  "batch_size_type": "frame",
  "max_samples": 64,
  "grad_accumulation_steps": 1,
  "max_grad_norm": 1,
  "epochs": 34,
  "num_warmup_updates": 7000,
  "save_per_updates": 15000,
  "keep_last_n_checkpoints": 7,
  "last_per_updates": 15000,
  "finetune": true,
  "file_checkpoint_train": "",
  "tokenizer_type": "char",
  "tokenizer_file": "",
  "mixed_precision": "fp16",
  "logger": "tensorboard",
  "bnb_optimizer": false
}
```

## 📦 Dataset

The model was trained on `PapaRazi/id-tts-v2`, a set of curated and cleaned audio-text pairs in Bahasa Indonesia.

All preprocessing, splitting, and cleaning were done with a custom tool I developed:

🔧 [whisper-tools](https://github.com/adigayung/whisper-tools)

The default dataset splitter from F5-TTS produced inconsistent results (clips that were too short or far too long), so I built a custom pipeline to ensure clean, consistent samples.
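To illustrate the kind of length check that separates usable clips from ones that are too short or too long, here is a minimal sketch using only the Python standard library. It is not the actual whisper-tools implementation: the function names and the 2 to 20 second bounds are hypothetical, chosen only for illustration, and it assumes the clips are plain PCM WAV files in one directory.

```python
import wave
from pathlib import Path

# Hypothetical bounds for illustration only; not taken from whisper-tools.
MIN_SECONDS = 2.0
MAX_SECONDS = 20.0


def clip_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def filter_clips(wav_dir, min_s=MIN_SECONDS, max_s=MAX_SECONDS):
    """Split the WAV files in wav_dir into (kept, rejected) by duration."""
    kept, rejected = [], []
    for path in sorted(Path(wav_dir).glob("*.wav")):
        seconds = clip_duration(path)
        # Keep only clips whose length falls inside the allowed window.
        (kept if min_s <= seconds <= max_s else rejected).append(path)
    return kept, rejected
```

Calling `filter_clips("wavs/")` on a hypothetical `wavs/` directory returns two lists of paths, so rejected clips can be inspected or re-split rather than silently dropped.
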
## 🔊 Audio Samples

### 🗣 Natural Sentence

> *"Suatu hari nanti, suara ini mungkin tidak bisa dibedakan lagi dari suara manusia asli."*

🎧 [Listen on vocaroo](https://voca.ro/18y7FTzxcbta)

---

### 🔢 Number Pronunciation (simple format)

> *"Serius?! Tiket konsernya habis dalam waktu 3 menit?!"*

🎧 [Listen on vocaroo](https://voca.ro/19daRAoMs0oD)

---

### 💸 Number Hallucination (millions format – still imperfect)

> *"Masa cuma buat beli kursi kantor aja harus bayar Rp 2.500.000,-?! Gila sih itu!"*

🎧 [Listen on vocaroo](https://voca.ro/1nbtoyUOGWJP)

> ⚠️ Reading large numbers (like millions) is still inaccurate due to limited examples in the training dataset.

## 🤝 License & Usage

This model is released for non-commercial use only. Feel free to explore, fine-tune, or give feedback!
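One possible workaround for the large-number limitation noted in the audio samples is to spell out numbers as Indonesian words before passing text to the model. The sketch below is an assumption-laden illustration, not part of this model or of whisper-tools: `terbilang` and `normalize_numbers` are hypothetical helper names, only non-negative integers below 10^12 are handled, and it has not been verified to improve this model's output.

```python
import re

UNITS = ["nol", "satu", "dua", "tiga", "empat", "lima",
         "enam", "tujuh", "delapan", "sembilan"]


def terbilang(n):
    """Spell out an integer 0 <= n < 10**12 in Indonesian words."""
    if not 0 <= n < 10**12:
        raise ValueError("supported range is 0 <= n < 10**12")
    if n < 10:
        return UNITS[n]
    if n == 10:
        return "sepuluh"
    if n == 11:
        return "sebelas"
    if n < 20:
        return UNITS[n - 10] + " belas"
    if n < 100:
        tens, rest = divmod(n, 10)
        head = UNITS[tens] + " puluh"
    elif n < 1000:
        hundreds, rest = divmod(n, 100)
        head = "seratus" if hundreds == 1 else UNITS[hundreds] + " ratus"
    elif n < 10**6:
        thousands, rest = divmod(n, 1000)
        head = "seribu" if thousands == 1 else terbilang(thousands) + " ribu"
    elif n < 10**9:
        millions, rest = divmod(n, 10**6)
        head = terbilang(millions) + " juta"
    else:
        billions, rest = divmod(n, 10**9)
        head = terbilang(billions) + " miliar"
    return head + (" " + terbilang(rest) if rest else "")


# Matches "Rp 2.500.000,-", "Rp2500000", "Rp. 10.000" and similar forms.
_RUPIAH = re.compile(r"Rp\.?\s*(\d{1,3}(?:\.\d{3})+|\d+)(?:,-)?")
# Matches integers with or without "." as the thousands separator.
_NUMBER = re.compile(r"\d{1,3}(?:\.\d{3})+|\d+")


def _to_int(digits):
    """Drop thousands separators and parse the remaining digits."""
    return int(digits.replace(".", ""))


def normalize_numbers(text):
    """Replace rupiah amounts and plain integers with Indonesian words."""
    text = _RUPIAH.sub(
        lambda m: terbilang(_to_int(m.group(1))) + " rupiah", text)
    return _NUMBER.sub(lambda m: terbilang(_to_int(m.group(0))), text)
```

With this sketch, `normalize_numbers` turns the `Rp 2.500.000,-` in the sample sentence into `dua juta lima ratus ribu rupiah`, so the model only sees words it is more likely to have encountered during training. Decimals, dates, and phone numbers would need separate handling.
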