---
library_name: transformers
datasets:
- reazon-research/reazonspeech
- joujiboi/japanese-anime-speech
language:
- ja
- en
metrics:
- cer
pipeline_tag: automatic-speech-recognition
---
# Model Card for Visual-novel-transcriptor
![image](./cover_image.jpeg)
<!-- Generated using cagliostrolab/animagine-xl-3.0 -->
<!--Prompt: 1girl, black long hair, suit, headphone, write down, upper body, indoor, night, masterpiece, best quality -->
Fine-tuned ASR model based on [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2).
This model aims to transcribe Japanese audio, especially visual novel speech.
# WaifuModel Collections
- [TTS](https://huggingface.co/spow12/visual_novel_tts)
- [Chat](https://huggingface.co/spow12/ChatWaifu_v1.2.1)
- [ASR](https://huggingface.co/spow12/Visual-novel-transcriptor)
# Unified Demo
[WaifuAssistant](https://github.com/yw0nam/WaifuAssistant)
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** spow12(yw_nam)
- **Shared by:** spow12(yw_nam)
- **Model type:** Seq2Seq
- **Language(s) (NLP):** Japanese
- **Finetuned from model:** [distil-whisper/distil-large-v2](https://huggingface.co/distil-whisper/distil-large-v2)
## Uses
```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa

# Load the processor (feature extractor + tokenizer) and the model, then move the model to GPU.
processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').cuda()
# Force Japanese transcription instead of language detection or translation.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")

wav_path = "path/to/audio.wav"  # replace with the path to your audio file
# Whisper models expect 16 kHz mono audio; librosa resamples on load.
data, _ = librosa.load(wav_path, sr=16000)
input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
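The example above transcribes a single input window. Whisper-family feature extractors pad or truncate audio to 30 seconds, so anything past the first 30 seconds of a longer clip is dropped. Below is a minimal sketch for longer recordings, assuming you split the 16 kHz waveform into consecutive 30-second chunks and transcribe each chunk separately. The helper `split_into_chunks` is hypothetical and not part of transformers.

```python
import numpy as np

SAMPLE_RATE = 16000   # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 30    # length of one Whisper input window


def split_into_chunks(audio: np.ndarray, sample_rate: int = SAMPLE_RATE,
                      chunk_seconds: int = CHUNK_SECONDS) -> list:
    """Split a 1-D waveform into consecutive, non-overlapping chunks.

    Every chunk except possibly the last has exactly
    chunk_seconds * sample_rate samples. Empty input yields an empty list.
    """
    chunk_len = sample_rate * chunk_seconds
    return [audio[start:start + chunk_len] for start in range(0, len(audio), chunk_len)]


# Hypothetical usage with `processor`, `model`, and `data` from the example above:
# texts = []
# for chunk in split_into_chunks(data):
#     feats = processor(chunk, sampling_rate=SAMPLE_RATE, return_tensors="pt").input_features.cuda()
#     ids = model.generate(feats)
#     texts.append(processor.batch_decode(ids, skip_special_tokens=True)[0])
# print("".join(texts))  # Japanese text has no word spaces, so join without a separator

if __name__ == "__main__":
    dummy = np.zeros(SAMPLE_RATE * 75, dtype=np.float32)  # 75 s of silence
    chunks = split_into_chunks(dummy)
    print([len(c) / SAMPLE_RATE for c in chunks])  # [30.0, 30.0, 15.0]
```

Fixed-length splitting can cut a word at a chunk boundary. The transformers `pipeline("automatic-speech-recognition", ...)` with `chunk_length_s` uses overlapping strides to reduce this, at the cost of some extra computation.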
## Bias, Risks, and Limitations
This model was trained on Japanese datasets that include visual novel audio containing NSFW content.
## Use & Credit
This model is currently available for non-commercial use only. Also, since I'm not well-versed in licensing, I hope you use it responsibly.
By sharing this model, I hope to contribute to the research efforts of our community (the open-source and anime communities).
## Citation
```bibtex
@misc {Visual-novel-transcriptor,
author = { YoungWoo Nam },
title = { Visual-novel-transcriptor },
year = 2024,
url = { https://huggingface.co/spow12/Visual-novel-transcriptor },
publisher = { Hugging Face }
}
```