Model Card for Visual-novel-transcriptor


Fine-tuned ASR model based on distil-whisper/distil-large-v2.

This model is intended to transcribe Japanese audio, especially audio from visual novels.

WaifuModel Collections

Unified Demo

WaifuAssitant

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  • Developed by: spow12 (yw_nam)
  • Shared by: spow12 (yw_nam)
  • Model type: Seq2Seq
  • Language(s) (NLP): Japanese
  • Finetuned from model: distil-whisper/distil-large-v2

Uses

from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa

# Load the processor and model from the Hub; .cuda() assumes a CUDA-capable GPU is available.
processor = AutoProcessor.from_pretrained('spow12/Visual-novel-transcriptor', language="ja", task="transcribe")
model = AutoModelForSpeechSeq2Seq.from_pretrained('spow12/Visual-novel-transcriptor').cuda()
# Force Japanese transcription at decoding time.
model.config.forced_decoder_ids = processor.get_decoder_prompt_ids(language="ja", task="transcribe")

wav_path = "path/to/audio.wav"  # placeholder: replace with the path to your own audio file

# Resample to 16 kHz, the sampling rate passed to the processor below.
data, _ = librosa.load(wav_path, sr=16000)
input_features = processor(data, sampling_rate=16000, return_tensors="pt").input_features.cuda()
predicted_ids = model.generate(input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
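
The snippet above passes the whole clip to the processor in one call. Whisper-family feature extractors work on 30-second windows, so a longer recording would likely be cut off. Below is a minimal sketch of one way to handle that, assuming a naive fixed-window split is acceptable; it may cut a word at a chunk boundary. The helper names split_into_chunks and transcribe_long are hypothetical and are not part of this model or of transformers.

```python
def split_into_chunks(samples, sampling_rate=16000, chunk_seconds=30.0):
    """Split a 1-D sequence of audio samples into consecutive fixed-length windows.

    Hypothetical helper: works on anything that supports len() and slicing,
    such as the NumPy array returned by librosa.load.
    """
    if sampling_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("sampling_rate and chunk_seconds must be positive")
    chunk_size = int(sampling_rate * chunk_seconds)
    return [samples[start:start + chunk_size]
            for start in range(0, len(samples), chunk_size)]


def transcribe_long(samples, processor, model, sampling_rate=16000, chunk_seconds=30.0):
    """Transcribe each window separately and concatenate the resulting text.

    Assumes `processor` and `model` were loaded as in the usage snippet above.
    """
    texts = []
    for chunk in split_into_chunks(samples, sampling_rate, chunk_seconds):
        inputs = processor(chunk, sampling_rate=sampling_rate, return_tensors="pt")
        features = inputs.input_features.to(model.device)
        predicted_ids = model.generate(features)
        texts.append(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
    # Japanese text is written without spaces, so the pieces are joined directly.
    return "".join(texts)
```

With the names from the usage snippet, this could be called as transcribe_long(data, processor, model). For better results on long recordings, splitting on silence rather than at fixed offsets may be preferable.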

Bias, Risks, and Limitations

This model was trained on a Japanese dataset that includes visual novels containing NSFW content.

Use & Credit

This model is currently available for non-commercial use only. I am not well versed in licensing, so I ask that you use it responsibly.

By sharing this model, I hope to contribute to the research efforts of our community (the open-source community and anime fans).

Citation

@misc {Visual-novel-transcriptor,
    author       = { YoungWoo Nam },
    title        = { Visual-novel-transcriptor },
    year         = 2024,
    url          = { https://huggingface.co/spow12/Visual-novel-transcriptor },
    publisher    = { Hugging Face }
}

Model size: 756M params (Safetensors, tensor type F32)