whisper-tiny-ja-lora

A LoRA-finetuned version of openai/whisper-tiny for Japanese Automatic Speech Recognition (ASR), trained on the ReazonSpeech dataset using Parameter-Efficient Fine-Tuning (PEFT/LoRA).

Model Details

Model Description

This model applies Low-Rank Adaptation (LoRA) on top of Whisper Tiny to improve Japanese transcription quality while keeping the number of trainable parameters small. The LoRA adapter is distributed separately and can optionally be merged into the base weights for easier deployment.

  • Model type: Automatic Speech Recognition (ASR)
  • Language: Japanese (ja)
  • Base model: openai/whisper-tiny
  • Fine-tuning method: LoRA (Low-Rank Adaptation) via PEFT
  • License: Apache 2.0
  • Developed by: dungca
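Merging an adapter folds the low-rank update into the frozen weight exactly: W_merged = W + (alpha / r) · B · A. A toy NumPy sketch of that identity (shapes and values are illustrative, not Whisper's):

```python
import numpy as np

# Toy demonstration that merging a LoRA update into the base weight is exact:
# W_merged = W + (alpha / r) * (B @ A) reproduces base-plus-adapter outputs.
rng = np.random.default_rng(0)
d, k, r = 8, 8, 2                 # illustrative shapes and rank, not this model's
alpha = 32                        # LoRA scaling numerator
W = rng.standard_normal((d, k))   # frozen base weight
A = rng.standard_normal((r, k))   # LoRA down-projection
B = rng.standard_normal((d, r))   # LoRA up-projection

W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(k)
# Merged forward pass equals base forward pass plus the scaled adapter path
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```

In PEFT this corresponds to calling `merge_and_unload()` on a loaded `PeftModel`, after which the model can be saved and served without a runtime `peft` dependency.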

Uses

Direct Use

This model is designed for Japanese speech-to-text transcription tasks:

  • Transcribing Japanese audio files
  • Japanese voice assistants and conversational AI
  • Japanese language learning applications (e.g., pronunciation feedback)
  • Subtitle generation for Japanese audio/video content

Out-of-Scope Use

  • Non-Japanese speech (model is fine-tuned specifically for Japanese)
  • Accuracy-critical production transcription (the whisper-tiny architecture trades accuracy for speed; see the CER in Evaluation) and real-time streaming ASR (Whisper processes fixed audio chunks rather than a continuous stream)

How to Get Started with the Model

Load LoRA Adapter (PEFT)

import torch
from transformers import AutoProcessor, WhisperForConditionalGeneration
from peft import PeftModel

# Load base model and processor
base_model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
processor = AutoProcessor.from_pretrained("openai/whisper-tiny")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "dungca/whisper-tiny-ja-lora")
model.eval()

# Transcribe audio
def transcribe(audio_array, sampling_rate=16000):
    inputs = processor(
        audio_array,
        sampling_rate=sampling_rate,
        return_tensors="pt"
    )
    with torch.no_grad():
        predicted_ids = model.generate(
            inputs["input_features"],
            language="japanese",
            task="transcribe"
        )
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

Quick Inference with Pipeline

from transformers import pipeline
from peft import PeftModel, PeftConfig
from transformers import WhisperForConditionalGeneration, AutoProcessor

config = PeftConfig.from_pretrained("dungca/whisper-tiny-ja-lora")
base_model = WhisperForConditionalGeneration.from_pretrained(config.base_model_name_or_path)
model = PeftModel.from_pretrained(base_model, "dungca/whisper-tiny-ja-lora")

processor = AutoProcessor.from_pretrained(config.base_model_name_or_path)

asr = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    generate_kwargs={"language": "japanese", "task": "transcribe"},
)

result = asr("your_audio.wav")
print(result["text"])

Training Details

Training Data

  • Dataset: ReazonSpeech (small split)
  • Language: Japanese (ja)
  • ReazonSpeech is a large-scale Japanese speech corpus collected from broadcast TV, covering diverse speaking styles and topics.

Training Procedure

LoRA Configuration

Parameter       Value
--------------  --------------
lora_r          16
lora_alpha      32
lora_dropout    0.05
target_modules  q_proj, v_proj
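The table above corresponds to a PEFT `LoraConfig` along these lines (a sketch assuming the PEFT API; the actual training script is not reproduced in this card):

```python
from peft import LoraConfig

# LoRA configuration matching the table above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention query/value projections
)
```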

Training Hyperparameters

Parameter        Value
---------------  --------------------
Learning rate    1e-5
Batch size       32
Epochs           ~1.55 (3000 steps)
Training regime  fp16 mixed precision
Optimizer        AdamW
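The "~1.55 epochs in 3000 steps" figure is consistent with the stated batch size; a back-of-envelope check (the implied per-epoch dataset size is an inference from these numbers, not a documented value):

```python
# Sanity-check the epoch count against steps and batch size
steps, batch_size, epochs = 3000, 32, 1.547
examples_seen = steps * batch_size               # total training examples processed
implied_dataset_size = round(examples_seen / epochs)  # examples per epoch, implied
print(examples_seen, implied_dataset_size)
```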

Infrastructure

Item            Value
--------------  ------------------------------
Hardware        NVIDIA P100 (16 GB), Kaggle GPU
Cloud provider  Kaggle (Google Cloud)
Compute region  US
Framework       Transformers + PEFT + Datasets
PEFT version    0.18.1

MLOps Pipeline

Training is fully automated via GitHub Actions:

  • CI: Syntax check + lightweight data validation on every push
  • CT (Continuous Training): Triggers Kaggle kernel for LoRA fine-tuning on data/code changes
  • CD: A quality gate checks CER before promoting the model to the Hugging Face Hub
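The CD quality gate could be sketched as follows (the metrics file name, key, and threshold are assumptions for illustration, not the actual pipeline's values):

```python
import json

CER_THRESHOLD = 0.60  # assumed promotion ceiling; the real gate's value is not published


def passes_quality_gate(metrics_path: str, threshold: float = CER_THRESHOLD) -> bool:
    """Read an eval-metrics JSON file and decide whether to promote the model."""
    with open(metrics_path) as f:
        metrics = json.load(f)
    return metrics["eval_cer"] <= threshold
```

A CI job would call this after evaluation and push to the Hub only when it returns `True`.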

Evaluation

Testing Data

Evaluated on the ReazonSpeech validation split.

Metrics

  • CER (Character Error Rate): Lower is better. Standard metric for Japanese ASR (character-level, unlike WER used for English).
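For reference, CER is the character-level Levenshtein (edit) distance divided by the number of reference characters. A minimal pure-Python sketch (illustrative; the evaluation pipeline's exact metric implementation is not shown in this card):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance / number of reference characters."""
    r, h = reference, hypothesis
    # Standard dynamic-programming edit distance over characters
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / len(r)

print(cer("こんにちは", "こんにちわ"))  # one substitution over five characters → 0.2
```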

Results

Metric                   Value
-----------------------  ----------------
eval/cer                 0.52497 (~52.5%)
eval/loss                1.17656
eval/runtime             162.422 s
eval/samples_per_second  12.314
eval/steps_per_second    0.770
train/global_step        3000
train/epoch              1.547
train/grad_norm          2.161

Note: CER of ~52.5% reflects the constraints of whisper-tiny (39M parameters) on a small training subset. A follow-up experiment with whisper-small and extended training is in progress and expected to significantly reduce CER.

Bias, Risks, and Limitations

  • Model size: Whisper Tiny is optimized for speed and efficiency, not peak accuracy. Expect higher error rates on noisy audio, accented speech, or domain-specific vocabulary.
  • Training data scope: Trained on broadcast Japanese; may perform worse on conversational or dialectal Japanese.
  • CER baseline: The current CER reflects an early training checkpoint. Further training epochs and a larger model size (whisper-small) are expected to improve results.

Recommendations

For production use cases requiring high accuracy, consider using openai/whisper-large-v3 or waiting for the upcoming whisper-small-ja-lora checkpoint.

Citation

If you use this model, please cite the base Whisper model and the LoRA/PEFT method:

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv}
}

@misc{hu2021lora,
  title={LoRA: Low-Rank Adaptation of Large Language Models},
  author={Hu, Edward J. and Shen, Yelong and Wallis, Phillip and Allen-Zhu, Zeyuan and Li, Yuanzhi and Wang, Shean and Wang, Lu and Chen, Weizhu},
  year={2021},
  eprint={2106.09685},
  archivePrefix={arXiv}
}

Framework Versions

  • PEFT: 0.18.1
  • Transformers: ≥4.36.0
  • PyTorch: ≥2.0.0