metadata

language: vie
datasets:
  - legacy-datasets/common_voice
  - vlsp2020_vinai_100h
  - AILAB-VNUHCM/vivos
  - doof-ferb/vlsp2020_vinai_100h
  - doof-ferb/fpt_fosd
  - doof-ferb/infore1_25hours
  - linhtran92/viet_bud500
  - doof-ferb/LSVSC
  - doof-ferb/vais1000
  - doof-ferb/VietMed_labeled
  - NhutP/VSV-1100
  - doof-ferb/Speech-MASSIVE_vie
  - doof-ferb/BibleMMS_vie
  - capleaf/viVoice
metrics:
  - wer
pipeline_tag: automatic-speech-recognition
tags:
  - transcription
  - audio
  - speech
  - chunkformer
  - asr
  - automatic-speech-recognition
license: cc-by-nc-4.0
model-index:
  - name: ChunkFormer Large Vietnamese
    results:
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common-voice-vietnamese
          type: common_voice
          args: vi
        metrics:
          - name: Test WER
            type: wer
            value: 6.66
        source:
          name: Common Voice Vi Leaderboard
          url: >-
            https://paperswithcode.com/sota/speech-recognition-on-common-voice-vi
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: VIVOS
          type: vivos
          args: vi
        metrics:
          - name: Test WER
            type: wer
            value: 4.18
        source:
          name: Vivos Leaderboard
          url: https://paperswithcode.com/sota/speech-recognition-on-vivos
      - task:
          name: Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: VLSP - Task 1
          type: vlsp
          args: vi
        metrics:
          - name: Test WER
            type: wer
            value: 14.09

ChunkFormer-Large-Vie: Large-Scale Pretrained ChunkFormer for Vietnamese Automatic Speech Recognition

Citation

If you use this work in your research, please cite:

@INPROCEEDINGS{10888640,
  author={Le, Khanh and Ho, Tuan Vu and Tran, Dung and Chau, Duc Thanh},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={ChunkFormer: Masked Chunking Conformer For Long-Form Speech Transcription},
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Scalability;Memory management;Graphics processing units;Signal processing;Performance gain;Hardware;Resource management;Speech processing;Standards;Context modeling;chunkformer;masked batch;long-form transcription},
  doi={10.1109/ICASSP49660.2025.10888640}
}

Contact

[email protected]