Meta ASR English
This model is a fine-tuned ASR-CTC model extended with entity tagging, speaker-attribute prediction, and support for multiple European languages.
Model Details
- Fine-tuned on: A mix of CommonVoice (6 European languages), People's Speech, Indian-accented English, and LibriSpeech
- Languages: English, Spanish, French, Italian, German, Portuguese
- Additional Features: Entity tagging, speaker attributes (age, gender, emotion), and intent detection
Output Format
The model provides rich transcriptions including:
- Entity tags (PERSON_NAME, ORGANIZATION, etc.)
- Speaker attributes (AGE, GENDER, EMOTION)
- Intent classification
- Language-specific transcription
Example output:
ENTITY_PERSON_NAME Robert Hoke END was educated at the ENTITY_ORGANIZATION Pleasant Retreat Academy END. AGE_45_60 GER_MALE EMOTION_NEUTRAL INTENT_INFORM
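Because the tags are emitted inline with the text, downstream code usually needs to separate them from the plain transcript. The following is a minimal sketch of one way to do that, based only on the tag shapes visible in the example output above. The function name `parse_tagged_transcript` and the regular expressions are illustrative assumptions, not part of the model or of NeMo, and the model's full tag inventory may include forms this sketch does not cover.

```python
import re

# Assumed tag shapes, inferred from the example output only:
#   ENTITY_<TYPE> <words> END     inline entity span
#   AGE_*, GER_*, EMOTION_*, INTENT_*   utterance-level attribute tags
ENTITY_RE = re.compile(r"ENTITY_([A-Z0-9_]+)\s+(.*?)\s+END\b")
ATTR_RE = re.compile(r"\b(AGE|GER|EMOTION|INTENT)_([A-Z0-9_]+)\b")


def parse_tagged_transcript(tagged):
    """Split a tagged transcript into plain text, entities, and attributes."""
    # Collect (entity type, surface text) pairs in order of appearance.
    entities = [(m.group(1), m.group(2)) for m in ENTITY_RE.finditer(tagged)]
    # Replace each entity span with just its surface text.
    text = ENTITY_RE.sub(lambda m: m.group(2), tagged)
    # Collect attribute tags, keyed by their prefix, then strip them out.
    attributes = {m.group(1): m.group(2) for m in ATTR_RE.finditer(text)}
    text = ATTR_RE.sub("", text)
    # Collapse the whitespace left behind by the removed tags.
    text = re.sub(r"\s+", " ", text).strip()
    return {"text": text, "entities": entities, "attributes": attributes}


example = (
    "ENTITY_PERSON_NAME Robert Hoke END was educated at the "
    "ENTITY_ORGANIZATION Pleasant Retreat Academy END. "
    "AGE_45_60 GER_MALE EMOTION_NEUTRAL INTENT_INFORM"
)
parsed = parse_tagged_transcript(example)
print(parsed["text"])
print(parsed["entities"])
print(parsed["attributes"])
```

Parsing entities before attributes keeps an entity type from being mistaken for an attribute tag, since the entity prefix is removed first.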
Usage
import nemo.collections.asr as nemo_asr
# Load model
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained('WhissleAI/meta_stt_euro_v1')
# Transcribe audio
transcription = asr_model.transcribe(['path/to/audio.wav'])
print(transcription[0])
Training Data
The model was fine-tuned on:
- CommonVoice dataset (6 European languages)
- People's Speech English corpus
- Indian-accented English
- LibriSpeech corpus (en, es, fr, it, pt)
Model Architecture
Based on the FastConformer [1] architecture, which applies 8x depthwise-separable convolutional downsampling, and trained with CTC loss.
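To give a sense of what 8x downsampling means for output resolution, the sketch below estimates how many encoder frames a clip produces. It assumes a 10 ms feature hop, which is a common setting but is not stated in this card, and it ignores convolution padding, so the counts are approximate. The function names are hypothetical.

```python
def encoder_frame_ms(hop_ms=10, downsample=8):
    """Duration covered by one encoder frame, in milliseconds.

    hop_ms=10 is an assumed feature hop; downsample=8 is from the model card.
    """
    return hop_ms * downsample


def approx_encoder_frames(duration_ms, hop_ms=10, downsample=8):
    """Approximate number of encoder frames for a clip (padding ignored)."""
    feature_frames = duration_ms // hop_ms
    # Ceiling division: a partial group of feature frames still yields a frame.
    return -(-feature_frames // downsample)


print(encoder_frame_ms())             # 80
print(approx_encoder_frames(10_000))  # 125
```

Under these assumptions, each CTC output step covers about 80 ms of audio, so a 10-second clip yields roughly 125 steps.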
License
This model is licensed under the CC-BY-4.0 license.
References
[1] Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
[2] NVIDIA NeMo Toolkit