Arabic FastConformer CTC ONNX

This is an ONNX export of NVIDIA's Arabic FastConformer CTC model for automatic speech recognition.

Model Details

Model Type: CTC (Connectionist Temporal Classification)
Language: Arabic (ar)
Sample Rate: 16 kHz
Framework: ONNX Runtime
Vocabulary: see vocab.txt

Files

model.onnx
vocab.txt
config.json

Usage

import onnxruntime as ort
import numpy as np
import librosa

# Load the model
session = ort.InferenceSession("model.onnx")

# Load audio as mono float32, resampled to the model's 16 kHz rate
audio, sr = librosa.load("audio.wav", sr=16000)
audio_length = np.array([len(audio)], dtype=np.int64)

# Run inference (input names and shapes can be checked with session.get_inputs())
outputs = session.run(None, {
    "audio_signal": audio.reshape(1, -1).astype(np.float32),
    "length": audio_length
})

# Decode outputs (you'll need to implement CTC decoding)
logits = outputs[0]  # Shape: [batch, time, vocab]
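
The decoding step left open above could be filled in with greedy CTC decoding: take the best class per frame, collapse repeats, and drop blanks. The sketch below is a minimal version under several assumptions that are not confirmed by this model card: vocab.txt holds one token per line with the line index as the token id, the CTC blank is the last output class, and tokens are SentencePiece-style pieces where "▁" marks a word start. The helper names load_vocab and ctc_greedy_decode are hypothetical.

```python
import numpy as np


def load_vocab(path):
    """Read one token per line; the line index is taken as the token id (assumption)."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


def ctc_greedy_decode(logits, vocab, blank_id=None):
    """Greedy CTC decoding for a single utterance.

    logits: array of shape [time, vocab] (raw scores or log-probabilities).
    blank_id: index of the CTC blank; assumed to be the last class if None.
    """
    if blank_id is None:
        blank_id = logits.shape[-1] - 1  # assumption: blank is the final class
    best = np.argmax(logits, axis=-1)    # most likely class per frame

    tokens = []
    prev = blank_id
    for idx in best:
        idx = int(idx)
        # Keep a token only when it differs from the previous frame and is not blank
        if idx != prev and idx != blank_id:
            tokens.append(vocab[idx])
        prev = idx

    # assumption: "▁" (U+2581) marks the start of a word, as in SentencePiece
    return "".join(tokens).replace("\u2581", " ").strip()


# Toy check with a 3-token vocabulary; class 3 plays the role of the blank
vocab = ["\u2581مر", "حبا", "\u2581بك"]
frames = [0, 0, 3, 1, 3, 2, 2]
demo_logits = np.full((len(frames), 4), -10.0)
demo_logits[np.arange(len(frames)), frames] = 0.0
print(ctc_greedy_decode(demo_logits, vocab))
```

With the session output from the snippet above, a call would look like ctc_greedy_decode(logits[0], load_vocab("vocab.txt")). If the export uses a different blank index, pass blank_id explicitly. Greedy decoding is the simplest option; beam search with a language model usually gives better accuracy at higher cost.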

Wyoming Protocol Integration

This model can be used with the Wyoming protocol for Home Assistant voice integration:

# Install the Wyoming server (when available)
pip install wyoming-arabic-asr

# Run the server
wyoming-arabic-asr --model-path Mo-alaa/arabic-fastconformer-ctc-onnx --uri tcp://0.0.0.0:10300
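
Once a server is listening, a quick way to check that it speaks Wyoming is to send a describe event and read the reply. The sketch below assumes the protocol's JSON-lines framing, where each event starts with a newline-terminated JSON header carrying a "type" field, and that a server answers describe with an info event. The function names are hypothetical, and the behaviour of wyoming-arabic-asr itself is not confirmed here.

```python
import json
import socket


def build_describe_event():
    """Serialize a Wyoming 'describe' event as one newline-terminated JSON line."""
    return (json.dumps({"type": "describe"}) + "\n").encode("utf-8")


def parse_event_type(line):
    """Return the 'type' field of a JSON header line, or None for an empty line."""
    if not line.strip():
        return None
    return json.loads(line.decode("utf-8")).get("type")


def probe_wyoming(host, port=10300, timeout=5.0):
    """Send 'describe' and return the type of the first event header received."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_describe_event())
        with sock.makefile("rb") as reader:
            # Only the header line is read; any data or payload bytes are ignored
            return parse_event_type(reader.readline())
```

Calling probe_wyoming("your-server") should return "info" if the server responds as assumed. A connection error or timeout suggests the server is not reachable on port 10300.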

Home Assistant Configuration

Add the following to your Home Assistant configuration.yaml, replacing your-server with the host running the Wyoming server:

wyoming:
  - uri: tcp://your-server:10300
    protocol: wyoming
    name: "Arabic ASR"
    language: "ar"

Original Model

Based on: nvidia/stt_ar_fastconformer_hybrid_large_pcd_v1.0

License

This model is released under the CC-BY-4.0 license.
