Commit 0915f2f by bweng (verified) · 1 Parent(s): 3f8960e

Update README.md

Files changed (1)
  1. README.md +76 -2
README.md CHANGED
@@ -52,7 +52,7 @@ widget:
  src: https://cdn-media.huggingface.co/speech_samples/sample2.flac
  ---

- # **<span style="color:#76b900;">🦜 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model</span>**
+ # **<span style="color:#76b900;">🧃 parakeet-tdt-0.6b-v3: Multilingual Speech-to-Text Model Core ML</span>**

  <style>
  img {
@@ -63,6 +63,80 @@ img {
  [![Model architecture](https://img.shields.io/badge/Model_Arch-FastConformer--TDT-blue#model-badge)](#model-architecture)
  | [![Model size](https://img.shields.io/badge/Params-0.6B-green#model-badge)](#model-architecture)
  | [![Language](https://img.shields.io/badge/Language-EU_Languages-blue#model-badge)](#datasets)
+ | [![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
+ | [![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/FluidAudio?style=flat&logo=github)](https://github.com/FluidInference/FluidAudio)

+ On‑device multilingual ASR model converted to Core ML for Apple platforms. This model powers FluidAudio’s batch ASR and is the same model used in our backend. It supports 25 European languages and is optimized for low‑latency, private, offline transcription.

- More details coming soon
+
+ ## Highlights
+
+ - **Core ML**: Runs fully on‑device (ANE/CPU) on Apple Silicon.
+ - **Multilingual**: 25 European languages; see the FluidAudio repository for usage examples.
+ - **Performance**: ~110× faster than real time on M4 Pro for batch ASR (1 min of audio in ≈ 0.5 s).
+ - **Privacy**: No network calls required once models are downloaded.
+
+ ## Intended Use
+
+ - **Batch transcription** of complete audio files on macOS/iOS.
+ - **Local dictation** and note‑taking apps where privacy and latency matter.
+ - **Embedded ASR** in production apps via the FluidAudio Swift framework.
+
+ ## Supported Platforms
+
+ - macOS 14+ (Apple Silicon recommended)
+ - iOS 17+
+
+ ## Model Details
+
+ - **Architecture**: Parakeet TDT v3 (Token‑and‑Duration Transducer, 0.6B parameters)
+ - **Input audio**: 16 kHz, mono, Float32 PCM in range [-1, 1]
+ - **Languages**: 25 European languages (multilingual)
+ - **Precision**: Mixed precision optimized for Core ML execution (ANE/CPU)
+
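To make the input format in the list above concrete, here is a minimal sketch of turning interleaved 16‑bit PCM into mono Float32 samples in [-1, 1]. The function `monoFloat32(fromInt16:channelCount:)` is a hypothetical helper written for this illustration and is not part of FluidAudio; the framework's own loaders presumably perform an equivalent step. Resampling to 16 kHz is deliberately left out.

```swift
/// Hypothetical helper (not a FluidAudio API): converts interleaved 16-bit PCM
/// into mono Float32 samples in [-1, 1] by scaling and averaging the channels.
/// `channelCount` is the number of interleaved channels (1 = mono, 2 = stereo).
func monoFloat32(fromInt16 pcm: [Int16], channelCount: Int) -> [Float] {
    precondition(channelCount > 0, "channelCount must be positive")
    let frameCount = pcm.count / channelCount
    var output = [Float](repeating: 0, count: frameCount)
    for frame in 0..<frameCount {
        var sum: Float = 0
        for channel in 0..<channelCount {
            // Dividing by 32768 maps Int16.min to -1.0 and Int16.max to just under 1.0.
            sum += Float(pcm[frame * channelCount + channel]) / 32768.0
        }
        // Downmix to mono by averaging all channels of this frame.
        output[frame] = sum / Float(channelCount)
    }
    return output
}

// Three stereo frames: full-scale positive, full-scale negative, and a mixed frame.
let stereo: [Int16] = [Int16.max, Int16.max, Int16.min, Int16.min, 0, 16384]
let mono = monoFloat32(fromInt16: stereo, channelCount: 2)
print(mono)
```

Averaging the channels is only one possible downmix; an app that already receives 16 kHz mono Float32 audio can skip this step entirely.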
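+ ## Performance
+
+ - **Speed**: ~110× faster than real time on M4 Pro in batch mode, i.e. a real‑time factor (RTF) of roughly 0.009 when RTF is defined as processing time divided by audio duration.
+ - Throughput and latency vary with device, input duration, and compute units (ANE/CPU).
+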
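As a rough worked example of the ~110× figure, the sketch below treats it as a speed‑up over real time and estimates processing time for a given audio length. The function name `estimatedProcessingSeconds` is an assumption made for this illustration, and the result is an estimate only, since actual throughput depends on the device, input duration, and compute units.

```swift
/// Hypothetical helper: estimated processing time in seconds for `audioSeconds`
/// of audio, given a speed-up factor relative to real time (e.g. 110 for ~110×).
func estimatedProcessingSeconds(audioSeconds: Double, speedup: Double) -> Double {
    precondition(speedup > 0, "speedup must be positive")
    return audioSeconds / speedup
}

// One minute and one hour of audio at the ~110× figure quoted for M4 Pro.
let oneMinute = estimatedProcessingSeconds(audioSeconds: 60, speedup: 110)
let oneHour = estimatedProcessingSeconds(audioSeconds: 3600, speedup: 110)
print(oneMinute, oneHour)
```

Under this assumption, one minute of audio takes roughly 0.55 s and one hour roughly 33 s.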
+ ## Usage
+
+ For quickest integration, use the FluidAudio Swift framework, which handles model loading, audio preprocessing, and decoding.
+
+ ### Swift (FluidAudio)
+
+ ```swift
+ import AVFoundation
+ import FluidAudio
+
+ Task {
+     // Download (first run only) and load the ASR models
+     let models = try await AsrModels.downloadAndLoad()
+
+     // Initialize ASR manager with default config
+     let asr = AsrManager(config: .default)
+     try await asr.initialize(models: models)
+
+     // Load audio and transcribe
+     let samples = try await AudioProcessor.loadAudioFile(path: "path/to/audio.wav")
+     let result = try await asr.transcribe(samples, source: .system)
+     print(result.text)
+
+     // Release model resources when done
+     asr.cleanup()
+ }
+ ```
+
+ For more examples (including CLI usage and benchmarking), see the FluidAudio repository: https://github.com/FluidInference/FluidAudio
+
+ ## Files
+
+ - Core ML model artifacts suitable for use via the FluidAudio APIs (preferred) or directly with Core ML.
+ - Tokenizer and configuration assets are included and managed by FluidAudio’s loaders.
+
+ ## Limitations
+
+ - Coverage is limited to the 25 supported European languages; accuracy may degrade for other languages.
+
+ ## License
+
+ Apache 2.0. See the FluidAudio repository for details and usage guidance.