ibm-granite
/

granite-speech-3.2-8b

@@ -9,7 +9,9 @@ library_name: transformers
 # Granite-speech-3.2-8b
 **Model Summary:**
-Granite-speech-3.2-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST). The model was trained on a collection of public corpora comprising diverse datasets for ASR and AST as well as synthetic datasets tailored to support the speech translation task. Granite-speech-3.2 was trained by modality aligning granite-3.2-8b-instruct (https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) to speech on publicly available open source corpora containing audio inputs and text targets.
 **Evaluations:**
@@ -167,14 +169,9 @@ and efficient infrastructure for training our models over thousands of GPUs. The
 H100 GPUs.
 **Ethical Considerations and Limitations:**
-Ethical Considerations and Limitations: The use of Large Speech and Language Models may involve risks and ethical considerations that people should
-be aware of. These risks may include bias and fairness, misinformation, and autonomous decision-making. We urge the community to use granite-speech
-3.2-8b in a manner consistent with IBM’s Responsible Use Guide or similar responsible use structures. IBM recommends using this model for automatic
-speech recognition tasks. Note that more general speech tasks may pose higher inherent risks of triggering unwanted outputs. To enhance safety, we
-recommend using granite-speech-3.2-8b alongside Granite Guardian. Granite Guardian is a fine-tuned instruct model designed to detect and flag risks
-in prompts and responses across key dimensions outlined in the IBM AI Risk Atlas. Its training, which includes both human-annotated and synthetic
-data informed by internal red-teaming, enables it to outperform similar open-source models on standard benchmarks, providing an additional layer of
-safety.
 **Resources**
 - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite

 # Granite-speech-3.2-8b
 **Model Summary:**
+Granite-speech-3.2-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST). Granite-speech-3.2-8b uses a two-pass design, unlike integrated models that combine speech and language into a single pass. Initial calls to granite-speech-3.2-8b will transcribe audio files into text. To process the transcribed text using the underlying Granite language model, users must make a second call as each step must be explicitly initiated.
+The model was trained on a collection of public corpora comprising diverse datasets for ASR and AST as well as synthetic datasets tailored to support the speech translation task. Granite-speech-3.2 was trained by modality aligning granite-3.2-8b-instruct (https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) to speech on publicly available open source corpora containing audio inputs and text targets.
 **Evaluations:**
 H100 GPUs.
 **Ethical Considerations and Limitations:**
+The use of Large Speech and Language Models may involve risks and ethical considerations that people should be aware of. These risks may include bias and fairness, misinformation, and autonomous decision-making. We urge the community to use granite-speech-3.2-8b in a manner consistent with IBM's Responsible Use Guide or similar responsible use structures. IBM recommends using this model for automatic speech recognition tasks. The model's modular design improves safety by limiting how audio inputs can influence the system. If an unfamiliar or malformed prompt is received, the model simply echoes it with its transcription. This minimizes the risk of adversarial inputs, unlike integrated models that directly interpret audio and may be more exposed to such attacks. Note that more general speech tasks may pose higher inherent risks of triggering unwanted outputs.
+To enhance safety, we recommend using granite-speech-3.2-8b alongside Granite Guardian. Granite Guardian is a fine-tuned instruct model designed to detect and flag risks in prompts and responses across key dimensions outlined in the IBM AI Risk Atlas. Its training, which includes both human-annotated and synthetic data informed by internal red-teaming, enables it to outperform similar open-source models on standard benchmarks, providing an additional layer of safety.
 **Resources**
 - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite