gsaon committed on
Commit a9eb669 · verified · 1 Parent(s): 6d3cf6f

Update README.md

Added two-pass design and updated ethical considerations

Files changed (1)
  1. README.md +6 -9
README.md CHANGED
@@ -9,7 +9,9 @@ library_name: transformers
9
  # Granite-speech-3.2-8b
10
 
11
  **Model Summary:**
12
- Granite-speech-3.2-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST). The model was trained on a collection of public corpora comprising diverse datasets for ASR and AST as well as synthetic datasets tailored to support the speech translation task. Granite-speech-3.2 was trained by modality aligning granite-3.2-8b-instruct (https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) to speech on publicly available open source corpora containing audio inputs and text targets.
 
 
13
 
14
  **Evaluations:**
15
 
@@ -167,14 +169,9 @@ and efficient infrastructure for training our models over thousands of GPUs. The
167
  H100 GPUs.
168
 
169
  **Ethical Considerations and Limitations:**
170
- Ethical Considerations and Limitations: The use of Large Speech and Language Models may involve risks and ethical considerations that people should
171
- be aware of. These risks may include bias and fairness, misinformation, and autonomous decision-making. We urge the community to use granite-speech
172
- 3.2-8b in a manner consistent with IBM’s Responsible Use Guide or similar responsible use structures. IBM recommends using this model for automatic
173
- speech recognition tasks. Note that more general speech tasks may pose higher inherent risks of triggering unwanted outputs. To enhance safety, we
174
- recommend using granite-speech-3.2-8b alongside Granite Guardian. Granite Guardian is a fine-tuned instruct model designed to detect and flag risks
175
- in prompts and responses across key dimensions outlined in the IBM AI Risk Atlas. Its training, which includes both human-annotated and synthetic
176
- data informed by internal red-teaming, enables it to outperform similar open-source models on standard benchmarks, providing an additional layer of
177
- safety.
178
 
179
  **Resources**
180
  - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
 
9
  # Granite-speech-3.2-8b
10
 
11
  **Model Summary:**
12
+ Granite-speech-3.2-8b is a compact and efficient speech-language model, specifically designed for automatic speech recognition (ASR) and automatic speech translation (AST). Granite-speech-3.2-8b uses a two-pass design, unlike integrated models that combine speech and language into a single pass. An initial call to granite-speech-3.2-8b transcribes the audio file into text. To process the transcribed text with the underlying Granite language model, users must make a second call, because each step must be explicitly initiated.
13
+
14
+ The model was trained on a collection of public corpora comprising diverse datasets for ASR and AST as well as synthetic datasets tailored to support the speech translation task. Granite-speech-3.2 was trained by modality aligning granite-3.2-8b-instruct (https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) to speech on publicly available open source corpora containing audio inputs and text targets.
15
 
16
  **Evaluations:**
17
 
 
169
  H100 GPUs.
170
 
171
  **Ethical Considerations and Limitations:**
172
+ The use of Large Speech and Language Models may involve risks and ethical considerations that people should be aware of. These risks may include bias and fairness, misinformation, and autonomous decision-making. We urge the community to use granite-speech-3.2-8b in a manner consistent with IBM's Responsible Use Guide or similar responsible use structures. IBM recommends using this model for automatic speech recognition tasks. The model's modular design improves safety by limiting how audio inputs can influence the system. If an unfamiliar or malformed prompt is received, the model simply echoes it with its transcription. This minimizes the risk of adversarial inputs, unlike integrated models that directly interpret audio and may be more exposed to such attacks. Note that more general speech tasks may pose higher inherent risks of triggering unwanted outputs.
173
+
174
+ To enhance safety, we recommend using granite-speech-3.2-8b alongside Granite Guardian. Granite Guardian is a fine-tuned instruct model designed to detect and flag risks in prompts and responses across key dimensions outlined in the IBM AI Risk Atlas. Its training, which includes both human-annotated and synthetic data informed by internal red-teaming, enables it to outperform similar open-source models on standard benchmarks, providing an additional layer of safety.
 
 
 
 
 
175
 
176
  **Resources**
177
  - ⭐️ Learn about the latest updates with Granite: https://www.ibm.com/granite
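
The two-pass design described in the updated Model Summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the model's actual API: `two_pass`, `TwoPassResult`, and the `transcribe`, `respond`, and `is_safe` callables are hypothetical names introduced here. They stand in for a first call to granite-speech-3.2-8b, an explicit second call to the underlying Granite language model, and an optional Granite Guardian style check on the transcript.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TwoPassResult:
    transcript: str          # output of pass 1 (speech -> text)
    response: Optional[str]  # output of pass 2, or None if it was skipped


def two_pass(
    audio_path: str,
    transcribe: Callable[[str], str],           # hypothetical: wraps the speech model call
    respond: Callable[[str], str],              # hypothetical: wraps the language model call
    is_safe: Optional[Callable[[str], bool]] = None,  # hypothetical: guardrail check
    run_second_pass: bool = True,
) -> TwoPassResult:
    """Run transcription first, then explicitly trigger the text pass."""
    # Pass 1: the audio only ever produces a transcript.
    transcript = transcribe(audio_path)

    # The second pass never runs implicitly; the caller has to opt in.
    if not run_second_pass:
        return TwoPassResult(transcript, None)

    # Optional check on the transcript before it reaches the language model.
    if is_safe is not None and not is_safe(transcript):
        return TwoPassResult(transcript, None)

    # Pass 2: the transcript is handled as ordinary text input.
    return TwoPassResult(transcript, respond(transcript))
```

In this sketch the audio never reaches the language model directly: the transcript is plain text that can be inspected or filtered before the second call is made, which is the property the Ethical Considerations section relies on.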