Cuiunbo committed on
Commit 9baca8a · 1 Parent(s): 9808eca

update readme

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -496,7 +496,7 @@ Note: For proprietary models, we calculate token density based on the image enco
  <th>Size</th>
  <th colspan="3">ASR (zh)</th>
  <th colspan="3">ASR (en)</th>
- <th colspan="2">ASR</th>
+ <th colspan="2">AST</th>
  <th>Emotion</th>
  </tr>
  <tr>
@@ -1101,7 +1101,7 @@ else:
 
  ### Audio-Only mode
  #### Mimick
- - In this task, you can see the models end-to-end ability. MiniCPM-o 2.6 takes an audio input and produces both an automatic speech recognition (ASR) transcription and a voice imitation (TTS) output.
+ `Mimick` task reflects a model's end-to-end speech modeling capability. The model takes audio input, and outputs an ASR transcription and subsequently reconstructs the original audio with high similarity. The higher the similarity between the reconstructed audio and the original audio, the stronger the model's foundational capability in end-to-end speech modeling.
  ```python
  mimick_prompt = "Please repeat each user's speech, including voice style and speech content."
  audio_input, _ = librosa.load('xxx.wav', sr=16000, mono=True)