MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
Paper
•
2411.18152
•
Published
Multi-Lingual Speech Recognition
return_timestamps=True
helps reduce hallucinations, particularly when doing long-form evaluation with Transformers’ “chunked” algorithm. The cat sat on the on the on the mat.
<|0.00|> The cat sat on the on the on the mat.<|5.02|>
<|0.00|> The cat sat on the mat.<|5.02|>
TLDR:
BioLORD-2023 is a series of semantic language models for the biomedical domain, capable of representing clinical concepts and sentences in a semantic space aligned with human preferences. Our new multilingual version supports 50+ languages and is further finetuned on 7 European languages. These models were trained contrastively and through distillations, using a corpus unifying in the same latent space the concept names of biomedical concepts and their descriptions. For concepts which didn't have a description written by humans in UMLS, we use information contained in the SnomedCT knowledge graph and the capabilities of ChatGPT to generate synthetic data and improve our results.