---
license: unknown
language:
- en
metrics:
- wer
tags:
- whisper
- speech processing
- nlp
- asr
- domain adaptation
---

# Whispered TIA

Whispered TIA is a fine-tuned ASR model based on Whisper. It is adapted to the TIA (Totally Integrated Automation) software from Siemens AG and is able to predict domain-specific words and transcribe them correctly.

# Base Model Whisper

Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. It was proposed in the paper [Robust Speech Recognition via Large-Scale Weak Supervision](https://arxiv.org/abs/2212.04356) by Alec Radford et al. from OpenAI. The original code repository can be found [here](https://github.com/openai/whisper).

# Training Results

The False HallucER indicates how many hallucinations and deletions were produced.

## Nosil Dataset

| WER | False HallucER | Runtime | Batch Size | Memory Usage |
|---|---|---|---|---|
| 1.76 | 1034.76 | 1.78 | 64 | 20049 |
| ~ | Predictions > References: 34% | ~ | ~ | ~ |
| ~ | Predictions < References: 34% | ~ | ~ | ~ |
| ~ | Predictions = References: 32% | ~ | ~ | ~ |
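
For readers unfamiliar with the WER column, the following is a minimal sketch of how a word error rate is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between prediction and reference, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and reporting the value as a percentage is an assumption; the evaluation code behind the table above is not part of this model card and may normalize text differently.

```python
def word_error_rate(reference: str, prediction: str) -> float:
    """Return the word error rate in percent (hypothetical helper, not the card's evaluation code)."""
    ref = reference.split()
    hyp = prediction.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the prediction.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing in prediction
                curr[j - 1] + 1,     # insertion: extra word in prediction
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing in the prediction.
    print(word_error_rate("open the tia portal", "open tia portal"))
```

Because insertions count as errors, a prediction that is much longer than its reference can push the rate above 100.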

# Dataset

For more information on the underlying dataset, see dataset: nosil.

# Inference

```python
import librosa
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

# Path to the audio file to transcribe
file = "/path/to/audio"

# Load the waveform and resample it to 16 kHz, the rate Whisper expects
arr, sampling_rate = librosa.load(file, sr=16000)

# Load whisper model and processor
processor = WhisperProcessor.from_pretrained("openai/whisper-small")
model = WhisperForConditionalGeneration.from_pretrained("masters-thesis-vm/whispered_TIA_small_ad_tokenization_encoder_freezing_nosil")

# Preprocessing: convert the waveform to log-Mel spectrogram input features
input_features = processor(arr, return_tensors="pt", sampling_rate=sampling_rate).input_features

# Prediction: force English transcription, then decode the generated token IDs
forced_decoder_ids = processor.get_decoder_prompt_ids(language="en", task="transcribe")
predicted_ids = model.generate(input_features, forced_decoder_ids=forced_decoder_ids)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)

print(transcription)
```