---
datasets:
- slprl/TinyStress-15K
license: cc-by-nc-4.0
library_name: transformers
pipeline_tag: automatic-speech-recognition
---

# WhiStress Model

This is the official model checkpoint for [***WhiStress***](https://arxiv.org/abs/2505.19103), introduced in our paper **WhiStress: Enriching Transcriptions with Sentence Stress Detection** (Interspeech 2025).

- 🔗 Project Page: [pages.cs.huji.ac.il/adiyoss-lab/whistress](https://pages.cs.huji.ac.il/adiyoss-lab/whistress)
- 📚 Code: [github.com/slp-rl/WhiStress](https://github.com/slp-rl/WhiStress)
- 📦 Dataset: [slprl/TinyStress-15K](https://huggingface.co/datasets/slprl/TinyStress-15K)

---

## Overview

**WhiStress** extends OpenAI's [Whisper](https://huggingface.co/openai/whisper-small.en) ASR model with a decoder-based classifier that predicts **token-level sentence stress**. The model therefore not only transcribes speech but also detects which words are emphasized.

This checkpoint is based on the `whisper-small.en` variant and adds two stress-specific modules:

- `additional_decoder_block.pt`
- `classifier.pt`

---

## 🔧 How to Use

You can use the weights in your own pipeline by cloning our codebase and loading the components:

```bash
git clone https://github.com/slp-rl/WhiStress.git
cd WhiStress
pip install -r requirements.txt
```

Then, either download the weights manually from this Hugging Face repo or use our script:

```bash
python download_weights.py
```

The weights should be placed in the following directory structure:

```
whistress/
├── weights/
│   ├── additional_decoder_block.pt
│   ├── classifier.pt
│   └── metadata.json
```

---

## 🗣️ Inference Example

```python
from whistress import WhiStressInferenceClient

whistress_client = WhiStressInferenceClient(device="cuda")  # or "cpu"

pred_transcription, pred_stresses = whistress_client.predict(
    audio=sample['audio'],  # (sr, np.ndarray)
    transcription=None,     # None: predict both transcription and stress from the audio;
                            # pass a transcription to predict stress only
    return_pairs=False      # set to True to return a list of (word, binary_label) pairs
)

print(pred_transcription)
# e.g., "I didn’t say she stole my money."
print(pred_stresses)
# e.g., ['my']
```

Each prediction includes:

- `transcription`: full text output
- `emphasis_indices`: list of stressed token indices
- `emphasized_tokens`: list of corresponding words

A sketch showing one way to load a `sample` from TinyStress-15K appears at the end of this card.

---

## Notes

The model is intended for research purposes only.

## 📜 Citation

If you use our model, please cite our work:

```bibtex
@misc{yosha2025whistress,
      title={WHISTRESS: Enriching Transcriptions with Sentence Stress Detection},
      author={Iddo Yosha and Dorin Shteyman and Yossi Adi},
      year={2025},
      eprint={2505.19103},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.19103},
}
```
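
---

For reference, the snippet below shows one way to obtain the `sample` used in the inference example above. This is a minimal sketch, not part of the official codebase: it assumes TinyStress-15K has a `train` split with a standard Hugging Face `audio` column, and that `predict` accepts a `(sampling_rate, np.ndarray)` tuple, as indicated by the comment in the example.

```python
# Minimal sketch (assumptions noted above; not from the official codebase):
# load one TinyStress-15K sample and run WhiStress on it.
from datasets import load_dataset
from whistress import WhiStressInferenceClient

whistress_client = WhiStressInferenceClient(device="cuda")  # or "cpu"

# Assumption: a "train" split exists and exposes an `audio` column.
ds = load_dataset("slprl/TinyStress-15K", split="train")
sample = ds[0]

# Build the (sampling_rate, waveform) tuple expected in the example above.
audio = (sample["audio"]["sampling_rate"], sample["audio"]["array"])

transcription, stressed_words = whistress_client.predict(
    audio=audio,
    transcription=None,  # transcribe and detect stress in one pass
    return_pairs=False,
)
print(transcription)
print(stressed_words)
```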