Turnsense: Turn-Detector Model

A lightweight end-of-utterance (EOU) detection model fine-tuned on SmolLM2-135M, optimized for Raspberry Pi and low-power devices. The model is trained on TURNS-2K, a diverse dataset designed to capture various Speech-to-Text (STT) output patterns, including backchannels, mispronunciations, code-switching, and different text formatting styles. This makes the model robust across different STT systems and their output variations.

🔑 Key Features

  • Lightweight Architecture: Built on SmolLM2-135M (~135M parameters)
  • High Performance: 97.50% accuracy (standard) / 93.75% accuracy (quantized)
  • Resource Efficient: Optimized for edge devices and low-power hardware
  • ONNX Support: Compatible with ONNX Runtime and Hugging Face Transformers

📊 Performance Metrics

The model demonstrates robust performance across different configurations:

  • Standard Model: 97.50% accuracy
  • Quantized Model: 93.75% accuracy
  • Average probability difference between standard and quantized outputs: 0.0323
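The probability-difference figure above can be reproduced by running both ONNX variants on identical inputs and averaging the gap between their outputs. A minimal sketch of the comparison step itself (the arrays below are illustrative placeholders, not real model outputs):

```python
import numpy as np

def mean_prob_difference(probs_a, probs_b):
    """Mean absolute difference between two models' class probabilities.

    Both inputs are (num_examples, num_classes) arrays of softmax outputs
    produced by running the standard and quantized models on the same texts.
    """
    a = np.asarray(probs_a, dtype=np.float64)
    b = np.asarray(probs_b, dtype=np.float64)
    return float(np.mean(np.abs(a - b)))

# Illustrative placeholder outputs (not real measurements)
standard = [[0.90, 0.10], [0.20, 0.80]]
quantized = [[0.85, 0.15], [0.25, 0.75]]
print(mean_prob_difference(standard, quantized))
```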

[Figure: accuracy comparison between the standard and quantized models]

Speed Performance

[Figure: inference speed benchmarks]

🔹 Installation

pip install transformers onnxruntime numpy huggingface_hub

🚀 Quick Start

import onnxruntime as ort
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download

# Download and load tokenizer and model
model_id = "latishab/turnsense"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_path = hf_hub_download(repo_id=model_id, filename="model_quantized.onnx")

# Initialize ONNX Runtime session
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

# Prepare input (the model expects the chat-style template it was trained with)
text = "Hello, how are you?"
inputs = tokenizer(
    f"<|user|> {text} <|im_end|>",
    padding="max_length",
    max_length=256,
    return_tensors="np"
)

# Run inference (NumPy tensors avoid an extra PyTorch dependency)
ort_inputs = {
    "input_ids": inputs["input_ids"],
    "attention_mask": inputs["attention_mask"]
}
probabilities = session.run(None, ort_inputs)[0]
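The session returns class probabilities; turning them into a binary end-of-turn decision is a thresholding step. A minimal sketch, assuming index 1 corresponds to the end-of-utterance class and using an illustrative 0.5 cutoff (verify both against your deployment):

```python
def is_end_of_utterance(probabilities, threshold=0.5):
    """Return (decision, eou_probability) for the first item in the batch.

    Assumes probabilities has shape (batch, 2) with index 1 = end-of-utterance;
    the index and threshold are illustrative assumptions, not documented values.
    """
    eou_prob = float(probabilities[0][1])
    return eou_prob >= threshold, eou_prob
```

In a voice pipeline, a higher threshold trades responsiveness for fewer premature cut-offs.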

📚 Dataset: TURNS-2K

The model is trained on TURNS-2K, a comprehensive dataset specifically designed for end-of-utterance detection. It captures diverse speech patterns including:

  • Backchannels and self-corrections
  • Code-switching and language mixing
  • Multiple text formatting styles
  • Speech-to-Text (STT) output variations

This diverse training data ensures robustness across different:

  • Speech patterns and dialects
  • STT systems and their output formats
  • Use cases and deployment scenarios
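The categories above can be exercised in a quick smoke test by wrapping raw STT strings in the chat template from the Quick Start and batching them through the same tokenizer and session. A sketch of the prompt-formatting step (the example utterances are invented for illustration):

```python
def build_prompt(text: str) -> str:
    """Wrap raw STT output in the chat-style template the model was trained on."""
    return f"<|user|> {text} <|im_end|>"

# Invented examples of the pattern categories listed above
examples = [
    "uh-huh right right",              # backchannel
    "I went to the... to the store",   # self-correction
    "can you send the file por favor", # code-switching
    "WHAT TIME IS IT",                 # formatting variation
]
prompts = [build_prompt(t) for t in examples]
```

Each prompt can then be tokenized and scored exactly as in the Quick Start example.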

💭 Motivation & Current State

The inspiration for Turnsense came from a notable gap in the open-source AI landscape: the scarcity of efficient, lightweight turn-detection models. While building a local conversational AI agent, I found that most available solutions were either proprietary or too resource-intensive for edge devices. This led to the development of Turnsense, a practical solution designed specifically for real-world deployment on hardware like the Raspberry Pi.

Currently, the model is trained primarily on English speech patterns using a modest dataset of 2,000 samples through LoRA fine-tuning on SmolLM2-135M. While it handles common speech-to-text outputs effectively, there are certainly edge cases and complex conversational patterns yet to be addressed. The choice of ONNX format was deliberate, prioritizing compatibility with low-power devices, though ports to platforms like Apple MLX are being explored.

The project's success relies heavily on community involvement. Whether it's expanding the dataset, adding multilingual support, or improving pattern recognition for complex conversational scenarios, contributions of all kinds can help evolve Turnsense into a more robust and versatile tool.

📄 License

This project is licensed under the Apache 2.0 License.

🤝 Contributing

Contributions are welcome! Areas where you can help:

  • Dataset expansion
  • Model optimization
  • Documentation improvements
  • Bug reports and fixes

Please feel free to submit a Pull Request or open an Issue.

📚 Citation

If you use this model in your research, please cite it using:

@software{latishab2025turnsense,
  author       = {Latisha Besariani HENDRA},
  title        = {Turnsense: A Lightweight End-of-Utterance Detection Model},
  month        = mar,
  year         = 2025,
  publisher    = {GitHub},
  journal      = {GitHub repository},
  url          = {https://github.com/latishab/turnsense},
  note         = {https://huggingface.co/latishab/turnsense}
}