OmniNeural — World’s First NPU-aware Multimodal Model
Overview
OmniNeural is the first fully multimodal model designed specifically for Neural Processing Units (NPUs). It natively understands text, images, and audio, and runs across PCs, mobile devices, automobiles, IoT, and robotics.
Demos
📱 Mobile Phone NPU - Demo on Samsung S25 Ultra
The first-ever fully local, multimodal, and conversational AI assistant that hears you and sees what you see, running natively on Snapdragon NPU for long battery life and low latency.
✨ PC NPU - Capabilities Highlights
🖼️ Multi-Image Reasoning | 🤖 Image + Text → Function Call | 🎶 Multi-Audio Comparison
Key Features
- Multimodal Intelligence – Processes text, image, and audio in a unified model for richer reasoning and perception.
- NPU-Optimized Architecture – Uses ReLU ops, sparse tensors, convolutional layers, and static graph execution for maximum throughput; 20% faster than non-NPU-aware models.
- Hardware-Aware Attention – Attention patterns tuned for the NPU, lowering compute and memory demand.
- Native Static Graph – Supports variable-length multimodal inputs with stable, predictable latency (see the padding sketch after this list).
- Performance Gains – 9× faster audio processing and 3.5× faster image processing on NPUs compared to baseline encoders.
- Privacy-First Inference – All computation stays local: private, offline-capable, and cost-efficient.
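To make the static-graph point concrete, the sketch below shows one common way to serve variable-length inputs through fixed compiled shapes: pad each request up to the nearest pre-compiled bucket. This is a minimal illustration under assumed bucket sizes; `BUCKETS` and `pad_to_bucket` are hypothetical names, not OmniNeural internals.

```python
# Illustrative only: one standard way a static-graph runtime can accept
# variable-length inputs is to pad each request to the nearest bucket of
# pre-compiled shapes. BUCKETS and pad_to_bucket are hypothetical names.
import numpy as np

BUCKETS = (128, 256, 512)  # assumed compile-time sequence lengths

def pad_to_bucket(tokens: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Right-pad a 1-D token array to the smallest bucket that fits.

    Returns the padded array plus a boolean mask of the real positions,
    which attention layers can use to ignore the padding.
    """
    length = tokens.shape[0]
    bucket = next(b for b in BUCKETS if b >= length)  # StopIteration if too long
    padded = np.zeros(bucket, dtype=tokens.dtype)
    padded[:length] = tokens
    mask = np.zeros(bucket, dtype=bool)
    mask[:length] = True
    return padded, mask

padded, mask = pad_to_bucket(np.arange(200))
print(padded.shape, int(mask.sum()))  # (256,) 200
```

Because every request resolves to one of a few fixed shapes, the compiled NPU graph never re-specializes at run time, which is what keeps latency stable and predictable.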
Performance / Benchmarks
Human Evaluation (vs baselines)
- Vision: Wins or ties on ~75% of prompts against Apple Foundation, Gemma-3n-E4B, and Qwen2.5-Omni-3B.
- Audio: Clear lead over baselines, notably ahead of Gemma-3n and the Apple foundation model.
- Text: Matches or outperforms leading multimodal baselines.
Nexa Attention Speedups
- 9× faster audio encoding (vs Whisper encoder).
- 3.5× faster image encoding (vs SigLIP encoder).
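The encoder speedups above are attributed to attention patterns tuned for the NPU. This card does not publish the exact pattern, so the sketch below illustrates just one plausible example of such a pattern, a fixed-width sliding window that caps per-token compute and memory; treat it as an assumption, not Nexa Attention itself.

```python
# Assumed example of a hardware-friendly attention pattern: a fixed-width
# sliding window. Illustration only; the exact Nexa Attention pattern is
# not published on this card.
import numpy as np

def sliding_window_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask where token i may attend to tokens (i - window, i]."""
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j > i - window)

mask = sliding_window_mask(seq_len=8, window=3)
print(mask.astype(int))
# Each row has at most `window` ones, so attention cost grows linearly
# with sequence length instead of quadratically.
```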
Architecture Overview
OmniNeural’s design is tightly coupled with NPU hardware (a minimal sketch follows this list):
- NPU-friendly ops (ReLU preferred over GELU/SiLU).
- Sparse + small tensor multiplications for efficiency.
- Convolutional layers favored over linear for better NPU parallelization.
- Hardware-aware attention patterns to cut compute cost.
- Static graph execution for predictable latency.
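As a rough illustration of the first three bullets, the sketch below contrasts an NPU-friendly block (convolution plus ReLU, fixed input shapes) with the linear-plus-GELU block common in transformers. It is a minimal PyTorch sketch under assumed dimensions, not OmniNeural's actual layers.

```python
# Minimal PyTorch sketch (assumed shapes, not OmniNeural's real layers):
# conv + ReLU with fixed shapes, the NPU-friendly choice, versus the
# linear + GELU block typical of transformers.
import torch
import torch.nn as nn

class NPUFriendlyBlock(nn.Module):
    """Conv1d + ReLU: parallelizes well on NPUs and quantizes cheaply."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()  # cheaper on NPUs than GELU/SiLU

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.conv(x))

class GenericBlock(nn.Module):
    """Linear + GELU: the common transformer choice this design avoids."""
    def __init__(self, dim: int):
        super().__init__()
        self.fc = nn.Linear(dim, dim)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.fc(x))

# Static-graph style: shapes are fixed up front so a compiled NPU graph
# never has to re-specialize at run time.
x = torch.randn(1, 128, 64)  # (batch, channels, sequence), fixed
print(NPUFriendlyBlock(128)(x).shape)              # torch.Size([1, 128, 64])
print(GenericBlock(128)(x.transpose(1, 2)).shape)  # torch.Size([1, 64, 128])
```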
Production Use Cases
PC & Mobile – On-device AI agents combine voice, vision, and text for natural, accurate responses.
- Examples: summarize slides into an email (PC), extract action items from chat (mobile).
- Benefits: Private, offline, battery-efficient.
Automotive – In-car assistants handle voice control, cabin safety, and environment awareness.
- Examples: detect risks (child unbuckled, pet left behind, loose objects) and road conditions (fog, construction).
- Benefits: Decisions run locally in milliseconds.
IoT & Robotics – Multimodal sensing for factories, AR/VR, drones, and robots.
- Examples: Defect detection, technician overlays, hazard spotting mid-flight, natural robot interaction.
- Benefits: Works without network connectivity.
How to use
⚠️ Hardware requirement: OmniNeural-4B currently runs only on Qualcomm NPUs (e.g., Snapdragon-powered AI PCs).
Apple NPU support is planned next.
1) Install Nexa-SDK
- Download and follow the steps under the "Deploy" section on Nexa's model page: Download Windows arm64 SDK
- (Other platforms coming soon)
2) Get an access token
Create a token in the Model Hub, then log in:
nexa config set license '<access_token>'
3) Run the model
Running:
nexa infer NexaAI/OmniNeural-4B
Mic mode: once the model is running, type the command below to record your voice directly in the terminal.
> /mic
For images and audio, simply drag your files into the command line. Separate multiple file paths with a space.
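For example, a hypothetical multimodal turn might look like this (the paths are placeholders, not real files):
> What changed between these two photos? C:\photos\before.jpg C:\photos\after.jpg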
Links & Community
- Issues / Feedback: Use the HF Discussions tab, or submit an issue on our Discord or the nexa-sdk GitHub.
- Roadmap & updates: Follow us on X and Discord.
If you want to see more NPU-first, multimodal releases on HF, please give our model a like ❤️.
Citation
@misc{omnineural4b,
  title={OmniNeural: World’s First NPU-aware Multimodal Model},
  author={Nexa AI},
  year={2025},
  url={https://huggingface.co/NexaAI/OmniNeural-4B},
}