Production Tested, Voice Activity Detection

Silero VAD

Open-source voice activity detection used in production voice agents.

Recommended

Silero VAD is a lightweight, high-accuracy voice activity detection model. It's the de facto standard for VAD in voice agent pipelines, used to detect when a user starts and stops speaking before passing audio to STT.
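As a rough illustration of the start/stop detection described above, the sketch below turns a stream of per-frame speech probabilities (the kind of score a VAD model emits) into "start" and "end" events that a pipeline could use to decide when to hand audio to STT. The `SpeechGate` class, its parameter names, and the sample probabilities are assumptions made for this example; this is not Silero VAD's own API, which ships its own helpers for this.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class SpeechGate:
    """Turns per-frame speech probabilities into start/end events.

    Hypothetical helper: names and defaults are illustrative assumptions,
    not part of the Silero VAD library.
    """
    threshold: float = 0.5        # probability at or above this counts as speech
    min_silence_frames: int = 3   # consecutive quiet frames that end an utterance

    def events(self, probs: Iterable[float]) -> Iterator[Tuple[str, int]]:
        speaking = False
        quiet_run = 0
        for i, p in enumerate(probs):
            if p >= self.threshold:
                quiet_run = 0
                if not speaking:
                    speaking = True
                    yield ("start", i)
            elif speaking:
                quiet_run += 1
                if quiet_run >= self.min_silence_frames:
                    speaking = False
                    # Report the first quiet frame as the end of speech.
                    yield ("end", i - quiet_run + 1)
                    quiet_run = 0


# Hypothetical probabilities: a short dip at frame 4 does not end the utterance.
probs = [0.1, 0.2, 0.9, 0.8, 0.3, 0.9, 0.1, 0.1, 0.1, 0.05]
print(list(SpeechGate().events(probs)))  # [('start', 2), ('end', 6)]
```

Requiring several quiet frames before declaring the end keeps brief pauses inside one utterance, at the cost of a small delay before the audio reaches STT.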

Benchmarks

Latency

Avg EOU Delay: N/A

Quality

Rating: Good

Arabic Dialect Support

Language-agnostic

Works well for Arabic speech detection. Configurable sensitivity thresholds allow tuning for different environments.
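To make the effect of the sensitivity threshold concrete, here is a small sketch that compares how many frames would be flagged as speech at several candidate thresholds. The function names and the sample probabilities are hypothetical, chosen only to mimic a noisy room where background sound scores around 0.4; real tuning would use probabilities produced by the model on recordings from the target environment.

```python
from typing import Dict, Sequence


def speech_fraction(probs: Sequence[float], threshold: float) -> float:
    """Fraction of frames whose speech probability meets the threshold."""
    if not probs:
        return 0.0
    return sum(p >= threshold for p in probs) / len(probs)


def sweep(probs: Sequence[float], thresholds: Sequence[float]) -> Dict[float, float]:
    """Speech fraction at each candidate threshold, for side-by-side comparison."""
    return {t: speech_fraction(probs, t) for t in thresholds}


# Hypothetical probabilities from a noisy room: background chatter sits near 0.4.
noisy = [0.35, 0.42, 0.38, 0.91, 0.88, 0.40, 0.36, 0.95]
print(sweep(noisy, [0.3, 0.5, 0.7]))  # {0.3: 1.0, 0.5: 0.375, 0.7: 0.375}
```

In this made-up sample, a threshold of 0.3 treats every frame as speech, while 0.5 and 0.7 give the same result, which is one way the diminishing returns of further tuning can show up.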

Features

Real-time voice activity detection

Configurable sensitivity thresholds

End-of-utterance detection

Lightweight model (<2MB)

ONNX runtime support

Language-agnostic

Streaming, LiveKit Plugin, Self-Hostable

Pricing

Free Tier Available

Plan	Price	Unit	Details
Open Source	$0	free	MIT license

Integration

SDKs

Python, Node.js, C++

API Style

Library / ONNX model

Verdict

The standard choice for VAD in voice agent pipelines. Free, lightweight, and works well with Arabic speech.

Best For

Voice agent VAD, Real-time speech detection, Self-hosted pipelines

Pros

Free and open-source
Extremely lightweight
Language-agnostic (works for Arabic)
Highly configurable
LiveKit integration

Cons

VAD tuning has diminishing returns
STT bottleneck matters more than VAD speed
Requires tuning for optimal performance

Visit Silero VAD: https://github.com/snakers4/silero-vad