Production Tested · Voice Activity Detection

Silero VAD

Open-source voice activity detection used in production voice agents.

Recommended

The standard choice for VAD in voice agent pipelines. Free, lightweight, and works well with Arabic speech.

Silero VAD is a lightweight, high-accuracy voice activity detection model. It's the de facto standard for VAD in voice agent pipelines, used to detect when a user starts and stops speaking before passing audio to STT.
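The start/stop detection described above can be sketched as a small state machine over per-frame speech probabilities. This is an illustrative sketch, not Silero VAD's own implementation: the `Endpointer` class, its field names, and the default values are assumptions made for the example. In a real pipeline the probabilities would come from the model, one per audio frame.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Endpointer:
    """Turns per-frame speech probabilities into (start, end) frame spans.

    Hypothetical helper for illustration; end indices are exclusive.
    """
    threshold: float = 0.5        # probability at or above this counts as speech
    min_silence_frames: int = 3   # consecutive quiet frames needed to close a span

    def segments(self, probs: Iterable[float]) -> List[Tuple[int, int]]:
        spans: List[Tuple[int, int]] = []
        start = None   # first frame of the currently open span, if any
        silence = 0    # consecutive below-threshold frames inside the open span
        count = 0      # total frames seen
        for i, p in enumerate(probs):
            count = i + 1
            if p >= self.threshold:
                if start is None:
                    start = i          # user started speaking
                silence = 0            # any speech frame resets the silence run
            elif start is not None:
                silence += 1
                if silence >= self.min_silence_frames:
                    # user stopped speaking: close the span at the first quiet frame
                    spans.append((start, i - silence + 1))
                    start, silence = None, 0
        if start is not None:
            # stream ended mid-utterance: close the span, dropping trailing quiet frames
            spans.append((start, count - silence))
        return spans
```

Each returned span is the slice of frames a pipeline would forward to STT. Short dips below the threshold (fewer than `min_silence_frames`) do not split an utterance.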

Benchmarks

Latency

Avg EOU Delay: N/A

Quality

Rating: Good
Arabic Dialect Support: Language-agnostic

Works well for Arabic speech detection. Configurable sensitivity thresholds allow tuning for different environments.
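As a rough sketch of what tuning for different environments might look like with the `silero-vad` Python package, the example below maps a named environment to keyword arguments for `get_speech_timestamps`. The preset names and numeric values are hypothetical starting points, not measured recommendations, and the WAV path is supplied by the caller.

```python
from typing import Dict, List, Union

# Hypothetical presets: illustrative values only, to be tuned on real recordings.
PRESETS: Dict[str, Dict[str, Union[int, float]]] = {
    # Quiet room: lower threshold catches soft speech, short silence ends a turn.
    "quiet_room": {"threshold": 0.4, "min_silence_duration_ms": 300},
    # Noisy line: higher threshold rejects background noise, longer silence
    # avoids cutting the speaker off during brief pauses.
    "noisy_call_center": {"threshold": 0.7, "min_silence_duration_ms": 500},
}


def vad_kwargs(environment: str) -> Dict[str, Union[int, float]]:
    """Return a copy of the keyword arguments for the named preset."""
    if environment not in PRESETS:
        raise ValueError(f"unknown environment: {environment!r}")
    return dict(PRESETS[environment])


def detect_speech(wav_path: str, environment: str = "quiet_room") -> List[dict]:
    """Run Silero VAD on a file and return speech spans in seconds."""
    # Imported lazily so the presets can be inspected without torch installed.
    from silero_vad import get_speech_timestamps, load_silero_vad, read_audio

    model = load_silero_vad()
    wav = read_audio(wav_path, sampling_rate=16000)
    return get_speech_timestamps(
        wav,
        model,
        sampling_rate=16000,
        return_seconds=True,
        **vad_kwargs(environment),
    )
```

Raising `threshold` trades missed quiet speech for fewer false triggers, while `min_silence_duration_ms` controls how long a pause must last before a segment is closed.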

Features

Real-time voice activity detection
Configurable sensitivity thresholds
End-of-utterance detection
Lightweight model (around 2 MB)
ONNX runtime support
Language-agnostic
Streaming
LiveKit Plugin
Self-Hostable
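For streaming use, audio usually arrives in arbitrarily sized chunks, while the model scores fixed-size windows. The sketch below shows one way to re-frame a stream, assuming the window sizes that recent Silero VAD releases document (512 samples at 16 kHz, 256 at 8 kHz). The `FrameBuffer` class is a hypothetical helper, not part of the library.

```python
from typing import Iterable, List

# Assumed window sizes (samples per frame) keyed by sampling rate in Hz.
WINDOW_SAMPLES = {16000: 512, 8000: 256}


class FrameBuffer:
    """Accumulates incoming samples and emits fixed-size frames for the VAD."""

    def __init__(self, sampling_rate: int = 16000) -> None:
        if sampling_rate not in WINDOW_SAMPLES:
            raise ValueError(f"unsupported sampling rate: {sampling_rate}")
        self.window = WINDOW_SAMPLES[sampling_rate]
        self._pending: List[float] = []

    def push(self, samples: Iterable[float]) -> List[List[float]]:
        """Add samples and return every complete frame now available."""
        self._pending.extend(samples)
        frames: List[List[float]] = []
        while len(self._pending) >= self.window:
            frames.append(self._pending[: self.window])
            del self._pending[: self.window]
        return frames

    @property
    def pending(self) -> int:
        """Number of buffered samples still waiting for a full frame."""
        return len(self._pending)
```

Each frame returned by `push` would be passed to the model for a speech probability. Leftover samples stay buffered until the next chunk arrives.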

Pricing

Free Tier Available

Plan         Price  Unit
Open Source  $0     free

Integration

SDKs
Python, Node.js, C++
API Style

Library / ONNX model

Verdict

The standard choice for VAD in voice agent pipelines. Free, lightweight, and works well with Arabic speech.

Best For
Voice agent VAD, real-time speech detection, self-hosted pipelines

Pros

  • Free and open-source
  • Extremely lightweight
  • Language-agnostic (works for Arabic)
  • Highly configurable
  • LiveKit integration

Cons

  • VAD tuning has diminishing returns
  • STT latency, not VAD speed, is usually the pipeline bottleneck
  • Requires tuning for optimal performance

Visit Silero VAD: https://github.com/snakers4/silero-vad