Head-to-head comparison based on real production benchmarks with Gulf Arabic callers.

ElevenLabs Scribe v2: ElevenLabs' real-time STT offering. Poor quality and slow for Arabic.

Soniox STT RT v3: High-quality Arabic STT with 44% lower WER than Google Chirp 3.

ElevenLabs Scribe v2: Described as 'shit quality' in production testing. Not viable for Arabic.

Soniox STT RT v3: Great transcription quality, confirmed by user feedback; callers did not need to repeat themselves. 44% lower WER than Google Chirp 3.
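Word error rate (WER) is the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference, divided by the number of reference words. "44% lower WER" is a relative reduction: the candidate's WER is 56% of the baseline's. A minimal sketch of both calculations follows; the helper names are illustrative, tokenization is plain whitespace splitting, and no Arabic-specific normalization (for example of diacritics or alef variants) is applied, which can change scores on real benchmarks.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER via word-level Levenshtein distance (substitutions + deletions + insertions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Fraction by which candidate_wer is lower than baseline_wer."""
    return (baseline_wer - candidate_wer) / baseline_wer


if __name__ == "__main__":
    # Illustrative transliterated Gulf Arabic: one substituted word out of four.
    ref = "ana abi ahjiz mawid"
    hyp = "ana abi ahjiz maweed"
    print(word_error_rate(ref, hyp))
    print(relative_reduction(0.30, 0.168))
```

Both WER values passed to `relative_reduction` in the usage lines are made-up inputs chosen to show what a 44% relative reduction looks like; they are not measured results.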
| Feature | ElevenLabs Scribe v2 | Soniox STT RT v3 |
|---|---|---|
| Real-time streaming transcription | ✓ | ✓ |
| Multiple language support | ✓ | ✗ |
| LiveKit inference integration | ✓ | ✗ |
| Language hints | ✗ | ✓ |
| Low word error rate | ✗ | ✓ |
| End-of-utterance detection | ✗ | ✓ |

| Capability | ElevenLabs Scribe v2 | Soniox STT RT v3 |
|---|---|---|
| Streaming support | ✓ | ✓ |
| LiveKit plugin | ✓ | ✗ |
| Self-hostable | ✗ | ✗ |
| API style | WebSocket streaming | WebSocket streaming |
| SDKs | Python, Node.js | Python, Node.js |

ElevenLabs Scribe v2: Poor quality and poor latency for Arabic. Not recommended for any Arabic STT use case.

Soniox STT RT v3: Previously the best option for Arabic STT. Excellent quality with 16.2% WER, but superseded by Deepgram Nova-3, which is 75% faster with comparable quality.
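The tables list language hints and end-of-utterance detection as Soniox features, and both providers stream over WebSocket. For teams still running Soniox for Arabic, a minimal sketch of the JSON configuration message a client might send first when opening a real-time session for Gulf Arabic telephony audio is shown below. The field names (`model`, `language_hints`, `enable_endpoint_detection`, `audio_format`, `sample_rate`, `num_channels`) and the model ID `stt-rt-v3` are assumptions modeled on Soniox's public real-time API and should be checked against its current documentation; `build_soniox_config` is a hypothetical helper, and no network connection is made.

```python
import json
import os


def build_soniox_config(api_key: str, sample_rate: int = 8000) -> str:
    """Hypothetical helper: serialize the first (config) message of a session.

    Every field name here is an assumption modeled on Soniox's real-time API.
    """
    config = {
        "api_key": api_key,
        "model": "stt-rt-v3",               # assumed model ID for STT RT v3
        "language_hints": ["ar"],           # bias recognition toward Arabic
        "enable_endpoint_detection": True,  # ask for end-of-utterance signals
        # Raw 16-bit little-endian mono PCM; 8 kHz is common for phone audio.
        "audio_format": "pcm_s16le",
        "sample_rate": sample_rate,
        "num_channels": 1,
    }
    return json.dumps(config)


if __name__ == "__main__":
    print(build_soniox_config(os.environ.get("SONIOX_API_KEY", "demo-key")))
```

Keeping the config in one function makes it easy to switch `sample_rate` between narrowband phone audio and 16 kHz web audio without touching the rest of the client.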
Soniox STT RT v3 is faster, with an average end-of-utterance delay of 1,678 ms, 322 ms lower than ElevenLabs Scribe v2 (which implies roughly 2,000 ms for Scribe v2).
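The latency gap can be checked with simple arithmetic: adding the 322 ms difference to Soniox's 1,678 ms average gives an implied 2,000 ms for Scribe v2, so Soniox's end-of-utterance delay is about 16% lower. A minimal sketch using only those two stated figures (the function name is illustrative):

```python
SONIOX_EOU_MS = 1678  # average end-of-utterance delay, Soniox STT RT v3
GAP_MS = 322          # how much lower Soniox's delay is than ElevenLabs Scribe v2


def implied_latency(faster_ms: int, gap_ms: int) -> tuple[int, float]:
    """Return the slower provider's implied delay and the relative reduction."""
    slower_ms = faster_ms + gap_ms
    reduction = gap_ms / slower_ms
    return slower_ms, reduction


if __name__ == "__main__":
    scribe_ms, reduction = implied_latency(SONIOX_EOU_MS, GAP_MS)
    print(f"Scribe v2: {scribe_ms} ms; Soniox is {reduction:.1%} lower")
```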
Soniox STT RT v3 has the better quality, with a rating of 5/5 (Excellent): user feedback confirmed great transcription quality, callers did not need to repeat themselves, and its WER is 44% lower than Google Chirp 3's.

Only Soniox STT RT v3 is viable for Arabic. ElevenLabs Scribe v2 has poor quality and poor latency for Arabic and is not recommended for any Arabic STT use case. Soniox STT RT v3 was previously the best option for Arabic STT, with excellent quality (16.2% WER), but it has been superseded by Deepgram Nova-3, which is 75% faster with comparable quality.

ElevenLabs Scribe v2 starts at $5 per month (includes STT credits). Soniox STT RT v3 starts at $0.005 per minute of real-time streaming.
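With per-minute pricing, Soniox cost scales linearly with streamed audio. A rough estimator, assuming billing is simply minutes of real-time audio multiplied by the $0.005 starting rate (no minimums, tiers, or rounding rules, none of which are specified here). ElevenLabs is left out because the credit allowance of its $5 plan is not given. `Decimal` avoids floating-point drift in currency math.

```python
from decimal import Decimal

# Starting rate quoted for Soniox STT RT v3 real-time streaming, in USD.
SONIOX_RATE_PER_MIN = Decimal("0.005")


def soniox_streaming_cost(minutes: int, rate: Decimal = SONIOX_RATE_PER_MIN) -> Decimal:
    """Estimate cost in USD, assuming straight per-minute billing."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return rate * minutes


if __name__ == "__main__":
    for minutes in (60, 1_000, 10_000):
        print(f"{minutes:>6} min -> ${soniox_streaming_cost(minutes):.2f}")
```

At this rate an hour of calls costs $0.30, and 10,000 minutes per month comes to $50.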