Head-to-head comparison based on real production benchmarks with Gulf Arabic callers.
Speechmatics: ultra-fast Arabic STT, but with poor transcription quality. Users had to repeat themselves frequently, and the quality was unacceptable for production use.
Mistral Voxtral Mini: Mistral's speech model was completely non-functional for Arabic, producing zero transcriptions for Arabic audio whether or not an explicit language parameter was set.
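For context, this is roughly what the Voxtral Mini test calls looked like. A minimal sketch, assuming Mistral's `/v1/audio/transcriptions` REST endpoint, a `voxtral-mini-latest` model ID, and a `language` form field; those identifiers come from Mistral's published API and should be checked against current docs, and the file name is illustrative.

```python
import requests

API_KEY = "YOUR_MISTRAL_API_KEY"  # standard bearer-token auth
URL = "https://api.mistral.ai/v1/audio/transcriptions"  # assumed endpoint path


def transcribe(audio_path: str, language: str | None = None) -> str:
    """Send one audio file for transcription, optionally forcing a language."""
    data = {"model": "voxtral-mini-latest"}  # assumed model identifier
    if language:
        data["language"] = language  # e.g. "ar" to force Arabic decoding
    with open(audio_path, "rb") as f:
        resp = requests.post(
            URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"file": f},
            data=data,
            timeout=60,
        )
    resp.raise_for_status()
    # Assumed OpenAI-style response body with a "text" field.
    return resp.json().get("text", "")


# The benchmark ran both variants: auto-detected and explicitly Arabic.
print(transcribe("gulf_arabic_call.wav"))
print(transcribe("gulf_arabic_call.wav", language="ar"))
```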
| Feature | Speechmatics | Mistral Voxtral Mini |
|---|---|---|
| Real-time streaming transcription | ✓ | ✗ |
| Configurable endpointing | ✓ | ✗ |
| Standard and enhanced operating points | ✓ | ✗ |
| Custom dictionary | ✓ | ✗ |
| Multilingual speech recognition (claimed) | ✗ | ✓ |
| Audio understanding | ✗ | ✓ |
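The Speechmatics features above are all driven from the real-time `StartRecognition` message. Below is a minimal sketch of that configuration shape, assuming the documented `transcription_config` fields (`language`, `operating_point`, `max_delay`, `additional_vocab`); the end-of-utterance block and the specific values are assumptions for illustration only.

```python
# Sketch of the StartRecognition payload sent over the real-time WebSocket.
# Field names follow Speechmatics' documented transcription_config; the
# specific values (silence trigger, vocab entries) are illustrative.
start_recognition = {
    "message": "StartRecognition",
    "audio_format": {
        "type": "raw",
        "encoding": "pcm_s16le",
        "sample_rate": 16000,
    },
    "transcription_config": {
        "language": "ar",                # Arabic language pack
        "operating_point": "enhanced",   # "standard" or "enhanced"
        "max_delay": 2.0,                # latency/accuracy trade-off for finals
        "enable_partials": True,
        "additional_vocab": [            # custom dictionary entries
            {"content": "لولو هايبرماركت", "sounds_like": ["lulu hypermarket"]},
        ],
        # Assumed field name/placement for end-of-utterance endpointing.
        "conversation_config": {
            "end_of_utterance_silence_trigger": 0.75,
        },
    },
}
```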
| Capability | Speechmatics | Mistral Voxtral Mini |
|---|---|---|
| Streaming support | ✓ | ✗ |
| LiveKit plugin | ✗ | ✗ |
| Self-hostable | ✓ | ✗ |
| API style | WebSocket streaming + REST | REST |
| SDKs | Python, Node.js | Python, Node.js |
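On the SDK side, a streaming session looks roughly like the sketch below, assuming the `speechmatics-python` package's `WebsocketClient`, `TranscriptionConfig`, and `AudioSettings`, plus the public EU real-time endpoint; names and defaults should be verified against the current SDK release.

```python
from speechmatics.client import WebsocketClient
from speechmatics.models import (
    AudioSettings,
    ConnectionSettings,
    ServerMessageType,
    TranscriptionConfig,
)

# Assumed endpoint URL; the key is inlined for brevity only.
settings = ConnectionSettings(
    url="wss://eu2.rt.speechmatics.com/v2",
    auth_token="YOUR_SPEECHMATICS_API_KEY",
)
client = WebsocketClient(settings)

# Print each final transcript segment as it arrives over the WebSocket.
client.add_event_handler(
    event_name=ServerMessageType.AddTranscript,
    event_handler=lambda msg: print(msg["metadata"]["transcript"]),
)

conf = TranscriptionConfig(
    language="ar",               # Arabic language pack
    operating_point="enhanced",  # higher-accuracy operating point
    enable_partials=True,
    max_delay=2.0,
)
audio = AudioSettings(encoding="pcm_s16le", sample_rate=16000)

# Stream a raw PCM recording as if it were live audio (file name is illustrative).
with open("gulf_arabic_call.raw", "rb") as stream:
    client.run_synchronously(stream, conf, audio)
```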
Speechmatics: amazingly fast, but Arabic quality is too poor for production; the speed advantage is meaningless when users have to repeat themselves.
Mistral Voxtral Mini: does not work for Arabic at all; zero transcriptions were produced in testing despite the claimed multilingual support.
Speechmatics received a quality rating of 1/5 (Poor): users had to repeat themselves frequently, and the quality was unacceptable for production use. Mistral Voxtral Mini produced no Arabic transcriptions to rate.
Neither provider is a viable option for Gulf Arabic callers. Speechmatics is amazingly fast, but its Arabic quality is too poor for production, and the speed advantage is meaningless when users have to repeat themselves. Mistral Voxtral Mini does not work for Arabic at all: it produced zero transcriptions in testing despite claiming multilingual support.
Speechmatics starts at $0.0042 per minute for real-time streaming. Mistral Voxtral Mini uses usage-based, per-request pricing under Mistral's standard API rates.
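For a rough sense of scale, here is a back-of-the-envelope cost estimate; the monthly call volume is a hypothetical figure, not from the benchmark.

```python
# Hypothetical volume: 20,000 minutes of Gulf Arabic calls per month.
minutes_per_month = 20_000
speechmatics_rate_per_min = 0.0042  # USD, real-time streaming tier

monthly_cost = minutes_per_month * speechmatics_rate_per_min
print(f"Speechmatics: ${monthly_cost:,.2f}/month")  # -> $84.00/month

# Voxtral Mini is billed per request, so a comparable figure depends on
# request count and audio length rather than a flat per-minute rate.
```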