Head-to-head comparison based on real production benchmarks with Gulf Arabic callers.
Mistral Voxtral Mini: Mistral's speech model. Completely non-functional for Arabic.
Speechmatics: Ultra-fast Arabic STT with poor transcription quality.
Mistral Voxtral Mini: Produced zero transcriptions for Arabic audio, tested both with and without an explicit language parameter.
Speechmatics: Users had to repeat themselves frequently. Quality is unacceptable for production use.
| Feature | Mistral Voxtral Mini | Speechmatics |
|---|---|---|
| Multilingual speech recognition (claimed) | ✓ | ✗ |
| Audio understanding | ✓ | ✗ |
| Real-time streaming transcription | ✗ | ✓ |
| Configurable endpointing | ✗ | ✓ |
| Standard and enhanced operating points | ✗ | ✓ |
| Custom dictionary | ✗ | ✓ |

| Capability | Mistral Voxtral Mini | Speechmatics |
|---|---|---|
| Streaming support | ✗ | ✓ |
| LiveKit plugin | ✗ | ✗ |
| Self-hostable | ✗ | ✓ |
| API style | REST | WebSocket streaming + REST |
| SDKs | Python, Node.js | Python, Node.js |
Mistral Voxtral Mini: Does not work for Arabic at all. It produced zero transcriptions in testing despite claiming multilingual support.
Speechmatics: Very fast, but Arabic quality is too poor for production. The speed advantage is meaningless when users have to repeat themselves.
Mistral Voxtral Mini has a quality rating of 1/5 (non-functional). It produced zero transcriptions for Arabic audio, tested both with and without an explicit language parameter.
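A with/without-language-parameter check like the one described can be sketched as a small harness that measures how often a provider returns no text. This is only an illustration: the `transcribe` callable, its `(audio_path, language)` signature, the clip names, and the `"ar"` language code are all assumptions, not the benchmark's actual code or any provider's SDK.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical provider wrapper: (audio_path, language or None) -> transcript text.
Transcriber = Callable[[str, Optional[str]], str]


def empty_rate(transcribe: Transcriber, clips: Iterable[str],
               language: Optional[str]) -> float:
    """Return the fraction of clips for which no usable text came back."""
    clip_list = list(clips)
    if not clip_list:
        raise ValueError("need at least one clip")
    empty = sum(1 for clip in clip_list
                if not transcribe(clip, language).strip())
    return empty / len(clip_list)


def compare_language_hint(transcribe: Transcriber, clips: Iterable[str],
                          language: str = "ar") -> Dict[str, float]:
    """Run the same clips with auto-detection and with an explicit language."""
    clip_list = list(clips)
    return {
        "auto": empty_rate(transcribe, clip_list, None),
        "explicit": empty_rate(transcribe, clip_list, language),
    }


# Stand-in provider that always returns an empty transcript (assumption,
# used only so the sketch runs without network access or credentials).
def silent_stub(audio_path: str, language: Optional[str]) -> str:
    return ""


result = compare_language_hint(silent_stub, ["clip1.wav", "clip2.wav"])
```

An empty-transcript rate of 1.0 in both modes would correspond to the "zero transcriptions" outcome reported here; a real run would replace `silent_stub` with a wrapper around the provider's client.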
Neither provider is a viable option for Gulf Arabic in production. Mistral Voxtral Mini: does not work for Arabic at all; it produced zero transcriptions in testing despite claiming multilingual support. Speechmatics: very fast, but Arabic quality is too poor for production, and the speed advantage is meaningless when users have to repeat themselves.
Mistral Voxtral Mini is billed on a usage basis per request (Mistral API pricing). Speechmatics starts at $0.0042 per minute for real-time streaming.
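To make the per-minute rate concrete, here is a rough cost sketch for Speechmatics real-time streaming. The $0.0042 rate comes from the listing above; the 10,000-minute monthly volume is a made-up figure for illustration, and the calculation ignores any tiers, minimums, or discounts that may apply.

```python
from decimal import Decimal

# Listed starting rate for Speechmatics real-time streaming (USD per minute).
SPEECHMATICS_REALTIME_USD_PER_MIN = Decimal("0.0042")


def streaming_cost_usd(minutes: int,
                       rate: Decimal = SPEECHMATICS_REALTIME_USD_PER_MIN) -> Decimal:
    """Estimate cost as minutes * per-minute rate (flat rate assumed)."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return Decimal(minutes) * rate


# Hypothetical volume: 10,000 minutes of calls in a month.
monthly = streaming_cost_usd(10_000)
print(f"Estimated monthly cost: ${monthly:.2f}")
```

`Decimal` is used instead of `float` so that small per-minute rates do not pick up binary rounding noise when multiplied by large volumes.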