Head-to-head comparison based on real production benchmarks with Gulf Arabic callers.
Mistral Voxtral Mini: Mistral's speech model, completely non-functional for Arabic. It produced zero transcriptions for Arabic audio, tested both with and without an explicit language parameter.
Google Cloud STT (Chirp 3): high-quality Arabic STT from Google Cloud, but with significant latency. Transcription quality is high, with broad Arabic dialect support through the ar-XA language code.
| Feature | Mistral Voxtral Mini | Google Cloud STT — Chirp 3 |
|---|---|---|
| Multilingual speech recognition (claimed) | ✓ | ✗ |
| Audio understanding | ✓ | ✗ |
| Real-time streaming transcription | ✗ | ✓ |
| 120+ language support | ✗ | ✓ |
| Automatic punctuation | ✗ | ✓ |
| Word-level timestamps | ✗ | ✓ |
| Speaker diarization | ✗ | ✓ |
| Custom vocabulary | ✗ | ✓ |
| Medical and telephony models | ✗ | ✓ |

| Capability | Mistral Voxtral Mini | Google Cloud STT — Chirp 3 |
|---|---|---|
| Streaming support | ✗ | ✓ |
| LiveKit plugin | ✗ | ✓ |
| Self-hostable | ✗ | ✗ |
| API style | REST | gRPC streaming + REST |
| SDKs | Python, Node.js | Python, Node.js, Go, Java, C#, Ruby, PHP |
Mistral Voxtral Mini verdict: does not work for Arabic at all. It produced zero transcriptions in testing despite claiming multilingual support.
Google Cloud STT (Chirp 3) verdict: excellent quality, but too slow for real-time voice agents. Best suited to batch transcription or applications where latency isn't critical.
Google Cloud STT (Chirp 3) has a quality rating of 5/5 (Excellent): high-quality transcription with broad Arabic dialect support through the ar-XA language code.
Only one of the two providers is viable for Arabic. Mistral Voxtral Mini produced no Arabic transcriptions in testing, so it cannot be recommended for this use case. Google Cloud STT (Chirp 3) delivers excellent quality, but its latency makes it a fit for batch transcription or other latency-tolerant applications rather than real-time voice agents.
Mistral Voxtral Mini is billed on a usage basis per request (Mistral API pricing). Google Cloud STT (Chirp 3) starts at $0.016 per 15 seconds (Chirp 3 model).
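To put the Chirp 3 figure in more familiar units, the quoted $0.016 per 15 seconds can be scaled to a per-minute or per-hour estimate. The sketch below is a rough calculator, not a billing reference: the function name `chirp3_cost_usd` and the `round_up_blocks` option are illustrative assumptions, and whether the provider actually rounds partial 15-second blocks up is not stated in this comparison.

```python
import math

# Rate quoted in this comparison: $0.016 per 15 seconds (Chirp 3 model).
RATE_PER_BLOCK_USD = 0.016
BLOCK_SECONDS = 15


def chirp3_cost_usd(duration_seconds: float, round_up_blocks: bool = False) -> float:
    """Estimate transcription cost for a given audio duration.

    round_up_blocks is a hypothetical switch: set it to True to model
    billing that charges every started 15-second block in full.
    """
    blocks = duration_seconds / BLOCK_SECONDS
    if round_up_blocks:
        blocks = math.ceil(blocks)
    return round(blocks * RATE_PER_BLOCK_USD, 6)


if __name__ == "__main__":
    print(f"1 minute: ${chirp3_cost_usd(60):.3f}")
    print(f"1 hour:   ${chirp3_cost_usd(3600):.2f}")
    print(f"20 s call, rounded up: ${chirp3_cost_usd(20, round_up_blocks=True):.3f}")
```

At the quoted rate this works out to about $0.064 per minute and $3.84 per hour of audio. No comparable estimate is possible for Mistral Voxtral Mini from the information here, since only "usage-based per request" is given.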