Head-to-head comparison based on real production benchmarks with Gulf Arabic callers.
High-quality Arabic STT from Google Cloud, but with significant latency.
Mistral's speech model — completely non-functional for Arabic.
High-quality transcription. Broad Arabic dialect support through the ar-XA language code.
Produced zero transcriptions for Arabic audio. Tested both with and without an explicit language parameter.
| Feature | Google Cloud STT — Chirp 3 | Mistral Voxtral Mini |
|---|---|---|
| Real-time streaming transcription | ✓ | ✗ |
| 120+ language support | ✓ | ✗ |
| Automatic punctuation | ✓ | ✗ |
| Word-level timestamps | ✓ | ✗ |
| Speaker diarization | ✓ | ✗ |
| Custom vocabulary | ✓ | ✗ |
| Medical and telephony models | ✓ | ✗ |
| Multilingual speech recognition (claimed) | ✗ | ✓ |
| Audio understanding | ✗ | ✓ |
| Capability | Google Cloud STT — Chirp 3 | Mistral Voxtral Mini |
|---|---|---|
| Streaming support | ✓ | ✗ |
| LiveKit plugin | ✓ | ✗ |
| Self-hostable | ✗ | ✗ |
| API style | gRPC streaming + REST | REST |
| SDKs | Python, Node.js, Go, Java, C#, Ruby, PHP | Python, Node.js |
Excellent quality but too slow for real-time voice agents. Best suited for batch transcription or applications where latency isn't critical.
Does not work for Arabic at all. Zero transcriptions produced in testing despite claiming multilingual support.
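The two findings above (high latency for one provider, zero transcriptions for the other) are the kind of result a small benchmark loop can surface. The following is a minimal sketch, not the harness used for these benchmarks: `run_benchmark`, `ClipResult`, and the `transcribe(audio, language)` callable are hypothetical names, and each provider's real SDK call would need to be wrapped to fit that signature.

```python
import time
from dataclasses import dataclass
from statistics import median
from typing import Callable, Iterable, Optional


@dataclass
class ClipResult:
    latency_s: float   # wall-clock time for one transcription call
    transcript: str    # stripped text returned by the provider


def run_benchmark(
    transcribe: Callable[[bytes, Optional[str]], str],
    clips: Iterable[bytes],
    language: Optional[str] = None,
) -> dict:
    """Time each call and count clips that came back with no text.

    `transcribe` is a hypothetical wrapper around a provider SDK.
    Pass language=None to test without an explicit language parameter.
    """
    results = []
    for audio in clips:
        start = time.perf_counter()
        text = transcribe(audio, language)
        elapsed = time.perf_counter() - start
        results.append(ClipResult(elapsed, text.strip()))

    if not results:
        raise ValueError("no clips supplied")

    return {
        "clips": len(results),
        "empty_transcripts": sum(1 for r in results if not r.transcript),
        "median_latency_s": median(r.latency_s for r in results),
    }


# Stand-in provider that always returns nothing, for illustration only.
def silent_provider(audio: bytes, language: Optional[str]) -> str:
    return ""


report = run_benchmark(silent_provider, [b"clip-1", b"clip-2"], language="ar")
```

Running the same clips once with `language="ar"` and once with `language=None` would mirror the "with and without an explicit language parameter" test described earlier.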
Google Cloud STT — Chirp 3 has a quality rating of 5/5 (Excellent). It delivers high-quality transcription and broad Arabic dialect support through the ar-XA language code.
Only one of the two providers produced usable Arabic transcriptions in testing. Google Cloud STT — Chirp 3: Excellent quality but too slow for real-time voice agents. Best suited for batch transcription or applications where latency isn't critical. Mistral Voxtral Mini: Does not work for Arabic at all. Zero transcriptions produced in testing despite claiming multilingual support.
Google Cloud STT — Chirp 3 starts at $0.016 per 15 seconds (Chirp 3 model). Mistral Voxtral Mini is billed on a usage basis per request (Mistral API pricing).
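To put the Chirp 3 rate quoted above in per-call terms, here is a rough cost sketch. It takes the $0.016 per 15 seconds figure from this page as given; rounding each call up to a whole 15-second block is an assumption, not something stated here, so the `round_up` flag lets you compare against simple proration. The function name `estimate_cost` is illustrative.

```python
import math
from decimal import Decimal

PRICE_PER_BLOCK = Decimal("0.016")  # rate quoted on this page, in USD
BLOCK_SECONDS = 15                  # billing block length quoted on this page


def estimate_cost(duration_seconds: int, round_up: bool = True) -> Decimal:
    """Estimate the cost of one audio clip in USD.

    round_up=True assumes partial blocks are billed as full blocks
    (an assumption); round_up=False prorates linearly.
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    if round_up:
        blocks = Decimal(math.ceil(duration_seconds / BLOCK_SECONDS))
    else:
        blocks = Decimal(duration_seconds) / Decimal(BLOCK_SECONDS)
    return PRICE_PER_BLOCK * blocks


one_minute = estimate_cost(60)     # 4 blocks
one_hour = estimate_cost(3600)     # 240 blocks
```

At the quoted rate, one minute of audio is four blocks ($0.064) and one hour is 240 blocks ($3.84). Check the provider's current pricing page before budgeting, since rates and billing increments change.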