Arabic Speech-to-Text Comparison

Mistral Voxtral Mini vs Google Cloud STT — Chirp 3

Head-to-head comparison based on real production benchmarks with Gulf Arabic callers.

Overview

Mistral Voxtral Mini

Non-functional

Mistral's speech model — completely non-functional for Arabic.

Production tested: voxtral-mini-latest

Google Cloud STT — Chirp 3

Acceptable

High-quality Arabic STT from Google Cloud, but with significant latency.

Production tested: chirp-3

Latency

Mistral Voxtral Mini

Avg EOU Delay: N/A
Best Case: N/A
Worst Case: N/A

Google Cloud STT — Chirp 3

Avg EOU Delay: 2376 ms
Best Case: 2000 ms
Worst Case: 3000 ms
Full turn time: 9000–10000 ms
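
To make the latency figures easier to reproduce, here is a minimal sketch of how such numbers could be derived. It assumes the average, best, and worst values are a plain mean, minimum, and maximum over per-turn end-of-utterance (EOU) delay samples; the benchmark does not state its exact method. The function names and the sample values are hypothetical.

```python
from statistics import mean


def summarize_eou_delays(delays_ms):
    """Return average, best, and worst EOU delay in milliseconds."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    return {
        "avg_ms": mean(delays_ms),
        "best_ms": min(delays_ms),
        "worst_ms": max(delays_ms),
    }


def eou_share_of_turn(eou_ms, turn_ms):
    """Fraction of a full turn spent waiting for end-of-utterance detection."""
    return eou_ms / turn_ms


# Placeholder samples for illustration only, not the benchmark's raw data.
samples_ms = [2000, 2500, 3000]
summary = summarize_eou_delays(samples_ms)

# Reported average EOU delay against the lower bound of the full turn time.
share = eou_share_of_turn(2376, 9000)
```

With the reported figures, the average EOU delay accounts for roughly a quarter of a 9000 ms turn, so the remaining turn time comes from other stages of the pipeline.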

Quality

Mistral Voxtral Mini

Non-functional

Produced zero transcriptions for Arabic audio. Tested with and without an explicit language parameter.

Google Cloud STT — Chirp 3

Excellent
WER: 28.8%

High-quality transcription. Broad Arabic dialect support through the ar-XA language code.

Gulf Arabic, MSA, Egyptian, Levantine
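
For readers unfamiliar with the WER figure, the sketch below shows the standard word error rate calculation: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is an illustration of the general metric, not the benchmark's scoring script. Any text normalization applied before scoring (diacritics, punctuation, numerals) is not described in the source and is omitted here. The function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 28.8% therefore means that about 29 word-level edits are needed per 100 reference words to turn the transcript into the reference.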

Features

Feature | Mistral Voxtral Mini | Google Cloud STT — Chirp 3
Multilingual speech recognition (claimed)
Audio understanding
Real-time streaming transcription
120+ language support
Automatic punctuation
Word-level timestamps
Speaker diarization
Custom vocabulary
Medical and telephony models

Pricing

Mistral Voxtral Mini

Free tier
API: Mistral API pricing
Usage-based, per request

Google Cloud STT — Chirp 3

Free tier
Standard: Chirp 3 model
$0.016 per 15 seconds
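
The per-15-second price is easier to compare once converted: one minute is four increments ($0.064) and one hour is 240 increments ($3.84). The sketch below does that conversion for arbitrary durations. Rounding each request up to a whole 15-second increment is an assumption made for illustration, so check Google Cloud's billing documentation for the actual rule. The function and constant names are hypothetical.

```python
import math

PRICE_PER_INCREMENT_USD = 0.016  # listed Chirp 3 price
INCREMENT_SECONDS = 15


def chirp3_cost_usd(audio_seconds: float, round_up: bool = True) -> float:
    """Estimate cost for a given audio duration.

    round_up=True bills whole 15-second increments (an assumption);
    round_up=False prorates the listed price by duration.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    increments = audio_seconds / INCREMENT_SECONDS
    if round_up:
        increments = math.ceil(increments)
    # Round to suppress floating-point noise in the result.
    return round(increments * PRICE_PER_INCREMENT_USD, 6)


cost_per_minute = chirp3_cost_usd(60)
cost_per_hour = chirp3_cost_usd(3600)
```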

Streaming & Integration

Capability | Mistral Voxtral Mini | Google Cloud STT — Chirp 3
Streaming support
LiveKit plugin
Self-hostable
API style | REST | gRPC streaming + REST
SDKs | Python, Node.js | Python, Node.js, Go, Java, C#, Ruby, PHP

Verdict

Non-functional

Mistral Voxtral Mini

Does not work for Arabic at all. Zero transcriptions produced in testing despite claiming multilingual support.

Pros
• Part of Mistral ecosystem
Cons
• Completely non-functional for Arabic
• Zero output despite audio processing
• Misleading multilingual claims

Acceptable

Google Cloud STT — Chirp 3

Excellent quality but too slow for real-time voice agents. Best suited for batch transcription or applications where latency isn't critical.

Choose Google Cloud STT — Chirp 3 if you need:

• Batch transcription
• Multi-dialect Arabic support
• Enterprise compliance

Pros
• Excellent transcription quality
• Broadest Arabic dialect support
• Enterprise-grade reliability
• Extensive SDK ecosystem
Cons
• 2.4s average EOU delay, too slow for voice agents
• Higher pricing than competitors
• Complex GCP setup required

Frequently Asked Questions

Which has better Arabic transcription quality, Mistral Voxtral Mini or Google Cloud STT — Chirp 3?

Google Cloud STT — Chirp 3. It has a quality rating of 5/5 (Excellent), with a measured WER of 28.8% and broad Arabic dialect support through the ar-XA language code. Mistral Voxtral Mini produced zero transcriptions for Arabic audio.

Is Mistral Voxtral Mini or Google Cloud STT — Chirp 3 better for production voice agents?

Neither is a good fit for real-time Arabic voice agents. Mistral Voxtral Mini does not work for Arabic at all: it produced zero transcriptions in testing despite claiming multilingual support. Google Cloud STT — Chirp 3 delivers excellent quality but is too slow for real-time voice agents; it is best suited for batch transcription or applications where latency isn't critical.

How does Mistral Voxtral Mini pricing compare to Google Cloud STT — Chirp 3?

Mistral Voxtral Mini is usage-based, billed per request (Mistral API pricing). Google Cloud STT — Chirp 3 costs $0.016 per 15 seconds (Chirp 3 model). Both list a free tier.