Voice of India ASR Benchmark Reveals Global Systems' Struggles with Indian Languages

February 16, 2026
  • Voice of India is a national automatic speech recognition (ASR) benchmark that evaluates 15 Indian languages on real-world speech from more than 35,000 speakers, measuring how well systems recognize speech across languages and dialects.

  • Developed by Josh Talks in collaboration with AI4Bharat at IIT Madras, the benchmark reveals a sizable performance gap between India-focused models and several global systems, especially for regional languages and dialects.

  • The project aims to assess Indian languages under real-world conditions and to push for better accuracy across linguistic diversity.

  • Meta’s 7B model is only about 4% more accurate than its 1B model across Indian languages, suggesting that scaling model size alone yields limited gains in this domain.

  • Global players struggle with regional Indian languages; in some cases, Meta’s error rates for Tamil and Malayalam are two to three times higher than those of rivals such as Sarvam and Google.

  • AI4Bharat argues that standard word error rate (WER) metrics can unfairly penalize code-mixed and multilingual speech, so the dataset includes curated spelling variants that shift the focus to linguistic correctness rather than orthography.

  • Dravidian languages such as Tamil, Telugu, Malayalam, and Kannada exhibit higher error rates than Indo-Aryan languages. Dialects are a separate weak spot: Bhojpuri shows word error rates of 20–30%, versus under 10% for standard Hindi.

  • Sarvam Audio models consistently rank near the top across languages, while OpenAI models show substantial accuracy gaps, with some comparisons exceeding 50 percentage points in average accuracy.

  • The benchmark highlights voice as critical infrastructure for banking, healthcare, and government services, where high WER can misroute welfare applications, mis-transcribe medical symptoms, or misdirect user queries.

  • OpenAI’s transcription models perform poorly on Indian languages such as Maithili and Tamil, with WERs above 55%, which points to real-world usability problems.

  • Microsoft STT lacks support for six of the 15 tested languages, limiting its applicability in India.

  • The testing protocol covers about 2,000 speakers per language with district-level sampling and manual spelling variant curation to capture regional variation and code-switching.
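
As a rough illustration of the spelling-variant idea mentioned above, here is a minimal sketch of a word error rate calculation in which a reference word may list several accepted spellings. It is not the benchmark's actual scoring code: the function names, the data layout, and the romanized example words are assumptions made for illustration only.

```python
from typing import Sequence, Set, Union

# A reference token is either one spelling or a set of accepted spellings.
# This layout is an assumption for illustration, not the benchmark's format.
RefToken = Union[str, Set[str]]


def token_matches(ref_token: RefToken, hyp_token: str) -> bool:
    """Return True if the hypothesis word equals any accepted spelling."""
    if isinstance(ref_token, str):
        return ref_token == hyp_token
    return hyp_token in ref_token


def word_error_rate(reference: Sequence[RefToken], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Computed with word-level edit distance (dynamic programming).
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if token_matches(reference[i - 1], hypothesis[j - 1]) else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Hypothetical code-mixed example (romanized, invented for illustration).
hypothesis = ["mujhe", "daktar", "chahiye"]
strict_reference = ["mujhe", "doctor", "chahiye"]
variant_reference = ["mujhe", {"doctor", "daktar"}, "chahiye"]

strict_wer = word_error_rate(strict_reference, hypothesis)
variant_wer = word_error_rate(variant_reference, hypothesis)
print(f"strict WER:        {strict_wer:.2f}")
print(f"variant-aware WER: {variant_wer:.2f}")
```

In this toy case the strict reference counts one substitution out of three words, while the variant-aware reference counts none, because the transcript used an accepted alternate spelling of the same word.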

Summary based on 3 sources



Sources

Global speech AI struggles to understand India: Report


