
Gnani AI's Prisma v2.5 Tops Sarvam, ElevenLabs in Indian Speech Recognition


Bengaluru-based Gnani AI has launched Prisma v2.5, its latest speech-to-text model. The company claims it outperforms competitors like ElevenLabs and Sarvam AI in Indian language recognition, achieving lower word error rates across various dialects and noisy environments.

Bengaluru-based voice artificial intelligence startup Gnani AI has unveiled Prisma v2.5, its advanced speech-to-text model. The company asserts that this new iteration significantly surpasses rival models from Sarvam AI, ElevenLabs, and Deepgram in accuracy for Indian languages, particularly in real-world, acoustically challenging environments.

Setting New Benchmarks for Indian Languages

Gnani AI claims Prisma v2.5 ranked first in eight of nine Indian languages across various speech-recognition benchmarks, including the specialized Gramvaani dataset, which captures diverse speech patterns from semi-urban and rural India. The model reportedly achieved a 15% reduction in word error rate for rural Hindi dialects and an 18% reduction across Dravidian languages compared with competing systems.
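For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below shows that standard calculation in Python. It is an illustration only: the article does not describe how Gnani AI scored its benchmarks or whether the reported reductions are relative or absolute, and the function names, example sentences, and the 0.20 and 0.17 figures are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fractional drop in WER relative to a baseline system."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical transcript pair: one substitution ("five" -> "nine")
# and one deletion ("my") against an 8-word reference.
reference = "please transfer five thousand rupees to my account"
hypothesis = "please transfer nine thousand rupees to account"
print(word_error_rate(reference, hypothesis))  # 2 errors / 8 words = 0.25

# Made-up numbers: a baseline WER of 0.20 falling to 0.17 would be
# roughly a 15% relative reduction.
print(round(relative_reduction(0.20, 0.17), 2))
```

Published WER figures usually also depend on text normalization (casing, punctuation, how numerals are written), which this sketch leaves out.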

Traditional speech recognition systems often struggle with conditions that are common in Indian speech audio: regional accents, background noise, compressed audio from telephone calls, and the prevalent practice of code-switching between English and Indian languages. Gnani AI says it developed Prisma v2.5 specifically to address these real-world complexities.

Advanced Training and Enterprise Applications

Prisma v2.5 was trained on 14 million hours of proprietary speech data spanning 12 languages. The training corpus was designed to include regional dialects, a range of ambient noise conditions, and instances of code-switching. According to Gnani AI, this training enables the model to perform robustly where others falter.

The primary target for Prisma v2.5 is enterprise clients in sectors such as banking, financial services and insurance (BFSI), healthcare, and general insurance. In these industries, precise transcription is crucial for maintaining compliance records, optimizing customer relationship management (CRM) systems, and enhancing agent-assistance tools, where errors in names, numbers, or technical terms can have significant repercussions.

Seamless Code-Switching and Audio Handling

A key feature of Prisma v2.5 is its native support for code-switching: the model handles word-level switches within pairs such as Hindi-English, Tamil-English, and other regional-English combinations, without requiring explicit language tags. The model is also engineered to handle audio transmitted over telephony networks, including GSM and Voice over Internet Protocol (VoIP).

“CODEC handling for GSM and VoIP is native. Code-switching across Hindi-English, Tamil-English, and regional-English pairs works at the word level without language tagging,” stated Bharath Shankar, co-founder and chief product and engineering officer at Gnani AI. Shankar also noted that post-training optimizations have doubled the model’s throughput without compromising accuracy.

Ganesh Gopalan, co-founder and chief executive officer of Gnani AI, highlighted the unique challenges of voice AI deployment in India. “Accents, noise, code-switching, compressed telephony audio, these are not edge cases in India; they are the norm,” Gopalan emphasized, underscoring the necessity for models specifically trained on the diverse ways Indians speak in everyday conditions.
