India turns to AI to capture its 121 languages

The article discusses the efforts to build AI-based chatbots in India's native languages, such as Kannada, to bridge the language gap and provide useful information and economic opportunities to millions of Indians. Tech firm Karya is building datasets for companies like Microsoft and Google, while the government is creating language datasets through Bhashini, an AI-led language translation system. The data is being generated by thousands of speakers of different Indian languages, and crowdsourcing initiatives are being used to collect and validate the data.

The article also highlights the economic value of this initiative: Karya pays workers above the minimum wage to generate data. Workers can also earn royalties on the data they generate, and the data could be used to build AI products for their communities in areas such as healthcare and farming. The article concludes by mentioning specific AI models and projects that focus on speech and speech recognition, such as the Google-funded Project Vaani and the AI4Bharat centre's Jugalbandi, an AI-based chatbot that can answer questions on welfare schemes in several Indian languages.

Key takeaways

Villagers in Karnataka, India, are contributing to a project to build the country's first AI-based chatbot for tuberculosis by providing speech data in their native Kannada language.
Despite India's linguistic diversity, few of its languages are covered by natural language processing (NLP), which excludes hundreds of millions of Indians from useful information and economic opportunities.
Indian tech firm Karya is building datasets for companies like Microsoft and Google to use in AI models for education, healthcare, and other services, using speech data from speakers of different Indian languages.
The Indian government is also building language datasets through Bhashini, an AI-led language translation system that is producing open-source datasets in local languages for use in building AI tools.
