The article also highlights the initiative's economic value: Karya pays workers above the minimum wage to generate data. Workers can also earn royalties on the data they generate, and that data could be used to build AI products for the community in areas such as healthcare and farming. The article concludes by mentioning specific AI models and projects that focus on speech and speech recognition, such as the Google-funded Project Vaani and the AI4Bharat centre's Jugalbandi, an AI-based chatbot that can answer questions on welfare schemes in several Indian languages.
Key takeaways:
- Villagers in Karnataka, India, are contributing to a project to build the country's first AI-based chatbot for tuberculosis by providing speech data in their native Kannada language.
- Despite India's linguistic diversity, few of its languages are covered by natural language processing (NLP), which excludes hundreds of millions of Indians from useful information and economic opportunities.
- Indian tech firm Karya is building datasets for companies like Microsoft and Google to use in AI models for education, healthcare, and other services, using speech data from speakers of different Indian languages.
- The Indian government is also building language datasets through Bhashini, an AI-led language translation system that is creating open-source datasets in local languages for use in AI tools.