Linguistic Data Consortium for Indian Languages (LDC-IL)

Ministry of Education, Government of India

The Linguistic Data Consortium for Indian Languages (LDC-IL) works on innovative language data creation and comprehensive language technology development. LDC-IL focuses on creating, distributing and analyzing linguistic resources that support advances in natural language processing, language preservation and language applications. Our work spans a wide range of research areas related to language technologies and language-specific applications.

Research Areas

Natural Language Processing

NLP research addresses critical challenges in machine translation, transliteration, speech recognition, and text analysis. Current projects aim to create better language data to improve the understanding and generation of human language. LDC-IL has developed efficient machine translation, transliteration, grapheme-to-phoneme conversion and text-to-speech systems.

Language Analysis

LDC-IL analyses language data at the levels of syntax, semantics, and pragmatics to explore how humans understand language. Text data is analysed and annotated at various levels, including part-of-speech (POS) tagging, chunking and type-token analysis, while speech data is annotated at the sentence level.
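As an illustration of type-token analysis, the sketch below counts running words (tokens) and distinct word forms (types) in a short text and computes their ratio. It is a minimal example, not LDC-IL's own tooling: the function names `tokenize` and `type_token_ratio`, the whitespace-based tokenization and the sample sentence are all assumptions made for illustration.

```python
import string

# Punctuation stripped from token edges. The danda marks used in many
# Indic scripts are added to the ASCII set (an illustrative choice).
PUNCT = string.punctuation + "।॥"


def tokenize(text):
    """Split on whitespace and strip edge punctuation (a simplification)."""
    stripped = (word.strip(PUNCT) for word in text.split())
    return [token for token in stripped if token]


def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical sample text, not taken from any LDC-IL corpus.
sample = "यह एक किताब है। यह एक कलम है।"
tokens = tokenize(sample)
print("tokens:", len(tokens))
print("types:", len(set(tokens)))
print("type-token ratio:", type_token_ratio(tokens))
```

Whitespace splitting is used here because it keeps words in Indic scripts intact. A real pipeline would likely need language-specific tokenization and normalization.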

Linguistic Resources

LDC-IL prepares by-products such as frequency lists and word lists from its corpora. We build different kinds of datasets depending on the application. Machine learning and deep learning models need large amounts of language data, so we are creating large, high-quality corpora in Indian languages to advance Indian languages in the NLP field. LDC-IL focuses predominantly on the Scheduled languages, but also creates corpora for non-scheduled languages to support and enhance them.
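To show how by-products such as a frequency list and a word list can be derived from a corpus, here is a small sketch. The helper names `frequency_list` and `word_list` and the toy corpus are hypothetical, and the sentences are assumed to be tokenized already by whitespace. It does not describe the procedure LDC-IL actually uses.

```python
from collections import Counter


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    counts = Counter(tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def word_list(tokens):
    """Return the sorted list of distinct word forms."""
    return sorted(set(tokens))


# Toy corpus for illustration only; each string stands for one sentence.
corpus = [
    "the corpus has text data",
    "the corpus has speech data",
]
tokens = [word for sentence in corpus for word in sentence.split()]

freq = frequency_list(tokens)
words = word_list(tokens)

for word, count in freq:
    print(f"{word}\t{count}")
```

Sorting ties alphabetically keeps the output stable across runs, which helps when successive versions of a list are compared.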

Publications and Resources

LDC-IL researchers publish articles in top-tier journals, contributing valuable insights and advancements to the field. LDC-IL distributes text and speech datasets and innovative tools to support linguistic research worldwide. You can find the research articles and datasets on the publications page.

Future Pathways

LDC-IL constantly strives for innovation in the field of language and linguistic research. LDC-IL continuously improves its data creation methods to obtain high-quality data. Although LDC-IL has already released datasets, it continues to add under-represented domains to the text data and unexplored dialects to the speech data. We are improving our language analysis methods and revising our standards for linguistic resource development.