Central Institute of Indian Languages [CIIL] MISSION STATEMENT: Annotated, quality language data (both text and speech) and tools in Indian languages for individuals, institutions and industry, for research and development, created in-house, through outsourcing and through acquisition.
Introduction

Language data is the key ingredient of research and development in language technology. As time goes by, an increasing number of researchers are seeing the potential benefits of using an electronic corpus as a source of empirical language data for their research. The issues surrounding the collection, processing and annotation of large quantities of linguistic data make it necessary to involve a number of disciplines, such as linguistics, computer science, statistics and engineering. Corpus linguists often use computational methods when analyzing their data, while computational linguists depend on computer-readable linguistic data for their research and for building practical tools and programmes. Data collected in this way from a large number of Indian languages will be of high quality and will conform to defined standards. There has long been a demand for such data in India, and that demand will now be met.

To fulfill this long-pending need, the Central Institute of Indian Languages, Mysore, and several other like-minded institutions working on Indian language technology, such as the Indian Institute of Science, Bangalore; the Indian Institute of Technology, Bombay; the Indian Institute of Technology, Madras; and the International Institute of Information Technology, Hyderabad, have now been allowed by the Government of India to set up a Linguistic Data Consortium for Indian Languages (LDC-IL). The scheme has now been approved by the Planning Commission.

This consortium, which is being set up along the lines of the LDC at the University of Pennsylvania (USA), will not only create and manage large databases of Indian languages but also provide a forum for researchers in India and other countries working on Indian languages to publish their work and to build products based on such databases that would not otherwise be possible.

LDC-IL is expected to:

  • Become a repository of linguistic resources in all Indian languages in the form of text, speech and lexical corpora.
  • Facilitate creation of such databases by different organizations which could contribute and enrich the main LDC-IL repository.
  • Set appropriate standards for data collection and storage of corpora for different research and development activities.
  • Support language technology development and sharing of tools for language-related data collection and management.
  • Facilitate training and manpower development in these areas through workshops, seminars, etc., on technical as well as process-related issues.
  • Create and maintain the LDC-IL web-based services that would be the primary gateway for accessing its resources.
  • Design, or help create, appropriate language technology for mass use based on the linguistic data.
  • Provide the necessary linkages between academic institutions, individual researchers and the masses.

The following areas of Natural Language Processing will benefit immediately from the LDC-IL and related activities:

  • Speech Recognition and Synthesis
  • Character Recognition
  • Corpora Creation in Indian Languages, and several by-products such as lexicons, thesauri, word-finders, concordances, etc.

The services under the LDC-IL are hosted and managed by the Central Institute of Indian Languages, Mysore. Users of the services will be charged an annual fee at differential rates. The scheme is expected to progress towards self-sustenance in the course of the current plan period of the Government of India, the XIth Five-Year Plan.

  
WHAT IS NEW?

1. Recruitment of Junior Resource Person

2. Corpus Normalization Workshops

3. Sign Language Corpus Development: Academic and Technical Issues Workshop


Developed & Maintained by:
LDC-IL, CIIL
Copyright © LDC-IL,
Central Institute of Indian Languages
Central Institute of Indian Languages
Department of Higher Education
Ministry of Human Resource Development
Government of India
Manasagangothri, Hunsur Road, Mysore-570006, Karnataka, India.
Tel: (0821) 2515820 (Director)
Reception/PABX : (0821) 2345000
Fax: (0821) 2515032 (Off)