Central Institute of Indian Languages [CIIL]
MISSION STATEMENT: Annotated, quality language data (both text & speech) and tools in Indian Languages to Individuals, Institutions and Industry for Research & Development - Created in-house, through outsourcing and acquisition.
Second Meeting - minutes


I. Welcome
Prof. Udaya Narayana Singh, Director, Central Institute of Indian Languages and Chairperson, Linguistic Data Consortium for Indian Languages (LDC-IL), welcomed the Members to the Second Project Advisory Committee meeting. He explained the constraints that had prevented another meeting from being held during the past financial year. The Director also took this opportunity to brief the members about the new projects sanctioned by the Ministry, especially the National Translation Mission, which will have a bearing on the LDC-IL Project.

II. Agenda Items
After the Welcome, the agenda items were taken up in order.
1. The Minutes of the First Project Advisory Committee Meeting of the Linguistic Data Consortium for Indian Languages (LDC-IL) held on June 5, 2007 were confirmed.

2. The Mission Statement of the LDC-IL was approved as follows: “Annotated, quality language data (both text and speech) and tools in Indian languages to Individuals, Institutions, Industry etc., for Research and Development - Created in house, through outsourcing and acquisition”.

3. Dr. B. Mallikarjun, Reader cum Research Officer & Head, LDC-IL, presented the Action Taken Report on the recommendations of the First PAC meeting and the progress made in the work of LDC-IL from June 5, 2007 to June 8, 2008, and proposed certain targets for the year 2008-09. The proposed targets and the details of progress made are given in tabular form in Annexure – 2.

III. Action Taken Report in respect of Working Groups

(a) The Working Groups on Licensing Issues, Natural Language Processing, Speech and Speech Deficiencies met on August 3, 2007 at Pune, August 6, 2007 at Hyderabad, and November 29, 2007 at Mysore respectively, and deliberated on various issues relating to LDC-IL. The Working Group on Licensing Issues is expected to hold another meeting to make specific and concrete recommendations. The Natural Language Processing Group has standardized the POS tag set and the XML tag set (given in Annexure - 3). The group has assigned its members the task of preparing write-ups on future directions, and will study these individual drafts in order to prepare a document on future directions.

(b) The Scholars of the Speech Group have met several times and arrived at standards for Speech Data capturing and Annotation. The Speech/Language Development Group has met, and some of its members have sent their projects to the LDC-IL for grant-in-aid. However, they have been asked to recast them.

(c) The Character Recognition Group could not meet due to various reasons. Prof. B.B. Chaudhuri, the Chairman of the Group, said that he had held informal discussions with members of the group and that they would try to give the scanned texts to LDC-IL. It was also agreed that this Working Group will meet on the sidelines of the next PRSG of the MCIT Meeting to make specific recommendations regarding the tasks to be undertaken by the LDC-IL in this area.

(d) The following Standards for Language Data were presented, discussed and accepted.


Text Corpora

  • Text in UNICODE
  • Markup: SGML standard
  • POS tagging: Extendable and expandable tag set, as decided by the NLP Group on August 6, 2007.

Speech Corpora

  • Rate of sampling - Multiples of 8 kHz. The purpose and the rate of sampling to be uniform.
  • Transliteration scheme - LDC-IL standard
  • Annotation - PRAAT, WaveSurfer
  • Pronunciation Dictionary – Format (All placed before the PAC)

(For the convenience of the members absent from the PAC meeting, the full text on standards for speech is enclosed as Annexure - 4.)
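
As an illustration of the sampling-rate standard listed above, the following is a minimal Python sketch that checks whether a rate is a multiple of 8 kHz. The function names are hypothetical and are not part of any LDC-IL tool. Storing recordings as WAV files and reading the rate with Python's standard `wave` module are assumptions made for this example only.

```python
import wave

# The standard above specifies sampling rates in multiples of 8 kHz.
BASE_RATE_HZ = 8000


def is_multiple_of_8khz(rate_hz: int) -> bool:
    """Return True if rate_hz is a positive integer multiple of 8 kHz."""
    return rate_hz > 0 and rate_hz % BASE_RATE_HZ == 0


def wav_rate_conforms(path: str) -> bool:
    """Read the sampling rate from a WAV header and check it (hypothetical helper)."""
    with wave.open(path, "rb") as wav_file:
        return is_multiple_of_8khz(wav_file.getframerate())
```

Under this check, rates such as 8000, 16000 and 48000 Hz conform, while 44100 Hz does not.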

(e) Copies of the first versions of the Training Modules prepared by Prof. Dipti Misra et al. on POS Tagging and Chunking, Prof. Amba Kulkarni on Morphological analyzer, Prof. Pushpak Bhattacharya on Sense tagging and Prof. Peri Bhaskararao on Collecting Speech data were given to the Project Advisory Committee. These modules will be used by the LDC-IL for in-house training as well as for the training that it will conduct elsewhere for collecting and annotating language data.

(f) It was noted that the reports on the following programmes, conducted and sponsored by LDC-IL, were placed before the PAC meeting:

  • Winter School on Speech and Audio Processing (WiSSAP 2008) from 2nd to 5th January 2008 held at IIT, Madras.
  • Workshop on ‘Advanced Course in Computational Linguistics’ held from 16th to 25th March 2008 by the Dravidian University, Kuppam.
  • Workshop on ‘Speech Sciences’ held at CIIL, Mysore from 10th to 21st March 2008.
  • Selection Workshop/Test conducted to recruit staff for the LDC-IL Project from 17th to 21st March 2008 at CIIL, Mysore.
  • LDC-IL staff training from May 19 to June 13, 2008.


(g) A list of tasks that were recommended by the First Project Advisory Committee but not conducted by the LDC-IL was also provided to the PAC. 

(h) In the absence of specifically appointed staff for the LDC-IL, the Institute has created monolingual text corpora, parallel corpora and speech corpora in Workshop mode using resource persons. In doing so, only the availability of Resource Persons was taken into consideration; language priority was not considered. The statistical details were presented before the Committee as a part of the progress report.

IV. Recommendations

The members deliberated on these matters and made the following recommendations:

  • The LDC-IL has a national role and has to discharge its assigned responsibility as a nodal agency. Therefore, the Linguistic Data Consortium for Indian Languages (LDC-IL) could be visualized as a repository of language/linguistic resources and tools for Indian languages. An attempt should be made by the LDC-IL to contact all the NLP groups and institutions and collect their resources and tools. After collection, these have to be tested and validated. The resources and tools that are up to the standard can be licensed by the LDC-IL under its Licensing Policy.
  • A Road Map for the work of LDC-IL for the current year as well as for the Eleventh plan period has to be drawn and placed before the next PAC meeting.  A priority list of languages has to be prepared for creation of resources.
  • For the speech corpus of every language, the number of hours has to be 20 hrs. and not 10 hrs.
  • While preparing and procuring data, end users' needs have to be kept in mind. The focus has to be on the end user.
  • For the evaluation of each kind of data, tool etc., metrics have to be evolved. Benchmarks, good standards etc. have to be developed.
  • Indian Sign Language Vocabulary has to be developed. 
  • Dr. Anupam Basu of IIT Kharagpur is to be co-opted to the NLP group.


V. Other Matters

(a) The existing 10 vacant positions will be filled by academic persons and they will be re-designated as Research Assistants (Senior) and Research Assistants (Junior).  The existing academic persons under Technical positions will also be re-designated as Research Assistant (Senior).

(b) The member representing IBM, Shri Abhijit Dutta, said that the Workshop on Speech Recognition that they had intended to hold in the previous financial year would be conducted in the next few months.

(c) The members present were requested to send proposals for Seminars, events, workshops, training programmes etc.

(d) Working Group on Speech Deficiency will be renamed as Working Group on Speech Language Development.

(e) Grant-in-Aid: If a grantee does not show progress commensurate with the funds released, further release of funds will be stopped and action will be taken as per the agreement.

(f) The next meeting of the LDC-IL PAC will be held at Mysore in the last week of November 2008. All the members shall keep this in mind.

The meeting ended with thanks to the Chair.

Chairperson & Director, Linguistic Data 
Consortium for Indian Languages,
Central Institute of Indian Languages, Mysore


Annexure - I  

Minutes of the Second Project Advisory Committee Meeting of the Linguistic Data Consortium for Indian Languages (LDC-IL) held on June 9, 2008 at 11.30 a.m. under the Chairmanship of Prof. Udaya Narayana Singh, Director, Central Institute of Indian Languages, Mysore.


1.  Prof. Udaya Narayana Singh
Central Institute of Indian Languages
Manasagangotri, Hunsur Road,
Mysore - 570 006
2.  Mrs. Rashmi Chowdhary
Director (L)
Ministry of Human Resource Development
Department of Higher Education, Desk IV (L),
Shastri Bhawan, 'C' Wing,
New Delhi - 110 001.
Representing Language Bureau
3.  Shri S. Mohan
Finance Division
Ministry of Human Resource Development
Department of Higher Education
Shastri Bhavan, 'C' Wing,
NEW DELHI - 110 001.
Representing Language Bureau
4.  Director                                                                      
Indian Institute of Technology                                               
Bombay, P.O. IIT, Powai,                                    
MUMBAI – 400 076                                               
Member Represented by
Dr. Pushpak Bhattacharya
Dept. of Computer Science
5.  Director                                                                          
Indian Institute of Technology                                 
Madras, I.I.T Post Office                                       
Chennai - 600 036.                                             
Member Represented by
Prof. Hema Murthy
Dept. of Computer Science
6.  Director                                                                           
Indian Institute of Technology Kharagpur,                   
KHARAGPUR – 721 302.                                          
Member Represented by
Prof. Anupam Basu
Dept. of Computer Science
7.  Director, C-DAC, Pune                                                        

Member Represented by
Dr. Hemant Darbari
Programme Coordinator,  AAI Group
8. Prof. Vijayalakshmi Basavaraj
All India Institute of Speech & Hearing                         
Manasagangotri, Mysore – 570 006  
9. Prof. C.N. Krishnan
AU-KBC Research Centre for Internet & Telecom              
Technology, Anna University MIT Campus
Chromepet, Chennai – 600 044
10. Prof. B.B. Chaudhuri
Head, CVPR Unit
Project Coordinator, MIT Resource Centre for Bangla,
Indian Statistical  Institute, Kolkata,                                 
203 Barrackpore Trunk Road,
KOLKATA -  700 035, West Bengal
11. Prof. Peri Bhaskararao                                               
Tokyo University of Foreign Studies                                 
Speech Sciences, ILCAA, Tokyo, Japan                                          
12. Ms. Rekha Sharma                                                       
Centre for Speech Sciences                                            
Central Institute of Indian Languages       
Manasagangotri, Hunsur Road,
Mysore - 570 006
13. MICROSOFT                                                                  
DLF Cybergreens, 9th Floor                                             
Tower A, Sector 25 A                                               
GURGAON – 122 002,
Member Represented by
Dr. Kalika Bali
14. MOTOROLA                                                               
Lake View Building                                                         
Bagmane Tech Park                                                  
C.V. Raman Nagar,                                             
BANGALORE – 560 093           
Member Represented by
Shri Shailesh Ramamurthy
15. GOOGLE INDIA                                                             
No. 3, RMZ Infinity - Tower E                                           
Old Madras Road, 4th Floor,                                      
Bangalore - 560 016                                                 
Member Represented by
Dr.  Prasad Ram
16. IBM
DLF Silokhera, NH –8                                             
Sector – 30, GURGAON – 122 002,                           
Member Represented by
Shri Abhijit Dutta
Globalization Specialist
17. Dr. B. Mallikarjun
Reader cum Research Officer & Head, LDC-IL
Central Institute of Indian Languages
Hunsur Road, Mysore - 570 006
Member-Convener



1.  Director
Indian Institute of Science Bangalore
Bangalore - 560 012
2.  Director
International Institute of Information                               
Technology Hyderabad
Gacchibowli, Hyderabad - 500 019
3.  Joint Secretary
Human Centered Computing Division,
TDIL, Deptt. of Information Technology,
Ministry of Communication & Information
Technology, Room No. 3009,
Electronics Niketan,
6, CGO Complex, New Delhi – 3
Member
4.  Vice Chancellor & Professor of Law
National Law School University
P.O. Box No. 7201, Nagarbhavi,
Bangalore - 560 072
5.  Prof. G. Umamaheshwar Rao
Centre for Applied Linguistics & Translation Studies
(CALTS), University of Hyderabad
P.O. Central University Campus
Hyderabad – 500 046, A.P.
Special Invitee
6.  Prof. Yagnanarayana
International Institute of Information   
Technology Hyderabad
Gacchibowli, Hyderabad – 500 019.
7.  Director of Science & Technology
Hewlett-Packard Labs. India
24, Salarpuria Arena
Hosur Main Road, Adugodi
Bangalore – 560 030
8.  Executive Director
(Manufacturers’ Association for                                    
Information Technology)                                           
PHD House, 4th Floor,                                                
Opposite Asian Games Village,
NEW DELHI – 110 016.   


Annexure - II
TARGETS/PROPOSALS 2008-09

The following were presented as the tasks for the current financial year:

1.   Five million word Corpus in 6 Languages: Assamese, Bengali, Gujarati, Manipuri, Nepali and Tamil.

2.   Ten hours of Recording of Speech for Speech Corpus in languages: Assamese, Bengali, Gujarati, Kannada, Manipuri, Nepali and Tamil.

3.   Procuring of Speech Corpora to the tune of 50 hours in 8 languages.

4.   Multilingual dictionaries in 5 Languages.

5.   Frequency Dictionaries in 6 Languages.

6.   Creation of Pronunciation Dictionary in 5 Languages.

7.   Development of tools for analysis of the Text and Speech Corpus in Indian Languages.

8.   Conduct of Project Advisory Committee Meetings (2), National, Regional level Training Programmes/Workshops/Meetings and Conferences (22).

9.   Conduct of One International Seminar/Conference.

     The Institute has already agreed to be one of the sponsors of ICON 2008 (6th International Conference on Natural Language Processing) to be held at CDAC, Pune, India from December 20-22, 2008. It also made a Commitment to the SLT at Goa from December 15-19, 2008.

10.  Conduct of faculty improvement programme for the staff of LDC-IL (4).

11.  Giving Grants for the creation of language/lexical resources in Indian Languages.