LDC-IL (Linguistic Data Consortium for Indian Languages)

Central Institute of Indian Languages [CIIL]

MISSION STATEMENT: To provide annotated, quality language data (both text and speech) and tools in Indian languages to individuals, institutions and industry for research and development, created in-house, through outsourcing and through acquisition.

Knowledge Sharing Event - 4

POS Tagging

25 - 26 March, 2010

Language technology work carried out in Indian languages to date, including dictionaries, thesauri, wordnets, morphological analyzers and generators, spell checkers and POS taggers, is primarily at the level of words. Sentences, however, show rich and varied structures, and it is very important to be able to analyze, capture and utilize the syntactic structure of natural language sentences. Going beyond words and handling relevant linguistic phenomena at the level of sentences is essential for advanced language technology applications such as automatic translation, question answering and automatic summarization. A computational grammar must be precise enough that a computing machine can apply it mechanically for parsing and generation.

In this context, the challenge is to develop computational grammars that do not require commonsense or world knowledge. Grammars meant for human users normally discuss only irregular cases and exceptions, assuming that readers already know the usual basic rules. A computational grammar, in contrast, must be comprehensive and must encode extensive and thorough knowledge of all possible real-world grammatical sentences, both simple and complex.

Developing computational grammars involves the following tasks and subtasks, which can be accomplished in phases:

Task 1: POS tagging
Task 2: Building Electronic Dictionary
Task 3: Developing Morphological analyzer and generator
Task 4: Semantic tagging
Task 5: Building Chunker
Task 6: Tree banking
Task 7: Shallow and Deep parsing.

In this context, the LDC-IL is organising a series of events to bring together researchers working in these areas so that they can share their knowledge with other researchers. Each event in the series will focus on one aspect of computational grammar and address issues related to that theme.

Part-of-speech (POS) tagging is the basic building block of any NLP work and hence the first step in developing a computational grammar for any language. POS tagging is not just about assigning a tag to a token; it encompasses a whole range of grammatical information for that token in a sentence of a particular language. Designing the tagset for a specific NLP purpose, preparing annotation guidelines and measuring inter-annotator agreement are also very important. POS tagging remains an active research area in NLP.
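
Inter-annotator agreement on POS tags is commonly reported with a chance-corrected statistic such as Cohen's kappa. The sketch below is only an illustration of that calculation for two annotators; the tag labels (N, V, ADJ) and the sample annotations are invented for the example and do not come from any LDC-IL tagset or corpus.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same token sequence."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("Both annotators must tag the same non-empty token list")

    n = len(tags_a)
    # Observed agreement: fraction of tokens given the same tag by both.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n

    # Expected agreement by chance, from each annotator's own tag distribution.
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    expected = sum(freq_a[t] * freq_b[t] for t in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical tag throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of the same six tokens by two annotators.
annotator_a = ["N", "V", "N", "ADJ", "N", "V"]
annotator_b = ["N", "V", "ADJ", "ADJ", "N", "N"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

A kappa of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance; low values usually point to ambiguous tag definitions or gaps in the annotation guidelines.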

POS tagging is useful in speech generation, speech recognition, information retrieval, information extraction, word sense disambiguation (WSD) and question answering. It is also an intermediate step for higher-level NLP tasks such as parsing, semantic analysis and machine translation. For Indian languages in particular, POS tagging takes on many more dimensions, involving complex issues that arise from the script, the writing conventions specific to each language, inherent POS ambiguities, unknown tokens and variations in the text.

POS tagging can be done manually or automatically. Manual tagging, though more accurate, is a long and time-consuming process. An automatic tagger is therefore essential to speed up POS tagging and to reduce the chance of errors and inconsistencies. Various automatic POS taggers have been developed worldwide using linguistic rules, stochastic models, or a hybrid of both. Each kind of tagger has its own advantages and disadvantages. Automatic tagging is especially challenging for Indian languages, which are highly inflectional and morphologically rich, so developing high-accuracy POS taggers for them remains a difficult task.
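
To make the distinction between rule-based, stochastic and hybrid taggers concrete, here is a minimal sketch of a hybrid approach. It assumes a most-frequent-tag (unigram) model learned from a tiny tagged corpus as the stochastic part, with hand-written suffix rules as the fallback for unknown tokens. The English toy data, the tag labels and the suffix rules are all invented for illustration; a tagger for a morphologically rich Indian language would need a far larger corpus and a proper morphological analyzer in place of the suffix list.

```python
from collections import Counter, defaultdict

# Hypothetical hand-written rules: (suffix, tag), checked in order.
SUFFIX_RULES = [("ing", "V"), ("ed", "V"), ("ly", "ADV"), ("s", "N")]


def train_unigram(tagged_sentences):
    """Stochastic part: learn the most frequent tag for each known token."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, pos in sentence:
            counts[token.lower()][pos] += 1
    return {token: c.most_common(1)[0][0] for token, c in counts.items()}


def tag_tokens(tokens, lexicon, default_tag="N"):
    """Hybrid tagging: lexicon lookup first, suffix rules for unknown tokens."""
    tagged = []
    for token in tokens:
        key = token.lower()
        if key in lexicon:
            tagged.append((token, lexicon[key]))
            continue
        for suffix, pos in SUFFIX_RULES:
            if key.endswith(suffix):
                tagged.append((token, pos))
                break
        else:
            # No rule matched: fall back to the default tag.
            tagged.append((token, default_tag))
    return tagged


# Toy training corpus (invented). "run" is ambiguous between N and V.
corpus = [
    [("the", "DET"), ("dog", "N"), ("runs", "V")],
    [("the", "DET"), ("run", "N"), ("ended", "V")],
    [("dogs", "N"), ("run", "V"), ("quickly", "ADV")],
    [("they", "PRON"), ("run", "V")],
]
lexicon = train_unigram(corpus)
result = tag_tokens(["The", "cat", "jumped", "slowly"], lexicon)
print(result)
```

The unigram model resolves the ambiguous token "run" to its more frequent tag without looking at context, which is exactly the kind of error that context-sensitive stochastic models and richer linguistic rules are meant to reduce.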

The event will concentrate on the challenges of POS tagging and will share and examine new advances in the area, both for individual languages and cross-linguistically. It also aims to give researchers an academic space to discuss the most recent theories, tools and techniques explored for POS tagging, with a focus on both linguistic and computational issues. The event intends to promote POS tagging as a consolidated research area and to widen its horizon.

Abstracts of no more than 500 words (including references), stating the title of the paper, the name(s) of the author(s), institutional affiliation and email address, should be submitted by the due date in *.rtf or *.pdf format to one of the following email addresses:
rsrishti@gmail.com or ldc-richa@ciil.stpmy.soft.net

Important Dates

Last date for submitting the abstract       : 30th November, 2009
Abstract acceptance notification            : 20th December, 2009
Last date for submitting full-length papers : 1st February, 2010

Minimal financial support for travel shall be provided to the authors of a few selected papers.
