Dr. Shantanu Kumar
Academic Qualification
  • B.A. English (Hons.), Mathematics, French, Sanskrit & Telugu, BHU, Varanasi
  • M.A. Linguistics. Dissertation: Named Entity Recognition in Maithili, BHU, Varanasi
  • M.A. English Literature, IGNOU, New Delhi
  • NET Lectureship, UGC, New Delhi
  • GATE Research Fellowship, MHRD, New Delhi
  • PhD in Linguistics. Topic: "Automatic Speech Recognition in Maithili: Issues & Challenges", Mysore University, Mysore. Research Centre: CIIL, Mysore
Trained in
  • POS Tagging
  • Annotation
  • Translation
  • Corpus Generation
  • Text Processing
  • Speech Processing (ASR & TTS)
  • Pronunciation Lexicon
Position held: Junior Resource Person - II
Experience in research, training and documentation
  • Coordinator: Summer School in Computational Linguistics. CIIL, Mysore June 19-July 03, 2022.
  • Coordinator: National Seminar on Data Sampling in Angika, Bhagalpur, Bihar. Nov, 2022.
  • Coordinator: Field Work on Data Collection in Angika, Bhagalpur, Bihar. Nov 5th-13th, 2022.
  • Maithili Resource Person: LDC-IL, CIIL. Feb, 2021 - Aug, 2022.
  • Coordinator: International Symposium on Maithili, Darbhanga, Bihar. Dec, 2021.
  • Internship: Named Entity Recognition on Maithili, IIT-BHU, Varanasi. Jan, 2020 - June, 2020.
  • Internship: Hindi-Maithili Machine Translation Evaluation, IIT-BHU, Varanasi, July 2019 - Dec 2019.
Presentation/Participation in professional conferences/seminars/workshops

Workshop Organized

  • Summer School in Computational Linguistics at Central Institute of Indian Languages, Mysore. (June 19-July 03, 2023).
  • Field Work on Data Sampling in Angika. Organized by CIIL, Mysore, in Bihar. (Nov 2022)
  • National Seminar on Data Collection in Angika. Organized by CIIL, Mysore at Bhagwan Pustakalay, Bhagalpur, Bihar. (Nov 2022)
  • International Symposium on Maithili. Organized by CSTS, Delhi, and CIIL, Mysore at KSDSU, Darbhanga, Bihar. (Dec 2021)

Workshops/Conferences Presented/Attended

  • Conference on Evaluation and Benchmarking of AI Applications in Indian Languages organized by the LDC-IL, Central Institute of Indian Languages, Mysore. March 20-21, 2025.
  • 1st International Conference on Digital Humanities in the Anthropocene organized by Amity Institute of English Studies and Research, AUUP. January 15-17, 2025.
  • Workshop on Strengthening Samajik Chetna Kendra in Regional Context organized by Cell for National Center for Literacy, NCERT & LDC-IL at Central Institute of Indian Languages, Mysore from 26th to 27th September, 2024.
  • Conference on Bharatiya Languages and India as One Linguistic Area held at Central Institute of Indian Languages, Mysore from 28th to 29th May, 2024.
  • National Workshop on Creation of Thesaurus and A Standard Dictionary for the Gondi Language organized by the Central Institute of Indian Languages, Mysuru from 21st to 27th February 2024
  • Conference on Technology and Bharatiya Bhasha Summit jointly organized by CIIL, UGC, AICTE, NETF, BBS, NCERT, NCVET, at Dr. Ambedkar International Centre, Janpath, New Delhi on 30th September and 1st October, 2023.
  • Summer School on Language Documentation organized by SPPEL, CIIL, Mysore (May 17-31, 2022)
  • Training Program on Language Documentation organized by SPPEL, CIIL, Mysore (March 21-29, 2022)
  • FDP on Python Programming for Beginners using Artificial Intelligence and Machine Learning at NIT-Warangal (Oct 2021-Nov 2021)
  • FDP on Introduction to Speech Processing and its Applications using AI-ML (ISPA) at CDAC-Kolkata (October 25th - October 29th, 2021)
  • FDP on Recent Advancements in Automatic Speech Recognition and Speaker Verification at NIT-Sikkim (September 27th -October 1st, 2021)
  • Summer School on Automatic Speech Recognition organized at IIT-Dharwad with IIIT-Dharwad, Karnataka (July 19th - July 30th, 2021).
  • Workshop on Indian Language Data: Resources and Evaluation, on May 24, 2020, CIIL, Mysore
  • International Webinar Series on Linguistics, Team Bhasha-Chintan, Banaras Hindu University (06/06-25/07/2020)
  • National Workshop on Digitization and Development of E-Resources for Sanskrit (27-31 May) (Jointly organized by Jawaharlal Nehru University and Delhi University)
Publications

Research Paper

  • Kumar, S., Choudhary, N. Metathesis in Maithili: A Corpus-based Case Study of vowel 'I'. 2025. Speech Communication (Under Review). Print ISSN: 0167-6393 | Online ISSN: 1872-7182. (SCOPUS)
  • Kumar, S., Choudhary, N. Maithili Variation from Vowel Acoustics. 2024. Mithila Bharati. Volume XI(I-IV). Pages 206-226. ISSN 2349-834X. (UGC-CARE)
  • Kumar, S., Choudhary, N. Spelling Standardization issues in Indian Languages. 2023. Bhasha e-Journal. Issue 307. ISSN 0523-1418. (UGC-CARE)
  • Mundotiya, R., Kumar, S., Kumar, A., Chaudhary, U., Chauhan, S., Mishra, S., Singh, A. K. (2023). Development of a Dataset and a Deep Learning Baseline Named Entity Recognizer for Three Low-Resource Languages: Bhojpuri, Maithili, and Magahi. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(1), 1-20. (SCOPUS)
  • Satisha, M., Kumar, S., Vinay, A. (2022) An Overview of Idioms of Mundari Language Spoken in Jharkhand. Cosmos Multidisciplinary Research E-Journal Volume VII Issue VI June 2022. Paper (Peer Review)

Datasets

  • Shantanu Kumar, Dinesh Mishra, Saurabh Varik, Narayan Kumar Choudhary, Shailendra Mohan. 2025. Maithili Text to Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-36-1.
  • Shantanu Kumar, Ankita Tiwari, Rajesha N., Narayan Kumar Choudhary, Shailendra Mohan. 2025. Maithili Raw Speech Corpus Vol. II. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-37-8.
  • Shantanu Kumar, Ankita Tiwari, Manasa G., Narayan Kumar Choudhary, Shailendra Mohan. 2025. A Gold Standard Maithili Raw Text Corpus Vol. II. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-01-9.
  • Dinesh Mishra, Shantanu Kumar, Narayan Kumar Choudhary, Rajesha N., Shailendra Mohan. 2025. Maithili Sentence Aligned Speech Corpus (Tirhuta Script). Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-51-4.
  • Ankita Tiwari, Satyaendra Kumar Awasthi, Shantanu Kumar, Narayan Kumar Choudhary, Shailendra Mohan. 2025. A Gold Standard Chhattisgarhi Raw Text Corpus Vol. II. Central Institute of Indian Languages, Mysore. ISBN: 978-93-48633-16-3.
  • Shantanu Kumar, Dinesh Mishra, Rajesha N., Manasa G., Srikanth D., Stephen Fernandes, Nithin S., Narayan Kumar Choudhary, Shailendra Mohan. 2023. Maithili Sentence Aligned Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-96-2.
  • Satyaendra Kumar Awasthi, Ankita Tiwari, Shantanu Kumar, Rupesh Pandey, Saurabh Varik, Rajesha N., Manasa G., Srikanth D., Nithin S., Narayan Kumar Choudhary, Shailendra Mohan. 2023. Chhattisgarhi Raw Speech Corpus. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-78-8.

Book Chapters

  • Shantanu Kumar & Narayan Choudhary. "MAITHILI TEXT TO SPEECH CORPUS". In Rejitha K. S. & Narayan Kumar Choudhary. (ed.). 2025. LDC-IL Corpus Insights. Pages 98-88. Central Institute of Indian Languages, Mysore. 978-93-48633-33-0.
  • Shantanu Kumar & Narayan Choudhary. "MAITHILI RAW SPEECH CORPUS VOL. II". In Rejitha K. S. & Narayan Kumar Choudhary. (ed.). 2025. LDC-IL Corpus Insights. Pages 37-41. Central Institute of Indian Languages, Mysore. 978-93-48633-33-0.
  • Shantanu Kumar & Narayan Choudhary. "A GOLD STANDARD MAITHILI RAW TEXT CORPUS VOL. II". In Rejitha K. S. & Narayan Kumar Choudhary. (ed.). 2025. LDC-IL Corpus Insights. Pages 31-33. Central Institute of Indian Languages, Mysore. 978-93-48633-33-0.
  • Shantanu Kumar, Dinesh Mishra, Narayan Choudhary. "Maithili Speech Annotation". In Rejitha K. S. & Narayan Choudhary (Eds.). 2023. Compendium of LDC-IL Sentence Aligned Speech Corpus. Pages 50-57. Central Institute of Indian Languages, Mysore. ISBN: 978-81-19411-34-4.
  • Dhananjay Acharya, Abha Jha, Amit Mishra, & Shantanu Kumar. 2024. Maithili Primer. (Book). Central Institute of Indian Languages, Mysore. ISBN: 978-81-971106-5-8.

Book Reviews

  • Shantanu Kumar & Ankita Tiwari. 2024. Tamil Primer. (Book Hindi Review). Central Institute of Indian Languages, Mysore. ISBN: 978-81-971106-3-4.
  • Ankita Tiwari & Shantanu Kumar. 2024. Tibetan Primer. (Book Hindi Review). Central Institute of Indian Languages, Mysore. ISBN: 978-81-973948-3-6.
  • Ankita Tiwari & Shantanu Kumar. 2024. Ladakhi Primer. (Book Hindi Review). Central Institute of Indian Languages, Mysore. ISBN: 978-81-973948-4-3.
Mother tongue: Maithili
Other Languages known: Hindi, Sanskrit, Telugu, English, French