
Linguistic Data Consortium for Indian Languages (LDC-IL)

Raw Text Corpus

Status of Text Corpora (as of April 2025)

Sl. No.  Language               Word count  Sample
1        Assamese              1,01,27,030  Link
2        Bengali                 42,37,440  Link
3        Bodo                    29,15,544  Link
4        Chhattisgarhi vol.I     14,74,496  Link
         Chhattisgarhi vol.II    22,19,592  Link
5        Dogri                    8,01,771  Link
6        Gujarati                28,62,413  Link
7        Hindi                 1,03,17,177  Link
8        Kannada                 77,63,124  Link
9        Kashmiri vol.I           4,66,054  Link
         Kashmiri vol.II         10,13,658  Link
10       Konkani                 39,95,611  Link
11       Maithili vol.I          53,16,552  Link
         Maithili vol.II          8,11,680  Link
12       Malayalam               63,70,954  Link
13       Manipuri                61,45,278  Link
14       Marathi                 21,57,109  Link
15       Nepali                  70,57,524  Link
16       Odia                    15,88,287  Link
17       Punjabi               1,01,25,770  Link
18       Rajasthani              11,99,502  Link
19       Tamil                 1,09,31,902  Link
20       Telugu vol.I            30,10,993  Link
         Telugu vol.II           30,13,530  Link
21       Urdu                    51,61,927  Link