Raw Text Corpus | Official Website of Linguistic Data Consortium for Indian Languages

Raw Text Corpus

Status of Text Corpora (as of July 2014):

Sl. No. Language Word Count Sample
1 ASSAMESE 10127030 Link
2 BENGALI 4237440 Link
3 BODO 2915544 Link
4 CHHATTISGARHI 1474496 Link
5 DOGRI 801771 Link
6 GUJARATI 2862413 Link
7 HINDI 10317177 Link
8 KANNADA 7763124 Link
9 KASHMIRI 466054 Link
10 KONKANI 3995611 Link
11 MAITHILI 5316552 Link
12 MALAYALAM 6370954 Link
13 MANIPURI 6145278 Link
14 MARATHI 2157109 Link
15 NEPALI 7057524 Link
16 ODIA 1588287 Link
17 PUNJABI 10125770 Link
18 TAMIL 10931902 Link
19 TELUGU 3010993 Link
20 URDU 5161927 Link