Text Corpora
Status of Text Corpora (as of Feb 2012):
Sl. No. | Language | Current status of entire corpora (No. of words) |
1. | Assamese | 6751508 |
2. | Bengali | 7208845 |
3. | Bodo | 1211178 |
4. | Dogri | 230082 |
5. | English | 2383616 |
6. | Gujarati | 4355969 |
7. | Hindi | 31116970 |
8. | Kannada | 7552162 |
9. | Kashmiri | 990262 |
10. | Kodava | 182741 |
11. | Konkani | 2817518 |
12. | Maithili | 2681832 |
13. | Malayalam | 5218043 |
14. | Manipuri | 3510123 |
15. | Marathi | 2312239 |
16. | Nepali | 6246425 |
17. | Oriya | 1047204 |
18. | Punjabi | 3953361 |
19. | Sanskrit | 517642 |
20. | Tamil | 9294292 |
21. | Urdu | 5184915 |
22. | Yarava | 13904 |
23. | Telugu | 1911685 |
Sample Files of Text Corpora: