Text Corpora
Status of Text Corpora:
| Sl. No. | Language | Current status of entire corpora (No. of words) | Current status of POS tagged corpora (No. of words) |
| 1. | Assamese | 1714688 | |
| 2. | Bengali | 7208845 | |
| 3. | Bodo | 511345 | |
| 4. | Dogri | 802336 | |
| 5. | English | 2383616 | |
| 6. | Gujarati | 3350771 | |
| 7. | Hindi | 31116970 | |
| 8. | Kannada | 7552162 | |
| 9. | Kashmiri | 58767 | |
| 10. | Kodava | 182741 | |
| 11. | Konkani | 820973 | |
| 12. | Maithili | 484485 | |
| 13. | Malayalam | 2558604 | |
| 14. | Manipuri | 1344173 | |
| 15. | Marathi | 1300915 | |
| 16. | Nepali | 2246081 | |
| 17. | Oriya | 494370 | |
| 18. | Punjabi | 1398143 | |
| 19. | Sanskrit | 517642 | |
| 20. | Tamil | 3351954 | |
| 21. | Urdu | 8058216 | |
| 22. | Yarava | 13904 | |
| 23. | Telugu | 28865 | |
Sample Files of Text Corpora:
| Sl. No. | Language | Sample Files | | | | | |
| | | PDF files: | | | Doc files: | | |
| 1. | Assamese | 1 | 2 | 3 | 1 | 2 | 3 |
| 2. | Bengali | 1 | 2 | 3 | 1 | 2 | 3 |
| 3. | Bodo | 1 | 2 | 3 | 1 | 2 | 3 |
| 4. | Dogri | 1 | 2 | 3 | 1 | 2 | 3 |
| 5. | Gujarati | 1 | 2 | 3 | 1 | 2 | 3 |
| 6. | Hindi | 1 | 2 | 3 | 1 | 2 | 3 |
| 7. | Kannada | 1 | 2 | 3 | 1 | 2 | 3 |
| 8. | Kashmiri | | | | | | |
| 9. | Konkani | 1 | 2 | 3 | 1 | 2 | 3 |
| 10. | Maithili | 1 | 2 | 3 | 1 | 2 | 3 |
| 11. | Malayalam | 1 | 2 | 3 | 1 | 2 | 3 |
| 12. | Manipuri | 1 | 2 | 3 | 1 | 2 | 3 |
| 13. | Marathi | 1 | 2 | 3 | 1 | 2 | 3 |
| 14. | Nepali | 1 | 2 | 3 | 1 | 2 | 3 |
| 15. | Oriya | 1 | 2 | 3 | 1 | 2 | 3 |
| 16. | Punjabi | 1 | 2 | 3 | 1 | 2 | 3 |
| 17. | Sanskrit | 1 | 2 | 3 | 1 | 2 | 3 |
| 18. | Santhali | 1 | 2 | 3 | 1 | 2 | 3 |
| 19. | Sindhi | 1 | 2 | 3 | 1 | 2 | 3 |
| 20. | Tamil | 1 | 2 | 3 | 1 | 2 | 3 |
| 21. | Telugu | 1 | 2 | 3 | 1 | 2 | 3 |
| 22. | Urdu | 1 | 2 | 3 | 1 | 2 | 3 |