Text Corpora

Status of Text Corpora (as of Aug 2011):
| Sl. No. | Language | Current status of entire corpora (No. of words) |
|---|---|---|
| 1. | Assamese | 5955512 |
| 2. | Bengali | 7208845 |
| 3. | Bodo | 512009 |
| 4. | Dogri | 230082 |
| 5. | English | 2383616 |
| 6. | Gujarati | 4355969 |
| 7. | Hindi | 31116970 |
| 8. | Kannada | 7552162 |
| 9. | Kashmiri | 557335 |
| 10. | Kodava | 182741 |
| 11. | Konkani | 2199705 |
| 12. | Maithili | 2092871 |
| 13. | Malayalam | 4415391 |
| 14. | Manipuri | 2306371 |
| 15. | Marathi | 2312239 |
| 16. | Nepali | 6246425 |
| 17. | Oriya | 718219 |
| 18. | Punjabi | 2871826 |
| 19. | Sanskrit | 517642 |
| 20. | Tamil | 7179218 |
| 21. | Urdu | 5184915 |
| 22. | Yarava | 13904 |
| 23. | Telugu | 1021032 |

Sample Files of Text Corpora:
| Sl. No. | Language | Sample files: PDF | Sample files: Doc |
|---|---|---|---|
| 1. | Assamese | 1, 2, 3 | 1, 2, 3 |
| 2. | Bengali | 1, 2, 3 | 1, 2, 3 |
| 3. | Bodo | 1, 2, 3 | 1, 2, 3 |
| 4. | Dogri | 1, 2, 3 | 1, 2, 3 |
| 5. | Gujarati | 1, 2, 3 | 1, 2, 3 |
| 6. | Hindi | 1, 2, 3 | 1, 2, 3 |
| 7. | Kannada | 1, 2, 3 | 1, 2, 3 |
| 8. | Kashmiri | | |
| 9. | Konkani | 1, 2, 3 | 1, 2, 3 |
| 10. | Maithili | 1, 2, 3 | 1, 2, 3 |
| 11. | Malayalam | 1, 2, 3 | 1, 2, 3 |
| 12. | Manipuri | 1, 2, 3 | 1, 2, 3 |
| 13. | Marathi | 1, 2, 3 | 1, 2, 3 |
| 14. | Nepali | 1, 2, 3 | 1, 2, 3 |
| 15. | Oriya | 1, 2, 3 | 1, 2, 3 |
| 16. | Punjabi | 1, 2, 3 | 1, 2, 3 |
| 17. | Sanskrit | 1, 2, 3 | 1, 2, 3 |
| 18. | Santhali | 1, 2, 3 | 1, 2, 3 |
| 19. | Sindhi | 1, 2, 3 | 1, 2, 3 |
| 20. | Tamil | 1, 2, 3 | 1, 2, 3 |
| 21. | Telugu | 1, 2, 3 | 1, 2, 3 |
| 22. | Urdu | 1, 2, 3 | 1, 2, 3 |