
Linguistic Data Consortium for Indian Languages (LDC-IL)

Ministry of Education, Government of India


Comparable Text Corpus

Status of Comparable Text Corpora:

Sl. No.  Language pair       No. of words (English)  No. of words (other language)
1.       English - Bengali   126828                  93952
2.       English - Dogri     88025                   93293
3.       English - Hindi     1814273                 1802435
4.       English - Kannada   779258                  476855
5.       English - Maithili  159419                  136421
6.       English - Nepali    263256                  202157
