Released Datasets | Official Website of Linguistic Data Consortium for Indian Languages

Released Datasets of LDC-IL and their Prices

LDC-IL has so far released 58+ datasets. They are listed below along with their prices for commercial users.

Sl. no.  Name of dataset                              Price
31       A Gold Standard Gujarati Raw Text Corpus     24514
32       A Gold Standard Hindi Raw Text Corpus        51590
33       A Gold Standard Kannada Raw Text Corpus      70280
34       A Gold Standard Kashmiri Raw Text Corpus      3780
35       A Gold Standard Konkani Raw Text Corpus      37882
36       A Gold Standard Maithili Raw Text Corpus     42347
37       A Gold Standard Malayalam Raw Text Corpus    70491
38       A Gold Standard Manipuri Raw Text Corpus     61578
39       A Gold Standard Marathi Raw Text Corpus      18805
40       A Gold Standard Nepali Raw Text Corpus       65321
41       A Gold Standard Odia Raw Text Corpus         14712
42       A Gold Standard Tamil Raw Text Corpus       113829
43       A Gold Standard Telugu Raw Text Corpus       35350
44       A Gold Standard Urdu Raw Text Corpus         29853
45       Bengali Raw Speech Corpus                   350126

These datasets are distributed for both commercial and non-commercial use.

Please note that the datasets are free of charge for bona fide non-commercial and academic use. The requester must be a bona fide student, faculty member, or employee of a government-funded research institute, or a government entity.

Additional discounts are available for startups, MSMEs, and entities from the SAARC countries. For more details about the discounts and the procedure to procure the datasets, please log in to the Data Distribution portal and see the FAQ page.