Released Datasets | Official Website of Linguistic Data Consortium for Indian Languages

Released Datasets of LDC-IL and their Prices

LDC-IL has so far released 58+ datasets. The released datasets are listed below along with their prices for commercial users.

Sl. no. | Name of dataset | Price
16 | A Gold Standard Chhattisgarhi Raw Text Corpus | 9879
17 | A Gold Standard Assamese Raw Text Corpus | 91308
18 | Assamese Raw Speech Corpus | 147762
19 | Dogri Raw Speech Corpus | 46671
20 | Gujarati Raw Speech Corpus | 155738
21 | Gujarati Raw Speech Corpus (Mono Recordings) | 175992
22 | Indian English Raw Speech Corpus - Bengali Variant | 70234
23 | Indian English Raw Speech Corpus - Kannada Variant | 64479
24 | Kashmiri Raw Speech Corpus | 76577
25 | Multilingual Raw Speech Corpus | 265710
26 | Odia Raw Speech Corpus | 375455
27 | Tamil Raw Speech Corpus | 378446
28 | A Gold Standard Bengali Raw Text Corpus | 36383
29 | A Gold Standard Bodo Raw Text Corpus | 30476
30 | A Gold Standard Dogri Raw Text Corpus | 5898

These datasets are distributed for both commercial and non-commercial usage.

Please note that the datasets are free of charge for bona fide non-commercial and academic use. The requester must be a bona fide student, faculty member, or employee of a government-funded research institute, or a government entity.

Additional discounts are available for startups, MSMEs, and entities from SAARC countries. For more details about the discounts and the procedure for procuring the datasets, please log in to the Data Distribution portal and see the FAQ page.