Central Institute of Indian Languages [CIIL] MISSION STATEMENT: Annotated, quality language data (both text & speech) and tools in Indian Languages to Individuals, Institutions and Industry for Research & Development, created in-house, through outsourcing and acquisition.
Size of Speech Corpora (as on July 2014)


SPEECH CORPORA (Segmented Data)

Sl. No. | Language | Dialect | Hours | Minutes | Seconds | Speakers
1 | Assamese | Upper Assam, Lower Assam | 79 | 39 | 41 | 306
2 | Bengali | SCB (Kolkata) & Barendri (North Bengal) | 102 | 04 | 12 | 469
3 | Bodo | Standard and Non Standard | 198 | 10 | 48 | 416
4 | Dogri | Standard | 17 | 10 | 58 | 61
7 | Gujarati | Indian | 122 | 01 | 05 | 441
9 | Hindi | Standard, Bhojpuri & Magahi | 105 | 23 | 54 | 434
10 | Kannada | North-East (Hyderabad Karnataka), North-West (Mumbai Karnataka) and Canara | 191 | 15 | 47 | 656
11 | Kashmiri | Standard | 29 | 26 | 13 | 149
12 | Konkani | Standard | 136 | 06 | 19 | 441
13 | Maithili | Standard | 125 | 54 | 31 | 300
14 | Malayalam | Standard | 179 | 51 | 13 | 458
15 | Manipuri | Standard and Kakching | 174 | 24 | 57 | 621
16 | Marathi | Standard | 88 | 48 | 38 | 305
17 | Nepali | Darjeeling and Assamese | 88 | 55 | 04 | 351
18 | Odia | Standard | 146 | 35 | 28 | 474
19 | Punjabi | Standard | 104 | 14 | 18 | 468
20 | Tamil | Standard | 142 | 46 | 56 | 446
21 | Telugu | Standard | 12 | 47 | 56 | 67
22 | Urdu | Standard | 94 | 46 | 12 | 500


SPEECH CORPORA (Annotated Data)

Sl. No. | Name of the Language | Annotated (HH:MM:SS)
1 | Assamese | 28:18:56
2 | Bengali | 36:24:39
3 | Bodo | 30:45:56
4 | Gujarati | 02:39:39
5 | Hindi | 80:01:48
6 | Kannada | 62:13:07
7 | Kashmiri | 06:28:25
8 | Konkani | 37:00:00
9 | Maithili | 30:10:40
10 | Malayalam | 92:40:43
11 | Manipuri | 109:48:27
12 | Nepali | 12:23:51
13 | Odia | 62:33:15
14 | Punjabi | 47:07:13
15 | Tamil | 58:06:16
16 | Urdu | 23:48:27



Pronunciation Dictionaries (Studio Recording)

Sl. No. | Language | Hours | Minutes | Seconds
1 | Assamese | 36 | 31 | 28
2 | Bengali | 21 | 55 | 46
3 | Bodo | 50 | 38 | 55
4 | Gujarati | 49 | 0 | 0
5 | Hindi | 45 | 51 | 32
6 | Kannada | 58 | 20 | 43
7 | Konkani | 32 | 29 | 53
8 | Malayalam | 33 | 3 | 5
9 | Manipuri | 49 | 41 | 18
10 | Nepali | 23 | 23 | 35
11 | Odia | 40 | 0 | 33
12 | Punjabi | 33 | 30 | 6
13 | Tamil | 48 | 12 | 10
14 | Urdu | 34 | 2 | 39


Developed & Maintained by:
LDC-IL, CIIL
Copyright © LDC-IL,
Central Institute of Indian Languages
Central Institute of Indian Languages
Department of Higher Education
Ministry of Education
Government of India
Manasagangothri, Hunsur Road, Mysore-570006, Karnataka, India.
Tel: (0821) 2515820 (Director)
Reception/PABX : (0821) 2345000
Fax: (0821) 2515032 (Off)