| Size of Speech Corpora (as of Dec 2011) |
SPEECH CORPORA (Raw Data) |
| Sl No. | Languages | Hours (HH:MM:SS) |
|---|---|---|
| 1 | Assamese | 105:52:37 |
| 2 | Bengali | 138:18:47 |
| 3 | Bodo | 114:38:55 |
| 4 | Dogri | 58:12:49 |
| 5 | Gujarati | 146:23:04 |
| 6 | Hindi | 163:25:47 |
| 7 | Indian English Bengali | 34:12:57 |
| 8 | Indian English Gujarati (MP3 Format) | 21:40:00 |
| 9 | Indian English Kannada | 37:01:33 |
| 10 | Kannada | 137:53:28 |
| 11 | Kashmiri | 44:59:07 |
| 12 | Konkani | 205:01:48 |
| 13 | Maithili | 43:33:42 |
| 14 | Malayalam | 105:47:05 |
| 15 | Manipuri | 107:10:30 |
| 16 | Marathi | 168:13:50 |
| 17 | Nepali | 145:04:46 |
| 18 | Oriya | 45:10:25 |
| 19 | Punjabi | 71:55:56 |
| 20 | Tamil | 87:03:24 |
| 21 | Telugu | 50:51:36 |
| 22 | Urdu | 81:06:25 |
SPEECH CORPORA (Segmented Data) |
| LANGUAGE | DIALECTS | NO. OF FEMALE SPEAKERS | NO. OF MALE SPEAKERS | TOTAL NO. OF SPEAKERS | TOTAL SPEECH DATA (HH:MM:SS) |
|---|---|---|---|---|---|
| Assamese | Upper Assam, Lower Assam | 154 | 152 | 306 | 80:08:04 |
| Bengali | SCB (Kolkata) & Barendri (North Bengal) | 231 | 238 | 469 | 125:19:53 |
| Bodo | Standard and Non-Standard | 71 | 75 | 146 | 07:46:56 |
| Indian English Bengali | Indian | 27 | 26 | 53 | 26:56:45 |
| Indian English Kannada | Indian | 27 | 26 | 53 | 16:52:24 |
| Gujarati | Standard and South Gujarati | 27 | 38 | 65 | 06:01:26 |
| Hindi | Standard, Bhojpuri & Magahi | 206 | 227 | 433 | 105:26:45 |
| Kannada | North-East, North-West and Canara | 246 | 246 | 492 | 137:10:37 |
| Konkani | Standard | 54 | 53 | 107 | 43:01:36 |
| Maithili | Standard | 72 | 74 | 146 | 02:06:24 |
| Malayalam | Standard | 81 | 80 | 161 | 63:56:45 |
| Manipuri | Standard and Kakching | 115 | 112 | 227 | 36:33:28 |
| Marathi | Standard | 75 | 75 | 150 | 58:57:50 |
| Nepali | Darjeeling and Assamese | 99 | 97 | 196 | 44:48:43 |
| Oriya | Standard | 80 | 82 | 162 | 37:38:48 |
| Punjabi | Standard | 78 | 78 | 156 | 29:38:25 |
| Tamil | Standard | 64 | 86 | 150 | 74:11:58 |
| Telugu | Standard | 13 | 43 | 56 | 01:06:41 |
| Urdu | Standard | 85 | 84 | 169 | 40:01:04 |
SPEECH CORPORA (Annotated Data) |
| Sl. No. | Name of the Language | Validated Speech Annotated Data (HH:MM:SS) |
|---|---|---|
| 1. | Bengali | 04:33:37 |
| 2. | Hindi | 01:01:28 |
| 3. | Konkani | 02:25:00 |
| 4. | Kannada | 01:00:00 |
| 5. | Oriya | 00:58:28 |
| 6. | Malayalam | 01:00:00 |
| 7. | Punjabi | 04:07:26 |
| 8. | Tamil | 01:00:00 |
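The durations in these tables are given as HH:MM:SS, with the hours field allowed to exceed 24. The following is a minimal Python sketch, not part of the original listing, of how such values might be converted to seconds and totalled. The helper names `to_seconds` and `to_hms` are illustrative assumptions, and the figures are copied from the annotated data table.

```python
# Illustrative helpers (hypothetical names) for working with HH:MM:SS durations.

def to_seconds(duration: str) -> int:
    """Convert an 'HH:MM:SS' string (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Format a number of seconds back into 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Validated annotated speech data per language, as listed in the table.
annotated = {
    "Bengali": "04:33:37",
    "Hindi": "01:01:28",
    "Konkani": "02:25:00",
    "Kannada": "01:00:00",
    "Oriya": "00:58:28",
    "Malayalam": "01:00:00",
    "Punjabi": "04:07:26",
    "Tamil": "01:00:00",
}

total = sum(to_seconds(d) for d in annotated.values())
print(to_hms(total))  # 16:05:59
```

The same helpers could be applied to the raw and segmented tables by replacing the `annotated` dictionary with the corresponding rows.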