Size of Speech Corpora (as of Aug 2011)
SPEECH CORPORA (Raw Data)

| Sl No. | Language | Speakers | Duration (HH:MM:SS) |
|---|---|---|---|
| 1 | Assamese | 456 | 105:51:38 |
| 2 | Bengali | 472 | 138:18:47 |
| 3 | Bodo | 433 | 201:10:48 |
| 4 | Dogri | 154 | 111:32:11 |
| 5 | Gujarati | 450 | 156:23:04 |
| 6 | Hindi | 450 | 163:25:47 |
| 7 | Indian English Bengali | 52 | 34:12:57 |
| 8 | Indian English Gujarati (MP3 Format) | 52 | 21:40:00 |
| 9 | Indian English Kannada | 54 | 37:01:33 |
| 10 | Kannada | 492 | 143:28:54 |
| 11 | Kashmiri | 150 | 44:59:07 |
| 12 | Konkani | 455 | 195:14:47 |
| 13 | Maithili | 156 | 43:33:42 |
| 14 | Malayalam | 314 | 105:47:05 |
| 15 | Manipuri | 457 | 107:10:27 |
| 16 | Marathi | 306 | 168:13:50 |
| 17 | Nepali | 485 | 145:04:46 |
| 18 | Oriya | 462 | 165:30:05 |
| 19 | Punjabi | 468 | 110:48:26 |
| 20 | Tamil | 453 | 213:37:27 |
| 21 | Telugu | 156 | 50:51:36 |
| 22 | Urdu | 480 | 124:19:58 |

SPEECH CORPORA (Segmented Data)

| Language | Dialects | No. of Female Speakers | No. of Male Speakers | Total No. of Speakers | Female Speech Data (HH:MM:SS) | Male Speech Data (HH:MM:SS) | Total Speech Data (HH:MM:SS) |
|---|---|---|---|---|---|---|---|
| Assamese | Upper Assam, Lower Assam | 154 | 152 | 306 | 08:23:43 | 17:35:43 | 25:59:26 |
| Bengali | SCB (Kolkata) & Barendri (North Bengal) | 231 | 238 | 469 | 27:28:16 | 29:21:12 | 56:49:28 |
| Bodo | Standard and Non Standard | 71 | 75 | 146 | 02:28:10 | 05:18:46 | 07:46:56 |
| English Bengali | Standard And South Gujarati | 27 | 26 | 53 | | | |
| English Kannada | AVADHI, BHOJPURI, MAGAHI and STANDARD | 26 | 26 | 52 | | | |
| Gujarati | Indian | 27 | 38 | 65 | 02:07:27 | 03:53:59 | 06:01:26 |
| Gujarati Mono | Indian | 125 | 110 | 235 | 00:25:33 | 05:13:53 | 05:39:26 |
| Hindi | Standard, Bhojpuri & Magahi | 207 | 226 | 433 | 17:30:17 | 20:22:55 | 37:53:12 |
| Kannada | North-East, North-west and Canara | 246 | 246 | 492 | 04:03:12 | 05:11:27 | 57:14:39 |
| Konkani | Standard | 57 | 61 | 118 | 06:48:41 | 07:50:48 | 14:39:29 |
| Maithili | Standard | 72 | 74 | 146 | 00:10:25 | 01:55:59 | 02:06:24 |
| Malayalam | Standard | 151 | 150 | 301 | 11:07:49 | 22:16:00 | 33:23:49 |
| Manipuri | Standard and Kakching | 229 | 221 | 450 | 03:28:31 | 07:21:29 | 10:50:00 |
| Marathi | Standard | 75 | 75 | 150 | 07:12:09 | 07:09:29 | 14:21:38 |
| Nepali | Darjeeling and Assamese | 99 | 97 | 196 | 09:40:49 | 09:17:42 | 18:58:31 |
| Oriya | Standard | 169 | 171 | 340 | 09:11:12 | 11:24:43 | 20:35:55 |
| Punjabi | Standard | 78 | 78 | 156 | 05:00:24 | 06:10:44 | 11:11:08 |
| Tamil | Standard | 64 | 86 | 150 | 15:08:57 | 17:27:55 | 32:36:52 |
| Telugu | Standard | 13 | 43 | 56 | 00:13:28 | 00:53:13 | 01:06:41 |
| Urdu | Standard | 169 | 168 | 337 | 21:34:54 | 29:57:05 | 51:31:59 |

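
The durations in these tables are written as HH:MM:SS, with the hours field allowed to exceed 24, and the total column is meant to be the sum of the female and male columns. A minimal Python sketch of that arithmetic follows; the helper names `to_seconds`, `to_hms`, and `add_durations` are hypothetical and not part of the corpus documentation.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' duration (hours may exceed 24) to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds: int) -> str:
    """Format a non-negative number of seconds as 'HH:MM:SS'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def add_durations(*durations: str) -> str:
    """Sum any number of 'HH:MM:SS' durations (hypothetical helper)."""
    return to_hms(sum(to_seconds(d) for d in durations))


# Assamese row of the segmented-data table: female + male speech data.
print(add_durations("08:23:43", "17:35:43"))  # prints 25:59:26
```

The same helper can be applied to any row to compare the listed total against the sum of its female and male entries.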
SPEECH CORPORA (Annotated Data)

| Sl. No. | Name of the Language | Validated Speech Annotated Data (HH:MM:SS) |
|---|---|---|
| 1. | Bengali | 04:33:37 |
| 2. | Hindi | 01:01:28 |
| 3. | Konkani | 02:25:00 |
| 4. | Kannada | 01:00:00 |
| 5. | Oriya | 00:58:28 |
| 6. | Malayalam | 01:00:00 |
| 7. | Punjabi | 04:07:26 |
| 8. | Tamil | 01:00:00 |
|