Construction and Evaluation of a Large In-Car Speech Corpus

Authors:
Kazuya Takeda;Hiroshi Fujimura;Katsunobu Itou;Nobuo Kawaguchi;Shigeki Matsubara;Fumitada Itakura
Affiliations:
Kazuya Takeda, Hiroshi Fujimura, Katsunobu Itou: Graduate School of Information Science, Nagoya University, Nagoya-shi, 464-8603 Japan. E-mail: takeda@is.nagoya-u.ac.jp;Nobuo Kawaguchi, Shigeki Matsubara: Information Technology Center, Nagoya University, Nagoya-shi, 464-8601 Japan;Fumitada Itakura: Meijo University, Nagoya-shi, 468-8502 Japan
Venue:
IEICE - Transactions on Information and Systems
Year:
2005

Citing 0
Cited 3

Multichannel Speech Enhancement Based on Generalized Gamma Prior Distribution with Its Online Adaptive Estimation. IEICE - Transactions on Information and Systems
Jump function Kolmogorov for audio classification in noise-mismatch conditions. IEEE Transactions on Signal Processing
Robust Mandarin speech recognition for car navigation interface. PCM'06 Proceedings of the 7th Pacific Rim conference on Advances in Multimedia Information Processing

Abstract

In this paper, we discuss the construction of a large in-car spoken dialogue corpus and the results of its analysis. We have developed a system specially built into a Data Collection Vehicle (DCV) that supports the synchronous recording of multichannel audio data from 16 microphones that can be placed in flexible positions, multichannel video data from 3 cameras, and vehicle-related data. Multimedia data have been collected for three sessions of spoken dialogue with different modes of navigation, during an approximately 60-minute drive by each of 800 subjects. We have characterized the collected dialogues across the three sessions. Some characteristics, such as sentence complexity and SNR, are found to differ significantly among the sessions. Linear regression analysis results also clarify the relative importance of various corpus characteristics.