We present an overview of the data collection and transcription efforts for the COnversational Speech In Noisy Environments (COSINE) corpus. The corpus is a set of multi-party conversations recorded in real-world environments with background noise. It can be used to train noise-robust speech recognition systems or to develop speech de-noising algorithms. We explain the motivation for creating such a corpus and describe the resulting audio recordings and transcriptions that comprise it. These high-quality recordings were captured in situ on a custom wearable recording system, whose design and construction are also described. Seven synchronized audio channels are captured per recording: a 4-channel far-field microphone array, plus a close-talking microphone, a monophonic far-field microphone, and a throat microphone, each on its own channel. This corpus thus creates many possibilities for speech algorithm research.
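As a rough sketch of how the seven synchronized channels described above might be separated for downstream processing, the snippet below splits a multi-channel recording into its sensor groups. The channel ordering used here (array channels first, then close-talking, monophonic far-field, and throat microphone) is an illustrative assumption, not the corpus's documented layout:

```python
import numpy as np

# Hypothetical channel layout for a seven-channel COSINE recording.
# NOTE: this ordering is an assumption for illustration; consult the
# corpus documentation for the actual channel assignments.
ARRAY_CHANNELS = slice(0, 4)   # 4-channel far-field microphone array
CLOSE_TALK_CHANNEL = 4         # close-talking microphone
FAR_FIELD_CHANNEL = 5          # monophonic far-field microphone
THROAT_CHANNEL = 6             # throat microphone

def split_channels(audio: np.ndarray) -> dict:
    """Split a (num_samples, 7) recording into its sensor groups."""
    assert audio.ndim == 2 and audio.shape[1] == 7, "expected 7 channels"
    return {
        "array": audio[:, ARRAY_CHANNELS],       # shape (num_samples, 4)
        "close_talk": audio[:, CLOSE_TALK_CHANNEL],
        "far_field": audio[:, FAR_FIELD_CHANNEL],
        "throat": audio[:, THROAT_CHANNEL],
    }

# Example with one second of synthetic audio at 16 kHz.
fake_recording = np.zeros((16000, 7), dtype=np.float32)
parts = split_channels(fake_recording)
print(parts["array"].shape)       # (16000, 4)
print(parts["close_talk"].shape)  # (16000,)
```

Keeping the close-talking channel alongside the noisy far-field channels is what makes parallel clean/noisy training data possible for de-noising experiments.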