Interfacing sound stream segregation to automatic speech recognition: preliminary results on listening to several sounds simultaneously

  • Authors:
  • Hiroshi G. Okuno; Tomohiro Nakatani; Takeshi Kawabata

  • Affiliations:
  • NTT Basic Research Laboratories, Nippon Telegraph and Telephone Corporation, Atsugi, Kanagawa, Japan (all three authors)

  • Venue:
  • AAAI'96: Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1996

Abstract

This paper reports preliminary results of experiments on listening to several sounds at once. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation (SSS) with automatic speech recognition (ASR). Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted fragments, and substituting some sounds for the non-harmonic parts of each group. The system is implemented by extending the harmonic-based stream segregation system reported at AAAI-94 and IJCAI-95. The main problem in interfacing SSS with HMM-based ASR is improving recognition performance, which is degraded by the spectral distortion of segregated sounds caused mainly by binaural input, grouping, and residue substitution. Our solution is to re-train the HMM parameters on training data binauralized for four directions, to group harmonic fragments according to their directions, and to substitute the residue of the harmonic fragments for the non-harmonic parts of each group. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative accuracy of word recognition up to the 10th candidate is, on average, 75% for each woman's utterance.
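
Read as a pipeline, the abstract's stages (harmonic fragment extraction, grouping, residue substitution) can be sketched roughly as follows. This is an illustrative approximation, not the authors' implementation: the estimate_f0, harmonic_mask, and segregate functions, the frame and harmonic parameters, and the single-stream simplification (the paper's direction-based binaural grouping is omitted) are all assumptions of this sketch.

    import numpy as np

    SR = 16000          # sampling rate (Hz); an assumed value
    FRAME = 512         # samples per analysis frame
    N_HARMONICS = 10    # harmonics retained per fragment

    def estimate_f0(frame, fmin=80.0, fmax=400.0):
        """Crude autocorrelation pitch estimate; None means non-harmonic."""
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lo, hi = int(SR / fmax), int(SR / fmin)
        lag = lo + int(np.argmax(ac[lo:hi]))
        if ac[lag] < 0.3 * ac[0]:   # weak periodicity: treat as residue
            return None
        return SR / lag

    def harmonic_mask(f0, n_bins):
        """Boolean mask over rfft bins near the first N_HARMONICS of f0."""
        mask = np.zeros(n_bins, dtype=bool)
        for k in range(1, N_HARMONICS + 1):
            b = int(round(k * f0 * FRAME / SR))
            if b < n_bins:
                mask[max(0, b - 2):b + 3] = True
        return mask

    def segregate(mixture):
        """Extract a harmonic fragment per frame; where no fragment is
        found, substitute the (non-harmonic) residue instead, so the
        stream handed to the recognizer has no spectral gaps."""
        out = []
        for start in range(0, len(mixture) - FRAME + 1, FRAME):
            frame = mixture[start:start + FRAME]
            spec = np.fft.rfft(frame)
            f0 = estimate_f0(frame)
            if f0 is not None:
                spec = np.where(harmonic_mask(f0, len(spec)), spec, 0)
            out.append(np.fft.irfft(spec, FRAME))
        return np.concatenate(out)

    # Toy usage: a mixture of two synthetic harmonic "voices".
    t = np.arange(SR) / SR
    voice_a = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 6))
    voice_b = sum(np.sin(2 * np.pi * 240 * k * t) / k for k in range(1, 6))
    print(segregate(voice_a + voice_b).shape)   # -> (15872,)

In the paper's setting there would be one such segregated stream per binaural direction, each handed to an HMM recognizer whose parameters were re-trained on direction-binauralized data; the sketch collapses that grouping step into a single stream for brevity.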