Auditory stream segregation in auditory scene analysis with a multi-agent system
AAAI '94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 1)
Computational auditory scene analysis
Computational auditory scene analysis
Localization by harmonic structure and its application to harmonic sound stream segregation
ICASSP '96 Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 2
Residue-driven architecture for computational auditory scene analysis
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
Sound ontology for computational auditory scene analysis
AAAI '98/IAAI '98 Proceedings of the fifteenth national conference on Artificial intelligence and the tenth conference on Innovative applications of artificial intelligence
Using vision to improve sound source separation
AAAI '99/IAAI '99 Proceedings of the sixteenth national conference on Artificial intelligence and the eleventh conference on Innovative applications of artificial intelligence
Understanding three simultaneous speeches
IJCAI'97 Proceedings of the 15th international joint conference on Artificial intelligence - Volume 1
Recognition of simultaneous speech by estimating reliability of separated signals for robot audition
PRICAI'06 Proceedings of the 9th Pacific Rim international conference on Artificial intelligence
This paper reports preliminary results of experiments on listening to several sounds at once. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation (SSS) with automatic speech recognition (ASR). SSS is modeled as a process of extracting harmonic fragments, grouping the extracted fragments, and substituting other sounds for the non-harmonic parts of each group. The system is implemented by extending the harmonic-based stream segregation systems reported at AAAI-94 and IJCAI-95. The main problem in interfacing SSS with HMM-based ASR is improving recognition performance, which is degraded by the spectral distortion of segregated sounds caused mainly by binaural input, grouping, and residue substitution. Our solution is to re-train the HMM parameters on training data binauralized for four directions, to group harmonic fragments according to their directions, and to substitute the residue of harmonic fragments for the non-harmonic parts of each group. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative word recognition accuracy up to the 10th candidate of each woman's utterance is, on average, 75%.
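As a rough illustration of the segregation-and-substitution idea described in the abstract, the following minimal sketch processes a single analysis frame: it selects spectral bins near the harmonics of each stream's pitch to form harmonic fragments, then adds the remaining non-harmonic residue back into every stream before resynthesis. It is not the authors' implementation; pitch and direction estimates are assumed to be supplied per stream, and all function names, parameters, and thresholds below are hypothetical.

import numpy as np

def harmonic_mask(f0, sr, n_fft, n_harmonics=10, tol_hz=40.0):
    # Boolean mask over rFFT bins lying near integer multiples of the pitch f0.
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    mask = np.zeros(freqs.shape, dtype=bool)
    for k in range(1, n_harmonics + 1):
        mask |= np.abs(freqs - k * f0) < tol_hz
    return mask

def segregate_frame(frame, f0_per_stream, sr, n_fft):
    # Split one frame into per-stream harmonic fragments, then add the
    # non-harmonic residue back into every stream (residue substitution)
    # so that consonant-like segments are not lost before recognition.
    spectrum = np.fft.rfft(frame, n_fft)
    fragments, residue_mask = {}, np.ones(spectrum.shape, dtype=bool)
    for stream, f0 in f0_per_stream.items():
        mask = harmonic_mask(f0, sr, n_fft)
        fragments[stream] = np.where(mask, spectrum, 0.0)
        residue_mask &= ~mask
    residue = np.where(residue_mask, spectrum, 0.0)
    return {s: np.fft.irfft(frag + residue, n_fft) for s, frag in fragments.items()}

if __name__ == "__main__":
    sr, n_fft = 16000, 1024
    t = np.arange(n_fft) / sr
    # Toy mixture: two harmonic "voices" at different pitches plus a little noise.
    mix = np.sin(2 * np.pi * 220 * t) + np.sin(2 * np.pi * 310 * t)
    mix += 0.05 * np.random.randn(n_fft)
    streams = segregate_frame(mix, {"left": 220.0, "right": 310.0}, sr, n_fft)
    for name, sig in streams.items():
        print(name, "RMS:", float(np.sqrt(np.mean(sig ** 2))))

Running the sketch on the synthetic two-tone mixture prints the per-stream RMS; in the system described by the abstract, the per-direction segregated streams would instead be passed to the HMM-based recognizer re-trained on binauralized data.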