Blind Extraction of Dominant Target Sources Using ICA and Time-Frequency Masking

Authors:
H. Sawada;S. Araki;R. Mukai;S. Makino
Affiliations:
NTT Commun. Sci. Lab., NTT Corp., Kyoto;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 12

3D-audio matting, postediting, and rerendering from field recordings
EURASIP Journal on Applied Signal Processing

A Speech Enhancement Method in Subband
ISNN '07 Proceedings of the 4th international symposium on Neural Networks: Advances in Neural Networks, Part III

Underdetermined convolutive blind source separation via time-frequency masking
IEEE Transactions on Audio, Speech, and Language Processing

Glimpsing IVA: a framework for overcomplete/complete/undercomplete convolutive source separation
IEEE Transactions on Audio, Speech, and Language Processing - Special issue on processing reverberant speech: methodologies and applications

Independent component analysis and time-frequency masking for speech recognition in multitalker conditions
EURASIP Journal on Audio, Speech, and Music Processing

A multistage approach to blind separation of convolutive speech mixtures
Speech Communication

Underdetermined DOA estimation via independent component analysis and time-frequency masking
Journal of Electrical and Computer Engineering

Source localization for multiple speech sources using low complexity non-parametric source separation and clustering
Signal Processing

Convolutive underdetermined source separation through weighted interleaved ICA and spatio-temporal source correlation
LVA/ICA'12 Proceedings of the 10th international conference on Latent Variable Analysis and Signal Separation

Acoustic Rendering and Auditory–Visual Cross-Modal Perception and Interaction
Computer Graphics Forum

Approaches and applications of semi-blind signal extraction for communication signals based on constrained independent component analysis: The complex case
Neurocomputing

Linear Estimation Based Primary-Ambient Extraction for Stereo Audio Signals
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)

Abstract

This paper presents a method for enhancing target sources of interest and suppressing other interference sources. The target sources are assumed to be close to the sensors, to have dominant power at these sensors, and to be non-Gaussian. The enhancement is performed blindly, i.e., without knowing the position and active time of each source. We consider a general case where the total number of sources is larger than the number of sensors, and neither the number of target sources nor the total number of sources is known. The method is based on a two-stage process where independent component analysis (ICA) is first employed in each frequency bin and then time-frequency masking is used to improve the performance further. We propose a new method for deciding the number of target sources and then selecting their frequency components. We also propose a new criterion for specifying time-frequency masks. Experimental results for simulated cocktail party situations in a room with a reverberation time of 130 ms are presented to show the effectiveness and characteristics of the proposed method.
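
The second stage named in the abstract, time-frequency masking applied to the ICA outputs, can be illustrated with a minimal Python sketch. This is not the paper's mask criterion, which the abstract does not specify. It is a generic magnitude-dominance mask, and the function names (`dominance_mask`, `apply_mask`), the array layout (components x frequency bins x frames), and the `threshold_db` parameter are all assumptions made for illustration.

```python
import numpy as np


def dominance_mask(separated, target, threshold_db=0.0):
    """Binary time-frequency mask (illustrative, not the paper's criterion).

    separated: complex array of shape (n_components, n_freqs, n_frames),
               assumed to hold per-frequency ICA outputs.
    target:    index of the component to keep.
    Returns an array of shape (n_freqs, n_frames) that is 1.0 where the
    target magnitude exceeds the strongest other component by more than
    threshold_db, and 0.0 elsewhere.
    """
    mag = np.abs(separated)
    # Strongest competing component at each time-frequency point.
    others = np.delete(mag, target, axis=0).max(axis=0)
    eps = 1e-12  # avoids division by zero and log of zero
    ratio_db = 20.0 * np.log10((mag[target] + eps) / (others + eps))
    return (ratio_db > threshold_db).astype(float)


def apply_mask(separated, target, threshold_db=0.0):
    """Zero out the time-frequency points where the target does not dominate."""
    return separated[target] * dominance_mask(separated, target, threshold_db)


if __name__ == "__main__":
    # Toy input: 2 components, 1 frequency bin, 3 frames.
    Y = np.array([[[1.0 + 0j, 0.1j, 2.0]],
                  [[0.5, 0.4, 2.0]]])
    print(dominance_mask(Y, target=0))
```

In the toy input, the target dominates only in the first frame, so only that frame survives masking. A soft mask (values between 0 and 1) would be a natural variant when binary masking introduces audible artifacts.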