DISTBIC: a speaker-based segmentation for audio data indexing
Speech Communication - Special issue on accessing information in spoken audio
Adaptive speaker identification with audiovisual cues for movie content analysis
Pattern Recognition Letters - Video computing
Segregation of speakers for speech recognition and speaker identification
ICASSP '91: Proceedings of the 1991 International Conference on Acoustics, Speech, and Signal Processing
Efficient Speaker Change Detection Using Adapted Gaussian Mixture Models
IEEE Transactions on Audio, Speech, and Language Processing
The classical speaker identification algorithm gives acceptable results if training is done offline using a good-quality database [5]. Although there has been a substantial amount of research on speaker recognition, most of it has focused on the offline training scenario. In some scenarios that require real-time speaker recognition, such as viewer-preference-based presentation or playback of media content, offline training is not possible because there is no prior information about the subjects or speakers present in the content. A run-time training approach is required to generate a dynamic feature database, which can then support features such as viewer-preference-based seek or zoom to a specific subject or speaker during media playback. In this paper we propose a speaker recognition system that uses a dynamically created database. We treat speaker recognition as a classification problem in which speakers are classified based on speech features. The proposed system uses MFCCs (Mel-frequency cepstral coefficients) as features and polynomial or GMM (Gaussian mixture model) classifiers. Our analysis demonstrates the pros and cons of the algorithms when employing dynamic database creation. Test results show that the proposed system can achieve ~96% accuracy on content with 5 speakers.
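The idea of building the speaker database at run time can be illustrated with a minimal sketch. It is not the system described in the paper: it assumes feature vectors have already been extracted (synthetic 4-dimensional vectors stand in for MFCC frames), it swaps the polynomial/GMM classifiers for a single diagonal-covariance Gaussian per speaker (a one-component GMM), and the names `DiagGaussian`, `DynamicSpeakerDatabase`, `new_speaker_threshold`, and the threshold value are all hypothetical choices made for illustration.

```python
import math
import random


class DiagGaussian:
    """One diagonal-covariance Gaussian (assumed stand-in for a full GMM)."""

    def __init__(self, frames, var_floor=1e-3):
        n, dim = len(frames), len(frames[0])
        self.mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        # Floor the variance so a tiny segment cannot produce a zero variance.
        self.var = [
            max(sum((f[d] - self.mean[d]) ** 2 for f in frames) / n, var_floor)
            for d in range(dim)
        ]

    def avg_log_likelihood(self, frames):
        """Average per-frame log-likelihood of the frames under this model."""
        total = 0.0
        for f in frames:
            for x, m, v in zip(f, self.mean, self.var):
                total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        return total / len(frames)


class DynamicSpeakerDatabase:
    """Speaker models created and updated at run time, with no offline training."""

    def __init__(self, new_speaker_threshold):
        self.threshold = new_speaker_threshold  # hypothetical tuning parameter
        self.frames = {}  # label -> all frames assigned to that speaker so far
        self.models = {}  # label -> DiagGaussian fitted on those frames

    def process_segment(self, frames):
        # Score the segment against every speaker enrolled so far.
        best_label, best_score = None, -math.inf
        for label, model in self.models.items():
            score = model.avg_log_likelihood(frames)
            if score > best_score:
                best_label, best_score = label, score
        # No sufficiently good match: enroll a new speaker on the fly.
        if best_label is None or best_score < self.threshold:
            best_label = f"speaker_{len(self.models) + 1}"
            self.frames[best_label] = []
        # Add the segment to the speaker's data and refit that speaker's model.
        self.frames[best_label].extend(frames)
        self.models[best_label] = DiagGaussian(self.frames[best_label])
        return best_label


def make_segment(rng, center, n_frames=50, dim=4):
    """Synthetic 'MFCC-like' frames: unit-variance noise around a speaker center."""
    return [[rng.gauss(center, 1.0) for _ in range(dim)] for _ in range(n_frames)]


def run_demo(seed=0):
    rng = random.Random(seed)
    centers = {"A": 0.0, "B": 8.0, "C": -8.0}  # well-separated toy speakers
    db = DynamicSpeakerDatabase(new_speaker_threshold=-15.0)
    order = ["A", "B", "A", "C", "B"]
    return [db.process_segment(make_segment(rng, centers[s])) for s in order]


labels = run_demo()
print(labels)
```

In this toy setting the speakers are far apart, so a fixed likelihood threshold separates "known" from "new" speakers easily. With real MFCC features the threshold, the model order, and the amount of speech needed before a newly enrolled model becomes reliable would all need tuning.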