Text-independent speaker identification using Radon and discrete cosine transforms based features from speech spectrogram

Authors: Pawan K. Ajmera; Dattatray V. Jadhav; Raghunath S. Holambe
Affiliations: S.G.G.S. Institute of Engineering and Technology, Vishnupuri, Nanded, India; Bhivarabai Sawant College of Engineering and Research, Pune, India; S.G.G.S. Institute of Engineering and Technology, Vishnupuri, Nanded, India
Venue: Pattern Recognition
Year: 2011

Abstract

This paper presents a new feature extraction technique for speaker recognition based on the Radon transform (RT) and the discrete cosine transform (DCT). The spectrogram is a compact, efficient representation that carries information about acoustic features in the form of a pattern. In the proposed method, speaker-specific features are extracted by applying image processing techniques to this pattern. The Radon transform, which sums the pixel values of an image along a straight line at a given orientation and displacement, is used to derive effective acoustic features from the speech spectrogram. The proposed technique computes Radon projections for seven orientations to capture the acoustic characteristics of the spectrogram, and applying the DCT to these projections yields a low-dimensional feature vector. The technique is computationally efficient, text-independent, robust to session variations, and insensitive to additive noise. Its performance has been evaluated on the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database and on our own Shri Guru Gobind Singhji (SGGS) database. The recognition rate is 96.69% on the TIMIT database (630 speakers) and 98.41% on the SGGS database (151 speakers). These results highlight the superiority of the proposed method over some existing algorithms.
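
The pipeline summarized in the abstract (spectrogram, Radon projections at seven orientations, DCT of each projection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the frame length, the seven angles, or the number of DCT coefficients retained, so the values below (`frame_len=256`, `hop=128`, angles evenly spaced over 0 to 180 degrees, `n_coeffs=10`) are assumptions, as are all function names. The Radon projection uses a simple pixel-binning approximation rather than an interpolating transform.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram (frequency x time) from a Hann-windowed STFT.
    frame_len and hop are illustrative values, not taken from the paper."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1))).T

def radon_projection(image, angle_deg):
    """Sum pixel values along parallel lines at one orientation.
    Each pixel goes to the bin nearest its signed displacement from the
    image centre, s = x*cos(theta) + y*sin(theta)."""
    rows, cols = image.shape
    y, x = np.mgrid[0:rows, 0:cols]
    x = x - (cols - 1) / 2.0
    y = y - (rows - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    s = x * np.cos(theta) + y * np.sin(theta)
    half = int(np.ceil(np.hypot(rows, cols) / 2.0))  # largest possible |s|
    bins = np.floor(s + 0.5).astype(int) + half
    return np.bincount(bins.ravel(), weights=image.ravel(),
                       minlength=2 * half + 1)

def dct2(v):
    """Orthonormal DCT-II of a 1-D vector, computed from its definition."""
    n = len(v)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ v)

def radon_dct_features(image, n_angles=7, n_coeffs=10):
    """Concatenate the first n_coeffs DCT coefficients of each projection.
    The angle set and n_coeffs are assumptions made for illustration."""
    angles = np.arange(n_angles) * 180.0 / n_angles
    parts = [dct2(radon_projection(image, a))[:n_coeffs] for a in angles]
    return np.concatenate(parts)

# Usage on a synthetic signal (white noise standing in for speech).
rng = np.random.default_rng(0)
signal = rng.standard_normal(8000)
spec = spectrogram(signal)
features = radon_dct_features(spec)
print(features.shape)  # (70,)
```

Keeping only the leading DCT coefficients is what makes the feature vector low-dimensional: the DCT concentrates most of a smooth projection's energy in its first few terms. How the paper selects and combines coefficients across the seven projections is not stated in the abstract, so the concatenation here is only one plausible choice.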