Subsegmental, segmental and suprasegmental processing of linear prediction residual for speaker information

Authors:
Debadatta Pati;S. R. Prasanna
Affiliations:
Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India 781039;Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India 781039
Venue:
International Journal of Speech Technology
Year:
2011

Abstract

This work processes the linear prediction (LP) residual in the time domain at three levels to extract speaker information, and demonstrates both the significance and the differing nature of that information for text-independent speaker recognition. The subsegmental analysis processes the LP residual in blocks of 5 ms with a shift of 2.5 ms. The segmental analysis processes it in blocks of 20 ms with a shift of 2.5 ms. The suprasegmental analysis views it in blocks of 250 ms with a shift of 6.25 ms. Speaker identification and verification studies on the NIST-99 and NIST-03 databases show that the segmental analysis performs best, followed by the subsegmental analysis, while the suprasegmental analysis performs worst. However, the evidence from the three levels appears to differ and combines well to improve performance, indicating that each level captures different speaker information. Finally, combining the evidence from all three levels with vocal tract information further improves speaker recognition performance.
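
The three block sizes and shifts above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the sampling rate, the LP order, or how each block is turned into features. The code assumes 8 kHz audio, an LP order of 10, and a single LP fit over the whole signal rather than a short-time one. The function names (`lp_coefficients`, `lp_residual`, `blocks`) are hypothetical.

```python
import numpy as np

def lp_coefficients(x, order):
    """Autocorrelation-method LP: solve the normal equations R a = r."""
    # Autocorrelation at lags 0..order (zero lag sits at index len(x) - 1).
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)  # small ridge for numerical stability
    return np.linalg.solve(R, r[1 : order + 1])

def lp_residual(x, order=10):
    """Residual e[n] = x[n] - sum_k a[k] * x[n - k], k = 1..order."""
    a = lp_coefficients(x, order)
    # Prepending 0 makes the filter strictly causal (no use of x[n] itself).
    predicted = np.convolve(x, np.concatenate(([0.0], a)))[: len(x)]
    return x - predicted

def blocks(x, fs, block_ms, shift_ms):
    """Cut x into overlapping blocks; returns an array (n_blocks, block_len)."""
    size = int(round(fs * block_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    n = 1 + (len(x) - size) // shift
    idx = np.arange(size)[None, :] + shift * np.arange(n)[:, None]
    return x[idx]

# Block size and shift in ms, taken from the abstract.
LEVELS = {
    "subsegmental": (5, 2.5),
    "segmental": (20, 2.5),
    "suprasegmental": (250, 6.25),
}

fs = 8000  # assumed sampling rate in Hz (not stated in the abstract)
rng = np.random.default_rng(0)
t = np.arange(fs) / fs
# One second of a synthetic, speech-like test signal (an assumption).
x = (np.sin(2 * np.pi * 120 * t)
     + 0.5 * np.sin(2 * np.pi * 240 * t)
     + 0.01 * rng.standard_normal(fs))

res = lp_residual(x, order=10)
views = {name: blocks(res, fs, b, s) for name, (b, s) in LEVELS.items()}
for name, v in views.items():
    print(name, v.shape)
```

Under these assumptions the same residual is viewed as 40-sample, 160-sample, and 2000-sample blocks. A practical system would more likely fit the LP coefficients frame by frame and derive features from each block before speaker modelling.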