Subsegmental, segmental and suprasegmental processing of linear prediction residual for speaker information

  • Authors: Debadatta Pati; S. R. Prasanna
  • Affiliation (both authors): Department of Electronics and Communication Engineering, Indian Institute of Technology Guwahati, Guwahati, India 781039
  • Venue: International Journal of Speech Technology
  • Year: 2011

Abstract

This work processes the linear prediction (LP) residual in the time domain at three levels, extracts speaker information at each level, and demonstrates that the three kinds of evidence are both significant and different in nature for text-independent speaker recognition. The subsegmental analysis considers the LP residual in blocks of 5 ms with a shift of 2.5 ms. The segmental analysis processes it in blocks of 20 ms with a shift of 2.5 ms. The suprasegmental analysis views it in blocks of 250 ms with a shift of 6.25 ms. Speaker identification and verification studies on the NIST-99 and NIST-03 databases show that segmental analysis performs best, followed by subsegmental analysis, with suprasegmental analysis performing worst. However, the evidence from the three levels appears to be different and combines well to give improved performance, indicating that each level captures different speaker information. Finally, combining the evidence from all three levels with vocal tract information further improves speaker recognition performance.
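The three block sizes and shifts quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 8 kHz sampling rate, the LP order of 10, the single whole-signal LP fit, and the names `lp_residual`, `frame_blocks`, and `LEVELS` are all assumptions made for illustration. The abstract does not say how each block is turned into features, so the sketch stops at framing.

```python
import numpy as np

# Block size and shift in milliseconds, as quoted in the abstract.
LEVELS = {
    "subsegmental": (5.0, 2.5),
    "segmental": (20.0, 2.5),
    "suprasegmental": (250.0, 6.25),
}

def lp_residual(signal, order=10):
    """LP residual via the autocorrelation method (order is an assumption)."""
    x = np.asarray(signal, dtype=float)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], with a tiny ridge for stability.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Predicted sample: x_hat[n] = sum_k a[k] * x[n - k - 1].
    pred = np.zeros_like(x)
    for k in range(order):
        pred[k + 1:] += a[k] * x[: len(x) - k - 1]
    return x - pred

def frame_blocks(x, fs, block_ms, shift_ms):
    """Cut x into overlapping blocks; returns an array of shape (n_blocks, block_len)."""
    block = int(round(fs * block_ms / 1000.0))
    shift = int(round(fs * shift_ms / 1000.0))
    if len(x) < block:
        return np.empty((0, block))
    n = 1 + (len(x) - block) // shift
    idx = np.arange(block)[None, :] + shift * np.arange(n)[:, None]
    return x[idx]

# Usage on a synthetic one-second signal (assumed fs = 8000 Hz).
fs = 8000
t = np.arange(fs) / fs
rng = np.random.default_rng(0)
speech_like = np.sin(2 * np.pi * 200 * t) + 0.01 * rng.standard_normal(fs)
residual = lp_residual(speech_like)
blocks = {name: frame_blocks(residual, fs, b, s) for name, (b, s) in LEVELS.items()}
```

At the assumed 8 kHz rate, the three levels correspond to blocks of 40, 160, and 2000 samples with shifts of 20, 20, and 50 samples. In practice LP analysis is usually done frame by frame rather than once over the whole signal; the single fit here only keeps the sketch short.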