Identification of Indian languages using multi-level spectral and prosodic features

Authors:
V. Ramu Reddy;Sudhamay Maity;K. Sreenivasa Rao
Affiliations:
TCS Innovation Labs, Kolkata, India 700091;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302;School of Information Technology, Indian Institute of Technology Kharagpur, Kharagpur, India 721302
Venue:
International Journal of Speech Technology
Year:
2013

Abstract

This paper explores spectral and prosodic features extracted at different levels for analyzing the language-specific information present in speech. Spectral features extracted from 20 ms frames (block processing), individual pitch cycles (pitch-synchronous analysis), and glottal closure regions are used to discriminate between languages. In addition to the spectral features, prosodic features extracted at the syllable, tri-syllable, and multi-word (phrase) levels are proposed for capturing language-specific information. Language-specific prosody is represented by intonation, rhythm, and stress features at the syllable and tri-syllable (word) levels, whereas temporal variations in fundamental frequency (F0 contour), syllable durations, and temporal variations in intensity (energy contour) represent prosody at the multi-word (phrase) level. The language-specific information in the proposed features is analyzed on the Indian language speech database IITKGP-MLILSC, with Gaussian mixture models used to capture this information from the features. The evaluation results indicate that language identification performance improves when the features are combined. The performance of the proposed features is also analyzed on the standard Oregon Graduate Institute Multi-Language Telephone-based Speech (OGI-MLTS) database.
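
The Gaussian mixture modelling step in the abstract can be illustrated with a minimal sketch. It assumes one diagonal-covariance GMM per language, trained with plain EM, and assigns a test utterance to the language whose model gives the highest total log-likelihood. The abstract does not specify the model configuration, so the component count, the variance floor, the labels `lang_a` and `lang_b`, and the synthetic two-dimensional feature vectors are all assumptions made for illustration; they stand in for the real spectral or prosodic features and are not the authors' setup.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))


def component_logs(x, gmm):
    """Per-component log(weight * density) for one feature vector."""
    weights, means, variances = gmm
    return [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def train_gmm(data, k=2, iters=20, seed=0, floor=1e-3):
    """Fit a k-component diagonal GMM with plain EM (k, iters, floor are assumed values)."""
    rng = random.Random(seed)
    dim = len(data[0])
    means = [list(x) for x in rng.sample(data, k)]  # initialise means from data points
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each vector.
        resp = []
        for x in data:
            logs = component_logs(x, (weights, means, variances))
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weights, means and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-10
            weights[j] = nj / len(data)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [max(floor,
                                sum(r[j] * (x[d] - means[j][d]) ** 2
                                    for r, x in zip(resp, data)) / nj)
                            for d in range(dim)]
    return weights, means, variances


def identify(utterance, models):
    """Return the language whose GMM gives the highest summed log-likelihood."""
    scores = {lang: sum(logsumexp(component_logs(x, gmm)) for x in utterance)
              for lang, gmm in models.items()}
    return max(scores, key=scores.get)


def synth(rng, centre, n):
    """Synthetic stand-in for feature vectors: unit-variance noise around a centre."""
    return [[rng.gauss(c, 1.0) for c in centre] for _ in range(n)]


rng = random.Random(1)
models = {
    "lang_a": train_gmm(synth(rng, (0.0, 0.0), 200)),  # hypothetical language A
    "lang_b": train_gmm(synth(rng, (4.0, 4.0), 200)),  # hypothetical language B
}
test_utt = synth(rng, (4.0, 4.0), 50)  # utterance drawn near language B
print(identify(test_utt, models))
```

Summing frame-level log-likelihoods treats the feature vectors of an utterance as independent given the language, which is the usual simplification in GMM-based classifiers. How the paper combines scores from the different feature levels is not stated in the abstract, so the sketch does not attempt it.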