Robust detection of phone boundaries using model selection criteria with few observations

Authors:
George Almpanidis;Margarita Kotti;Constantine Kotropoulos
Affiliations:
Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece;Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Abstract

We study automatic phone segmentation techniques based on model selection criteria. Specifically, we investigate how efficiently entropy-based and Bayesian model selection criteria detect phone boundaries in continuous speech when used within the DISTBIC hybrid segmentation algorithm. DISTBIC is a text-independent, bottom-up approach that identifies sequential model changes by combining metric distances with statistical hypothesis testing. Introducing robust statistics and small-sample corrections into the baseline DISTBIC algorithm significantly improves phone boundary detection accuracy while reducing false alarms. Phonemic segmentation improves further when we account for how the model parameters are related, both in the probability density functions of the underlying hypotheses and in model selection via the information complexity criterion, and when we employ M-estimators of the model parameters. The proposed DISTBIC variants are tested on the NTIMIT database, where they achieve an F1 measure of 74.7% with a 20-ms tolerance in phonemic segmentation.
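To make the hypothesis-testing step concrete, the following is a minimal sketch of a standard BIC change test between a one-segment and a two-segment Gaussian model. It is not the authors' implementation: it assumes scalar (1-D) features and maximum-likelihood variance estimates, and it omits the distance-based candidate selection, the robust estimators, the small-sample corrections, and the information complexity criterion described in the abstract. The names `gaussian_log_var`, `delta_bic`, and `penalty_weight` are hypothetical, chosen only for illustration.

```python
import math


def gaussian_log_var(x):
    """Log of the maximum-likelihood variance of a 1-D sample."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return math.log(max(var, 1e-12))  # floor avoids log(0) on constant data


def delta_bic(window, split, penalty_weight=1.0):
    """BIC difference between a two-segment and a one-segment Gaussian model.

    A positive value favours a model change at index `split`.
    `penalty_weight` is an assumed tuning factor on the complexity penalty.
    """
    left, right = window[:split], window[split:]
    n, n1, n2 = len(window), len(left), len(right)
    # Log-likelihood gain from modelling the two sides separately.
    gain = 0.5 * (n * gaussian_log_var(window)
                  - n1 * gaussian_log_var(left)
                  - n2 * gaussian_log_var(right))
    # The two-segment model has 2 extra parameters (one mean, one variance).
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


# Deterministic toy data (assumed for illustration, not speech features):
# `changed` shifts its mean halfway through, `unchanged` does not.
changed = [-1.0, 1.0] * 20 + [4.0, 6.0] * 20
unchanged = [-1.0, 1.0] * 40

print(delta_bic(changed, 40) > 0)    # change hypothesis accepted
print(delta_bic(unchanged, 40) > 0)  # change hypothesis rejected
```

In a DISTBIC-style pipeline, a test of this kind would be applied only at candidate points proposed by the distance measure, which is why its behaviour on short windows with few observations matters.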