Speech segmentation using regression fusion of boundary predictions

Authors:
Iosif Mporas;Todor Ganchev;Nikos Fakotakis
Affiliations:
Artificial Intelligence Group, Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, GR-26500, Greece (all three authors)
Venue:
Computer Speech and Language
Year:
2010

Abstract

In the present work we study how well a number of linear and non-linear regression methods combine multiple phonetic boundary predictions obtained from different speech segmentation engines. The proposed fusion schemes are independent of both the implementation and the number of the individual segmentation engines. To illustrate the practical significance of the approach, we employ 112 speech segmentation engines based on hidden Markov models (HMMs), which differ in the HMM setup and in the speech parameterization technique they use. Specifically, we rely on sixteen different HMM setups and seven speech parameterization techniques, four of which are recent and whose performance on the speech segmentation task has not yet been evaluated. In the evaluation experiments we compare the proposed fusion schemes for phonetic boundary predictions against some recently reported methods. In this comparison, carried out on the TIMIT database, which is well established for the phonetic segmentation task, we demonstrate that the support vector regression scheme achieves more accurate predictions than the other fusion schemes reported so far.
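
The idea of regression fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses ordinary least squares as a simple stand-in for the linear regression family (the paper's best-performing scheme is support vector regression), three hypothetical engines instead of 112, and synthetic boundary times instead of TIMIT. All names, biases, and noise levels below are assumptions chosen for illustration only.

```python
import random

random.seed(0)

# Assumption: three hypothetical engines, each predicting the true boundary
# time (ms) plus its own systematic bias and random noise. Synthetic data only.
ENGINE_BIAS_MS = [12.0, -8.0, 5.0]
ENGINE_NOISE_MS = [6.0, 10.0, 15.0]


def simulate(n):
    """Return (predictions, references) for n synthetic boundaries."""
    predictions, references = [], []
    for _ in range(n):
        true_ms = random.uniform(0.0, 3000.0)
        row = [true_ms + b + random.gauss(0.0, s)
               for b, s in zip(ENGINE_BIAS_MS, ENGINE_NOISE_MS)]
        predictions.append(row)
        references.append(true_ms)
    return predictions, references


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_fusion(predictions, references):
    """Least-squares weights (one per engine, plus a bias term)."""
    rows = [p + [1.0] for p in predictions]  # constant column for the bias
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, references)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fuse(weights, prediction):
    """Combine one set of engine predictions into a single boundary estimate."""
    return sum(w * p for w, p in zip(weights, prediction + [1.0]))


def mean_abs_error(estimates, references):
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(references)


# Fit the fusion on one synthetic set, evaluate on a held-out one.
train_x, train_y = simulate(300)
test_x, test_y = simulate(100)
weights = fit_linear_fusion(train_x, train_y)

fused_mae = mean_abs_error([fuse(weights, row) for row in test_x], test_y)
engine_maes = [mean_abs_error([row[i] for row in test_x], test_y)
               for i in range(len(ENGINE_BIAS_MS))]

print("per-engine MAE (ms):", [round(e, 2) for e in engine_maes])
print("fused MAE (ms):", round(fused_mae, 2))
```

The fusion step only sees the engines' numeric boundary predictions, which reflects the abstract's point that the schemes do not depend on how the engines are implemented or on how many there are. Swapping the least-squares fit for a support vector regressor would change only `fit_linear_fusion` and `fuse` in this sketch.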