Unsupervised data processing for classifier-based speech translator

Authors:
Emil Ettelaie; Panayiotis G. Georgiou; Shrikanth S. Narayanan
Affiliations:
Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, 3710 S. McClintock Ave., RTH 320, ... (all three authors)
Venue:
Computer Speech and Language
Year:
2013

Abstract

Concept classification has been used as a translation method and has shown notable benefits in speech-to-speech translation applications. However, the main bottleneck in achieving acceptable performance with such classifiers is the cumbersome task of annotating large amounts of training data. Any method that assists in, or completely automates, data annotation needs a distance measure that compares sentences based on the concepts they convey. Here, we introduce a new method of sentence comparison that is motivated by the translation point of view. In this method, the imperfect translations produced by a phrase-based statistical machine translation system are used to compare the concepts of the source sentences. Three clustering methods are adapted to support the concept-based distance. These methods are applied to prepare groups of paraphrases, which are then used as training sets in concept classification tasks. The statistical machine translation system is also used to enhance the training data for the classifier, which is crucial when such data are sparse. Experiments show the effectiveness of the proposed methods.
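
The idea of comparing source sentences through their translations can be illustrated with a small sketch. The abstract does not specify the distance or the clustering algorithms, so everything below is an assumption made for illustration: each sentence is represented by the unigram distribution of its translation hypotheses, the distance is the Jensen-Shannon divergence between two such distributions, and the grouping is a simple greedy threshold rule rather than any of the three adapted methods. The function names and the toy hypotheses are hypothetical and were not produced by a real translation system.

```python
import math
from collections import Counter


def word_distribution(translations):
    """Pool the words of all translation hypotheses into a unigram distribution."""
    counts = Counter(w for hyp in translations for w in hyp.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits; 0 for identical, 1 for disjoint distributions."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}

    def kl(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(pa * math.log2(pa / m[w]) for w, pa in a.items() if pa > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def concept_distance(translations_a, translations_b):
    """Assumed stand-in for a concept-based distance between two source sentences."""
    return js_divergence(word_distribution(translations_a),
                         word_distribution(translations_b))


def threshold_clusters(hyps_by_sentence, threshold):
    """Greedy grouping: a sentence joins the first cluster that has a member within threshold."""
    clusters = []
    for sent, hyps in hyps_by_sentence.items():
        for cluster in clusters:
            if any(concept_distance(hyps, hyps_by_sentence[other]) < threshold
                   for other in cluster):
                cluster.append(sent)
                break
        else:
            clusters.append([sent])
    return clusters


# Toy data (made up): source sentences mapped to hypothetical translation hypotheses.
hyps = {
    "do you have pain": ["tiene dolor", "usted tiene dolor"],
    "are you in pain": ["tiene dolor", "siente dolor"],
    "where do you live": ["donde vive", "donde vive usted"],
}

clusters = threshold_clusters(hyps, threshold=0.5)
print(clusters)
```

In this toy setting the two pain-related questions share most of their translated words and fall into one group, while the unrelated question forms its own. Each resulting group could then serve as the paraphrase set for one concept class, which is the role the abstract assigns to the clustered data.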