Unsupervised data processing for classifier-based speech translator

  • Authors:
  • Emil Ettelaie; Panayiotis G. Georgiou; Shrikanth S. Narayanan

  • Affiliations:
  • Signal Analysis and Interpretation Laboratory, Ming Hsieh Department of Electrical Engineering, Viterbi School of Engineering, University of Southern California, 3710 S. McClintock Ave., RTH 320, ... (all three authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2013

Abstract

Concept classification has been used as a translation method and has shown notable benefits within the suite of speech-to-speech translation applications. However, the main bottleneck in achieving acceptable performance with such classifiers is the cumbersome task of annotating large amounts of training data. Any method that assists in, or completely automates, data annotation needs a distance measure that compares sentences by the concept they convey. Here, we introduce a new method of sentence comparison that is motivated by the translation point of view. In this method, the imperfect translations produced by a phrase-based statistical machine translation system are used to compare the concepts of the source sentences. Three clustering methods are adapted to support the concept-based distance. These methods are applied to prepare groups of paraphrases, which are then used as training sets in concept classification tasks. Statistical machine translation is also used to enhance the training data for the classifier, which is crucial when such data are sparse. Experiments show the effectiveness of the proposed methods.
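
The idea of comparing source sentences through their machine translations can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not specify the distance measure or the three clustering methods, so the sketch substitutes a Jaccard distance over translation tokens and a simple greedy threshold clustering. The names `translation_distance`, `threshold_clusters`, and `toy_translate`, along with the toy translation table, are hypothetical stand-ins rather than the authors' implementation or the output of a real SMT system.

```python
from typing import Callable, List

# Hypothetical stand-in for SMT output: invented translations, not real system output.
TOY_TRANSLATIONS = {
    "where does it hurt": "donde le duele",
    "where is the pain": "donde le duele",
    "do you have a fever": "tiene fiebre",
    "are you running a fever": "tiene usted fiebre",
}


def toy_translate(sentence: str) -> str:
    """Look up the invented translation of a source sentence."""
    return TOY_TRANSLATIONS[sentence]


def translation_distance(a: str, b: str, translate: Callable[[str], str]) -> float:
    """Jaccard distance between the token sets of the two translations.

    This measure is an assumption chosen for simplicity; it is not the
    distance defined in the paper.
    """
    tokens_a = set(translate(a).lower().split())
    tokens_b = set(translate(b).lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


def threshold_clusters(
    sentences: List[str], translate: Callable[[str], str], max_dist: float
) -> List[List[str]]:
    """Greedy clustering: a sentence joins the first cluster whose first
    member is within max_dist, otherwise it starts a new cluster."""
    clusters: List[List[str]] = []
    for sentence in sentences:
        for cluster in clusters:
            if translation_distance(sentence, cluster[0], translate) <= max_dist:
                cluster.append(sentence)
                break
        else:
            clusters.append([sentence])
    return clusters


groups = threshold_clusters(list(TOY_TRANSLATIONS), toy_translate, max_dist=0.5)
for group in groups:
    print(group)
```

In this toy setting, source sentences that share few surface words can still fall into the same group when their translations overlap, which is the intuition the abstract describes. Each resulting group could then serve as a candidate paraphrase set for training a concept classifier.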