Concept classification has been used as a translation method and has shown notable benefits in speech-to-speech translation applications. However, the main bottleneck in achieving acceptable performance with such classifiers is the cumbersome task of annotating large amounts of training data. Any method that assists in, or completely automates, data annotation needs a distance measure that compares sentences based on the concepts they convey. Here, we introduce a new method of sentence comparison that is motivated by the translation point of view. In this method, the imperfect translations produced by a phrase-based statistical machine translation system are used to compare the concepts of the source sentences. Three clustering methods are adapted to support this concept-based distance. These methods are applied to form groups of paraphrases, which are then used as training sets in concept classification tasks. The statistical machine translation system is also used to augment the training data for the classifier, which is crucial when such data are sparse. Experiments show the effectiveness of the proposed methods.
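
The abstract does not say how the translations are compared, so the following is only a minimal sketch of the general idea of a translation-based sentence distance, under stated assumptions. It assumes each source sentence comes with a short list of candidate translations (hypothetical strings here, not output of any real system), pools the words of those translations into a distribution, and uses the Jensen-Shannon divergence between the two distributions as the distance. The function names `word_distribution`, `js_divergence`, and `concept_distance` are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def word_distribution(translations):
    """Relative word frequencies pooled over a list of translation strings."""
    counts = Counter(w for t in translations for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two sparse distributions.

    The result lies in [0, 1]: 0 for identical distributions,
    1 for distributions with no words in common.
    """
    def kl_to_mixture(a):
        # KL(a || m), where m is the even mixture of p and q.
        return sum(
            pa * math.log2(pa / (0.5 * p.get(w, 0.0) + 0.5 * q.get(w, 0.0)))
            for w, pa in a.items()
            if pa > 0
        )

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def concept_distance(translations_a, translations_b):
    """Assumed distance: JS divergence between pooled translation words."""
    return js_divergence(
        word_distribution(translations_a),
        word_distribution(translations_b),
    )


# Hypothetical candidate translations for three source sentences.
headache_1 = ["i have a headache", "my head hurts"]
headache_2 = ["my head hurts", "i have pain in my head"]
address = ["where do you live", "what is your address"]

print(concept_distance(headache_1, headache_2))  # paraphrases: smaller value
print(concept_distance(headache_1, address))     # different concepts: larger value
```

A pairwise distance of this kind could then be passed to any clustering routine that accepts a precomputed distance matrix in order to group paraphrases. Which clustering methods the paper adapts, and how, is not stated in the abstract.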