A comparative evaluation of data-driven models in translation selection of machine translation

Authors:
Yu-Seop Kim; Jeong-Ho Chang; Byoung-Tak Zhang
Affiliations:
Ewha Womans Univ., Seoul, Korea; Seoul National Univ., Seoul, Korea; Seoul National Univ., Seoul, Korea
Venue:
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
Year:
2002

Abstract

We present a comparative evaluation of two data-driven models for translation selection in English-Korean machine translation: latent semantic analysis (LSA) and probabilistic latent semantic analysis (PLSA). Both models can represent the complex semantic structure of a given context, such as a text passage. Translation selection relies essentially on grammatical relationships stored in dictionaries. We use k-nearest neighbor (k-NN) learning to select an appropriate translation for instances not found in the dictionary, computing the distance between instances in k-NN from the similarity measured by LSA and PLSA. For the experiments, we used TREC data (AP news from 1988) to construct the latent semantic spaces of the two models and the Wall Street Journal corpus to evaluate the translation accuracy of each model. In the experiments, PLSA selected more accurate translations than LSA, irrespective of the value of k and the type of grammatical relationship.
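
The k-NN selection step described in the abstract can be illustrated with a minimal sketch. It assumes each dictionary instance has already been mapped to a vector in a latent semantic space (by LSA or PLSA; that step is not shown), and it uses cosine similarity with similarity-weighted voting. The function names, the toy vectors, the labels `translation_A` and `translation_B`, and the weighted-vote rule are all hypothetical choices made for illustration, not details taken from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_translation(query_vec, dictionary, k=3):
    """Pick a translation for an unseen instance by k-NN in the latent space.

    dictionary: list of (latent_vector, translation) pairs. The vectors are
    assumed to come from an LSA or PLSA model trained elsewhere.
    Voting is weighted by similarity, which is an assumption of this sketch.
    """
    if not dictionary:
        raise ValueError("dictionary must contain at least one instance")
    # Rank stored instances from most to least similar to the query.
    scored = sorted(
        ((cosine(query_vec, vec), translation) for vec, translation in dictionary),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Accumulate similarity-weighted votes over the k nearest neighbors.
    votes = defaultdict(float)
    for similarity, translation in scored[:k]:
        votes[translation] += similarity
    return max(votes, key=votes.get)


# Hypothetical two-dimensional latent vectors with placeholder labels.
DICTIONARY = [
    ([0.9, 0.1], "translation_A"),
    ([0.8, 0.3], "translation_A"),
    ([0.1, 0.9], "translation_B"),
    ([0.2, 0.7], "translation_B"),
]

if __name__ == "__main__":
    print(select_translation([0.85, 0.2], DICTIONARY, k=3))
```

Swapping LSA for PLSA in this sketch would change only how the vectors (or the similarity function) are produced; the neighbor search and voting stay the same, which is what makes a comparison of the two models across values of k straightforward.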