A comparative evaluation of data-driven models in translation selection of machine translation
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1
In this paper, we empirically search for the optimal dimensionality of two data-driven models: the Latent Semantic Analysis (LSA) model and the Probabilistic Latent Semantic Analysis (PLSA) model. These models build linguistic semantic knowledge that can be used to estimate contextual semantic similarity for target word selection in English-Korean machine translation. We also employ the k-Nearest Neighbor (k-NN) learning algorithm, and we diversify our experiments by analyzing the relationship between the value of k in k-NN learning and selection accuracy, in addition to the relationship between dimensionality and accuracy. Although we found no regular tendency in the relationship between dimensionality and accuracy, we did identify the optimal dimensionality, at which the data were most soundly distributed, during the experiments.
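As a rough illustration of the pipeline the abstract describes, the sketch below combines LSA-style dimensionality reduction (truncated SVD of a term-by-context matrix) with k-NN selection over the latent context vectors. The matrix, the sense labels, and the helper names (`lsa_context_vectors`, `knn_select`) are all made-up toy assumptions, not the paper's actual data or code; the real models would be trained on a large bilingual corpus.

```python
import numpy as np
from collections import Counter

# Hypothetical toy term-by-context matrix (rows = terms, columns = contexts).
# In the paper this would be derived from a bilingual corpus; these counts
# are invented purely for illustration.
A = np.array([
    [2., 0., 1., 0.],
    [1., 1., 0., 0.],
    [0., 2., 0., 1.],
    [0., 0., 1., 2.],
])
# Hypothetical target-word translation label for each context column.
labels = ["sense1", "sense1", "sense2", "sense2"]

def lsa_context_vectors(A, dim):
    """Truncated SVD: map each context column to a dim-dimensional latent vector."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (np.diag(s[:dim]) @ Vt[:dim, :]).T  # one latent vector per context

def knn_select(query, vectors, labels, k):
    """Majority vote over the k contexts most cosine-similar to the query."""
    unit = lambda v: v / (np.linalg.norm(v) + 1e-12)
    sims = np.array([unit(v) @ unit(query) for v in vectors])
    nearest = np.argsort(-sims)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

vecs = lsa_context_vectors(A, dim=2)  # `dim` is the dimensionality under study
selected = knn_select(vecs[0], vecs, labels, k=3)
print(selected)
```

Sweeping `dim` and `k` over ranges, and recording selection accuracy at each pair, mirrors the covariance analysis the experiments perform.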