WordNet: a lexical database for English
Communications of the ACM
Readings in information visualization
Foundations of statistical natural language processing
The effect of information scent on searching information: visualizations of large tree structures
AVI '00 Proceedings of the working conference on Advanced visual interfaces
Contextual correlates of synonymy
Communications of the ACM
Placing search in context: the concept revisited
ACM Transactions on Information Systems (TOIS)
The bloodhound project: automating discovery of web usability issues using the InfoScent™ simulator
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL
ECML '01 Proceedings of the 12th European Conference on Machine Learning
Co-occurrence vectors from corpora vs. distance vectors from dictionaries
COLING '94 Proceedings of the 15th conference on Computational linguistics - Volume 1
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
A comparison of LSA, WordNet and PMI-IR for predicting user click behavior
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Frequency estimates for statistical word similarity measures
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1
Navigation in degree of interest trees
Proceedings of the working conference on Advanced visual interfaces
Using information content to evaluate semantic similarity in a taxonomy
IJCAI'95 Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1
An empirical study of required dimensionality for large-scale latent semantic indexing applications
Proceedings of the 17th ACM conference on Information and knowledge management
The microstructures of social tagging: a rational model
Proceedings of the 2008 ACM conference on Computer supported cooperative work
Automated semantic elaboration of web site information architecture
Interacting with Computers
Distributional phrasal paraphrase generation for statistical machine translation
ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction
In this paper we compare three systems that estimate semantic similarity between words: Latent Semantic Analysis (LSA; Landauer & Dumais, 1997), Pointwise Mutual Information (PMI; Turney, 2001), and Generalized Latent Semantic Analysis (GLSA; Matveeva, Levow, Farahat, & Royer, 2005). We compare all three techniques on a single common corpus (TASA) and, for PMI and GLSA, also report performance on a larger web-based corpus. The evaluation uses two kinds of tests: (1) synonymy tests and (2) comparison with human word similarity judgments. The results indicate that on the large corpus PMI works best on word similarity tests and GLSA works best on synonymy tests. On the smaller TASA corpus, GLSA produced the best performance on most tests. The larger corpus improved the performance of PMI but, in most cases, did not improve that of GLSA.
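To make the PMI measure and the synonymy test concrete, the following is a minimal sketch, not the setup used in the paper. It assumes PMI is estimated from document-level co-occurrence counts in a tiny toy corpus, and that a synonymy item is answered by picking the candidate with the highest PMI to the stem word. The function names `pmi_table` and `answer_synonym_item` and the example sentences are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Build a pmi(w1, w2) function from document-level co-occurrence.

    Assumption: two words co-occur if they appear in the same document.
    """
    n_docs = len(documents)
    word_df = Counter()  # number of documents containing each word
    pair_df = Counter()  # number of documents containing both words of a pair
    for doc in documents:
        vocab = set(doc.lower().split())
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))

    def pmi(w1, w2):
        joint = pair_df[frozenset((w1, w2))]
        if joint == 0:
            # The pair never co-occurs in this corpus.
            return float("-inf")
        p_joint = joint / n_docs
        p_w1 = word_df[w1] / n_docs
        p_w2 = word_df[w2] / n_docs
        return math.log2(p_joint / (p_w1 * p_w2))

    return pmi


def answer_synonym_item(pmi, stem, choices):
    """Answer a multiple-choice synonymy item with the highest-PMI choice."""
    return max(choices, key=lambda choice: pmi(stem, choice))


if __name__ == "__main__":
    # Hypothetical toy corpus; a real evaluation would use a large corpus.
    docs = [
        "the big dog barked",
        "a large dog barked",
        "the big large house",
        "a small cat slept",
        "the cat slept",
    ]
    pmi = pmi_table(docs)
    print(answer_synonym_item(pmi, "big", ["large", "small", "cat"]))
```

The second kind of test, comparison with human judgments, would instead correlate scores such as `pmi(w1, w2)` with human similarity ratings over a list of word pairs.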