Computing semantic word similarity and relatedness requires access to a vast semantic space for effective analysis. As a consequence, extracting useful information from large amounts of data on a single workstation is time-consuming. In this paper, we propose a system, called Distributed Semantic Analysis (DSA), that integrates a distributed approach with semantic analysis. DSA builds a list of concept vectors associated with each word by exploiting the knowledge provided by Wikipedia articles. Based on these lists, DSA calculates the degree of semantic relatedness between two words through the cosine measure. The proposed solution is built on top of the Hadoop MapReduce framework and the Mahout machine learning library. Experimental results show two major improvements over the state of the art, in particular over the Explicit Semantic Analysis (ESA) method. First, our distributed approach significantly reduces the time needed to build the concept vectors, enabling the use of larger inputs, which is the basis for more accurate results. Second, DSA achieves a very high correlation between computed relatedness and reference benchmarks derived from human judgements, and its accuracy exceeds that of solutions reported in the literature across multiple benchmarks.
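As a rough illustration of the relatedness computation described above, the sketch below compares two words via the cosine of their Wikipedia concept vectors. The words, concept names, and weights here are invented for the example (they are not real DSA or ESA output); in the actual system such vectors would be built at scale with Hadoop MapReduce and Mahout.

```python
import math

def cosine(u, v):
    # u, v: sparse concept vectors as dicts mapping
    # Wikipedia concept -> association weight (e.g. TF-IDF).
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy concept vectors with hypothetical weights.
cat = {"Felidae": 0.9, "Pet": 0.6, "Mammal": 0.4}
dog = {"Canidae": 0.8, "Pet": 0.7, "Mammal": 0.5}

print(round(cosine(cat, dog), 3))  # ≈ 0.458
```

Only the concepts shared by both vectors ("Pet" and "Mammal") contribute to the dot product, which is why two words that evoke overlapping Wikipedia concepts score as related.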