Computing text semantic relatedness using the contents and links of a hypertext encyclopedia
Artificial Intelligence
We propose methods for computing semantic relatedness between words or texts using knowledge from hypertext encyclopedias such as Wikipedia. A network of concepts is built by filtering the encyclopedia's articles, with each concept corresponding to one article. A random walk model based on the notion of Visiting Probability (VP) is used to compute the distance between nodes, and then between sets of nodes. To transfer learning from the network of concepts to text analysis tasks, we develop two common-representation approaches. In the first, the shared representation space is the set of concepts in the network, and every text is represented in this space. In the second, a latent space serves as the shared representation, and a transformation from words into the latent space is trained on VP scores. We apply our methods to four important tasks in natural language processing: word similarity, document similarity, document clustering and classification, and ranking in information retrieval. Performance is state-of-the-art or close to it on each task, demonstrating the generality of the proposed knowledge resource and the associated methods.
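As a rough illustration of the Visiting Probability idea, the sketch below estimates the probability that a random walker starting at one concept node reaches another before stopping, where the walker halts with probability `alpha` at each step. The graph representation, the function name, and the parameter values are illustrative assumptions, not the paper's actual implementation; the paper also combines walks in both directions to obtain a symmetric distance, which is omitted here.

```python
def visiting_probability(graph, source, target, alpha=0.15, max_steps=50):
    """Approximate VP(source -> target): the probability that a random
    walker starting at `source` visits `target` before stopping.

    At each step the walker stops with probability `alpha`; otherwise it
    moves to a uniformly chosen out-neighbour. `target` is absorbing:
    mass that first reaches it is counted once and removed from the walk.
    All names and defaults here are illustrative assumptions.
    """
    if source == target:
        return 1.0
    dist = {source: 1.0}  # probability mass over non-target nodes
    vp = 0.0
    for _ in range(max_steps):
        new_dist = {}
        for node, p in dist.items():
            nbrs = graph.get(node, [])
            if not nbrs:
                continue  # dangling node: the walk dies here
            share = (1.0 - alpha) * p / len(nbrs)
            for nb in nbrs:
                if nb == target:
                    vp += share  # first arrival at the target node
                else:
                    new_dist[nb] = new_dist.get(nb, 0.0) + share
        dist = new_dist
        if sum(dist.values()) < 1e-12:
            break  # essentially no mass left in the walk
    return vp

# Toy concept chain: a -> b -> c. Concepts one hop away get a higher
# visiting probability than concepts two hops away, so VP behaves as a
# graded relatedness score rather than a binary link indicator.
g = {"a": ["b"], "b": ["c"], "c": []}
print(visiting_probability(g, "a", "b"))  # higher (direct link)
print(visiting_probability(g, "a", "c"))  # lower (two hops)
```

With `alpha = 0.15`, each extra hop multiplies the surviving mass by 0.85, so nearby concepts score higher, which is the behaviour a relatedness measure needs.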