Using a Wikipedia-based semantic relatedness measure for document clustering

Authors:
Majid Yazdani; Andrei Popescu-Belis
Affiliations:
Idiap Research Institute and EPFL, Rue Marconi, Martigny, Switzerland; Idiap Research Institute, Rue Marconi, Martigny, Switzerland
Venue:
TextGraphs-6 Proceedings of TextGraphs-6: Graph-based Methods for Natural Language Processing
Year:
2011

Abstract

A graph-based distance between Wikipedia articles is defined using a random walk model, which estimates the visiting probability (VP) between articles using two types of links: hyperlinks and lexical similarity relations. The VP to and from a set of articles is then computed, and approximations are proposed to make the computation of semantic relatedness between every pair of texts in a large data set tractable. The model is applied to document clustering on the 20 Newsgroups data set, where precision and recall improve over previous textual distance algorithms.
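
The abstract does not give the formulas, so the following is only a minimal sketch of one way a visiting probability over two link types could be computed, not the authors' implementation. It assumes that the hyperlink and lexical similarity weights are each row-normalized and mixed with a hypothetical `weight` parameter, and that VP is the probability that a walk truncated at `max_steps` steps reaches the target at least once. The function names, the mixing scheme, and the toy three-article graph are all illustrative assumptions.

```python
def row_normalize(weights):
    """Turn a non-negative weight matrix into per-row probabilities."""
    rows = []
    for row in weights:
        total = sum(row)
        # A row with no outgoing weight stays all zero (the walk stops there).
        rows.append([w / total if total > 0 else 0.0 for w in row])
    return rows


def mix_links(hyperlink, lexical, weight=0.5):
    """Assumed mixing scheme: convex combination of the two link types."""
    h = row_normalize(hyperlink)
    s = row_normalize(lexical)
    return [
        [weight * a + (1.0 - weight) * b for a, b in zip(h_row, s_row)]
        for h_row, s_row in zip(h, s)
    ]


def visiting_probability(transition, source, target, max_steps=50):
    """Probability that a walk from `source` hits `target` within `max_steps`.

    The target is treated as absorbing, so each walk is counted only once,
    at its first visit.
    """
    if source == target:
        return 1.0
    n = len(transition)
    # mass[i]: probability of being at node i without having visited target.
    mass = [0.0] * n
    mass[source] = 1.0
    absorbed = 0.0
    for _ in range(max_steps):
        new_mass = [0.0] * n
        for i, p_i in enumerate(mass):
            if p_i == 0.0:
                continue
            for j, p_ij in enumerate(transition[i]):
                if j == target:
                    absorbed += p_i * p_ij
                else:
                    new_mass[j] += p_i * p_ij
        mass = new_mass
    return absorbed


# Toy graph with three hypothetical articles (indices 0, 1, 2).
hyperlink = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # directed hyperlinks
lexical = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]    # symmetric similarity weights
P = mix_links(hyperlink, lexical, weight=0.5)

vp_forward = visiting_probability(P, 0, 1, max_steps=2)
vp_backward = visiting_probability(P, 1, 0, max_steps=2)
print(vp_forward, vp_backward)
```

Because VP is asymmetric, a symmetric relatedness score would need to combine both directions in some way (for example by averaging `vp_forward` and `vp_backward`); that choice is likewise an assumption here.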