Completing Wikipedia's hyperlink structure through dimensionality reduction

Authors:
Robert West; Doina Precup; Joelle Pineau
Affiliations:
McGill University, Montréal, Québec, Canada; McGill University, Montréal, Québec, Canada; McGill University, Montréal, Québec, Canada
Venue:
Proceedings of the 18th ACM conference on Information and knowledge management
Year:
2009

Abstract

Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyperlinks. However, since these links are created by human authors, links one would expect to see are often missing. The goal of this work is to detect such gaps automatically. We propose a novel method for augmenting the structure of hyperlinked document collections such as Wikipedia. It does not require extracting any manually defined features from the article to be augmented. Instead, it is based on principal component analysis, a well-founded mathematical generalization technique, and predicts new links purely from the statistical structure of the graph formed by the existing links. The method does not rely on the textual content of articles; it exploits only hyperlinks. A user evaluation shows that it improves the quality of top link suggestions over the state of the art and that the best predicted links are significantly more valuable than the 'average' link already present in Wikipedia. Beyond link prediction, our algorithm can potentially be used to point out topics an article fails to cover and to cluster articles semantically.
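
The general idea of predicting links from graph structure alone can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function name `suggest_links`, the parameters `k` and `top_n`, and the choice to score candidates by reconstructing each article's link vector from the top `k` principal components are all assumptions made for illustration.

```python
import numpy as np


def suggest_links(adj, k, top_n=3):
    """Suggest missing links from a square 0/1 adjacency matrix.

    adj[i, j] == 1 means article i links to article j.
    Returns a dict mapping each article index to a list of suggested targets.
    Hypothetical helper: a sketch of PCA-based scoring, not the paper's method.
    """
    A = np.asarray(adj, dtype=float)
    mean = A.mean(axis=0)
    centered = A - mean

    # The principal directions are the right singular vectors of the
    # centered matrix, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]  # shape (k, n_articles)

    # Project every article's outlink vector onto the top-k components
    # and map it back; the result smooths each row toward the dominant
    # linking patterns found in the whole graph.
    recon = centered @ basis.T @ basis + mean

    suggestions = {}
    for i in range(A.shape[0]):
        scores = recon[i].copy()
        scores[A[i] > 0] = -np.inf  # ignore links that already exist
        scores[i] = -np.inf         # ignore self-links
        order = np.argsort(-scores)[:top_n]
        suggestions[i] = [int(j) for j in order if np.isfinite(scores[j])]
    return suggestions


# Toy graph (made-up data): articles 0 and 1 link to 3 and 4,
# article 2 links only to 3, and articles 3, 4, 5 link to 0 and 1.
toy = np.zeros((6, 6), dtype=int)
toy[0, [3, 4]] = 1
toy[1, [3, 4]] = 1
toy[2, 3] = 1
toy[[3, 4, 5], 0] = 1
toy[[3, 4, 5], 1] = 1

print(suggest_links(toy, k=1, top_n=1))
```

In this toy graph, article 2 shares a link target with articles 0 and 1 but lacks their link to article 4, so the reconstruction assigns that missing target a high score. No article text is used at any point, which mirrors the abstract's claim that only the hyperlink graph is exploited.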