When are links useful? Experiments in text classification

  • Authors:
  • Michelle Fisher;Richard Everson

  • Affiliations:
  • Department of Computer Science, Exeter University;Department of Computer Science, Exeter University

  • Venue:
  • ECIR'03 Proceedings of the 25th European conference on IR research
  • Year:
  • 2003

Abstract

Link analysis methods have become popular for information access tasks, especially information retrieval, where the link information in a document collection is used to complement the traditionally used content information. However, there has been little firm evidence to confirm the utility of link information. We show that link information can be useful when the document collection has a sufficiently high link density and links are of sufficiently high quality. We report experiments on text classification of the Cora and WebKB data sets using Probabilistic Latent Semantic Analysis and Probabilistic Hypertext Induced Topic Selection. Comparison with manually assigned classes shows that link information enhances classification in data with sufficiently high link density, but is detrimental to performance at low link densities or if the quality of the links is degraded. We introduce a new frequency-based method for selecting the most useful citations from a document collection for use in the model.
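
The abstract does not spell out the frequency-based selection rule or how link density is measured, so the following is only a minimal sketch under stated assumptions: citations are kept when at least `min_count` documents cite them, and link density is taken to be the average number of links per document. The function names, the threshold, and the toy collection are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_frequent_citations(doc_citations, min_count=2):
    """Keep only citation targets cited by at least `min_count` documents.

    Assumption: "frequency" means the number of distinct citing documents.
    """
    counts = Counter(
        target
        for targets in doc_citations.values()
        for target in set(targets)  # count each citing document once
    )
    kept = {target for target, count in counts.items() if count >= min_count}
    return {
        doc: [t for t in targets if t in kept]
        for doc, targets in doc_citations.items()
    }


def link_density(doc_citations):
    """Average number of links per document (an assumed, simple definition)."""
    if not doc_citations:
        return 0.0
    total_links = sum(len(targets) for targets in doc_citations.values())
    return total_links / len(doc_citations)


# Hypothetical toy collection: document id -> list of cited ids.
toy = {
    "d1": ["a", "b"],
    "d2": ["a", "c"],
    "d3": ["a", "b", "d"],
}
filtered = select_frequent_citations(toy, min_count=2)
print(filtered)
print(link_density(toy), link_density(filtered))
```

In this toy case the rarely cited targets `c` and `d` are dropped, which lowers the link density; whether such pruning helps classification is exactly the kind of trade-off the experiments examine.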