Relational Learning with Statistical Predicate Invention: Better Models for Hypertext

  • Authors:
  • Mark Craven; Seán Slattery

  • Affiliations:
  • Department of Biostatistics & Medical Informatics, University of Wisconsin, Madison, WI 53706, USA. craven@biostat.wisc.edu; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA. jslttery@cs.cmu.edu

  • Venue:
  • Machine Learning
  • Year:
  • 2001

Abstract

We present a new approach to learning hypertext classifiers that combines a statistical text-learning method with a relational rule learner. This approach is well suited to learning in hypertext domains because its statistical component allows it to characterize text in terms of word frequencies, whereas its relational component is able to describe how neighboring documents are related to each other by the hyperlinks that connect them. We evaluate our approach by applying it to tasks that involve (i) learning definitions for classes of pages, (ii) learning definitions for particular relations that exist between pairs of pages, and (iii) locating a particular class of information in the internal structure of pages. Our experiments demonstrate that this new approach is able to learn more accurate classifiers than either of its constituent methods alone.