Generating links by mining quotations

  • Authors: Okan Kolak; Bill N. Schilit
  • Affiliations: Google Research, Mountain View, CA, USA (both authors)
  • Venue: Proceedings of the nineteenth ACM conference on Hypertext and hypermedia
  • Year: 2008

Abstract

Scanning books, magazines, and newspapers has become a widespread activity because people believe that much of the world's information still resides off-line. In general, after works are scanned, they are indexed for search and processed to add links. This paper describes a new approach to automatically adding links by mining popularly quoted passages. Our technique connects elements that are semantically rich, so the resulting links express strong relations. Moreover, link targets point within a work, facilitating navigation. This paper makes three contributions. First, we describe a scalable algorithm for mining repeated word sequences from extremely large text corpora. Second, we present techniques that filter and rank the repeated sequences for quotations. Third, we present a new user interface for navigating across and within works in the collection using quotation links. Our system has been run on a digital library of over 1 million books and has been used by thousands of people.
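
The abstract does not give the details of the mining algorithm, so the following is only an illustrative sketch of the general idea of finding word sequences that repeat across works. It is not the authors' method. It assumes a simple in-memory approach based on fixed-length word n-grams (shingles), and the names `tokenize`, `shared_shingles`, and `rank_candidates`, as well as the window length `n`, are hypothetical choices made for this example. A system at the scale described (over 1 million books) would need a distributed design rather than a single dictionary.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and keep only letters, digits, and apostrophes, so that
    # punctuation and capitalization differences do not block a match.
    return re.findall(r"[a-z0-9']+", text.lower())


def shared_shingles(docs, n=5):
    """Map each n-word sequence to the set of document ids containing it.

    Only sequences that occur in at least two distinct documents are kept.
    `docs` is assumed to be a dict of {doc_id: full_text}.
    """
    seen = defaultdict(set)
    for doc_id, text in docs.items():
        words = tokenize(text)
        for i in range(len(words) - n + 1):
            seen[tuple(words[i:i + n])].add(doc_id)
    return {seq: ids for seq, ids in seen.items() if len(ids) >= 2}


def rank_candidates(shared):
    # Assumed ranking: sequences found in more distinct works come first;
    # ties are broken alphabetically so the order is deterministic.
    return sorted(shared.items(), key=lambda kv: (-len(kv[1]), kv[0]))


if __name__ == "__main__":
    sample = {
        "a": "He said that to be or not to be is the question.",
        "b": "The line to be or not to be appears often.",
        "c": "Nothing shared here at all.",
    }
    for seq, ids in rank_candidates(shared_shingles(sample, n=4)):
        print(" ".join(seq), sorted(ids))
```

Overlapping shingles that all belong to one longer passage would still need to be merged into a single quotation, and frequent non-quotation repeats (boilerplate, idioms) would need to be filtered out. The abstract names filtering and ranking as the paper's second contribution, but this sketch does not attempt to reproduce those techniques.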