Searching large text collections

  • Authors: Ricardo Baeza-Yates; Alistair Moffat; Gonzalo Navarro

  • Affiliations: Dept. of Computer Science, Universidad de Chile, Santiago, Chile; Dept. Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia; Dept. of Computer Science, Universidad de Chile, Santiago, Chile

  • Venue: Handbook of massive data sets
  • Year: 2002

Abstract

In this chapter we present the main data structures and algorithms for searching large text collections. We emphasize inverted files, the most widely used index structure, but also review suffix arrays, which are useful in a number of specialized applications. We also cover parallel and distributed implementations of these two structures. As an example, we show how mechanisms based upon inverted files can be used to index and search the Web.
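
For readers unfamiliar with the two structures named in the abstract, the following is a minimal illustrative sketch and is not taken from the chapter. It assumes whitespace tokenization, in-memory data, and a naive sort-based suffix array construction; all function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted list of ids of the documents containing it."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Assumption: terms are lowercase whitespace-separated tokens.
        for term in set(text.lower().split()):
            index[term].append(doc_id)  # doc ids arrive in increasing order
    return dict(index)


def conjunctive_query(index, terms):
    """Return the ids of documents that contain every query term."""
    postings = [set(index.get(t.lower(), [])) for t in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


def build_suffix_array(text):
    """Naive construction: sort suffix start positions lexicographically."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array for suffixes that start with pattern."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


docs = ["inverted files index text", "suffix arrays index text", "web search"]
index = build_inverted_index(docs)
hits = conjunctive_query(index, ["index", "suffix"])

text = "banana"
sa = build_suffix_array(text)
positions = find_occurrences(text, sa, "ana")
```

The inverted index answers word-level queries by intersecting posting lists, while the suffix array can locate an arbitrary substring. The quadratic-space sort used here is only suitable for small inputs.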