Building a distributed full-text index for the web

  • Authors:
  • Sergey Melnik;Sriram Raghavan;Beverly Yang;Hector Garcia-Molina

  • Affiliations:
  • Stanford University, Computer Science Dept. Stanford, CA;Stanford University, Computer Science Dept. Stanford, CA;Stanford University, Computer Science Dept. Stanford, CA;Stanford University, Computer Science Dept. Stanford, CA

  • Venue:
  • ACM Transactions on Information Systems (TOIS)
  • Year:
  • 2001

Abstract

We identify crucial design issues in building a distributed inverted index for a large collection of Web pages. We introduce a novel pipelining technique for structuring the core index-building system that substantially reduces the index construction time. We also propose a storage scheme for creating and managing inverted files using an embedded database system. We suggest and compare different strategies for collecting global statistics from distributed inverted indexes. Finally, we present performance results from experiments on a testbed distributed Web indexing system that we have implemented.
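
As a rough illustration of what "collecting global statistics from distributed inverted indexes" can mean, the sketch below builds a small inverted index on each of two nodes and sums their per-term document frequencies into a global table. It is a minimal sketch under stated assumptions, not the strategy evaluated in the paper: the function names (build_local_index, local_document_frequencies, merge_global_statistics), the whitespace tokenizer, and the sample documents are all hypothetical, and document IDs are assumed to be unique across nodes.

```python
from collections import Counter, defaultdict


def build_local_index(docs):
    """Map each term to the sorted list of local document IDs containing it.

    `docs` is a dict of doc_id -> text. Tokenization is a plain lowercase
    whitespace split (an assumption made for brevity).
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def local_document_frequencies(index):
    """Count, for each term, how many documents on this node contain it."""
    return Counter({term: len(postings) for term, postings in index.items()})


def merge_global_statistics(per_node_dfs):
    """Sum per-node document frequencies into one global table.

    Assumes each document lives on exactly one node, so counts can be added.
    """
    total = Counter()
    for df in per_node_dfs:
        total.update(df)
    return total


# Two hypothetical nodes, each indexing its own share of the collection.
node_a = build_local_index({1: "web index builder", 2: "distributed web crawler"})
node_b = build_local_index({3: "inverted index for the web"})

global_df = merge_global_statistics(
    [local_document_frequencies(node_a), local_document_frequencies(node_b)]
)
```

Here each node computes its statistics independently and a single merge step combines them; when and where that merge happens is exactly the kind of design choice the abstract says the paper compares.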