External memory algorithms

  • Authors:
  • Jeffrey Scott Vitter

  • Affiliations:
  • Department of Computer Science, Duke University, Durham, NC

  • Venue:
  • Handbook of massive data sets
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper we survey the state of the art in the design and analysis of external memory (or EM) algorithms, where the goal is to exploit locality in order to reduce the I/O costs.For sorting and related problems like permuting and fast Fourier transform, the key paradigms include distribution and merging. The paradigm of disk striping offers an elegant way to use multiple disks in parallel. For sorting, however, disk striping can be nonoptimal with respect to I/O, so to gain further improvements we discuss distribution and merging techniques for using the disks independently. We consider EM paradigms for computations involving matrixes, geometric data, and graphs, and we look at problems caused by dynamic memory allocation. We report on some experiments in the domain of spatial databases using the TPIE system (Transparent Parallel I/O programming Environment). The newly developed EM algorithms and data structures that incorporate the paradigms we discuss in this chapter are significantly faster than methods currently used in practice.