Snowball: Scalable Storage on Networks of Workstations with Balanced Load

  • Authors:
  • Radek Vingralek; Yuri Breitbart; Gerhard Weikum

  • Affiliations:
  • Oracle Corp., 500 Oracle Parkway, Redwood Shores, CA 94065. E-mail: rvingral@us.oracle.com; Bell Laboratories, Room 2T-106, 600 Mountain Avenue, Murray Hill, NJ 07974. E-mail: yuri@research.bell-labs.com; Department of Computer Science, University of the Saarland, D-66041 Saarbruecken, Germany. E-mail: weikum@cs.uni-sb.de

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 1998

Abstract

Networks of workstations are an emerging architectural paradigm for high-performance parallel and distributed systems. Exploiting networks of workstations for massive data management poses exciting challenges. We consider here the problem of managing record-structured data in such an environment. For example, managing collections of HTML documents on a cluster of WWW servers is an important application for which our approach provides support. The records are accessed by a dynamically growing set of clients based on a search key (e.g., a URL). To scale up the throughput of client accesses with approximately constant response time, the records and thus also their access load are dynamically redistributed across a growing set of workstations. The paper addresses two problems of realistic workloads: skewed access frequencies to the records and evolving access patterns where previously cold records may become hot and vice versa. Our solution incorporates load tracking at different levels of granularity and automatically chooses the appropriate granularity for dynamic data migrations. Experimental results based on a detailed simulation model show that our method is indeed successful in providing scalable cost/performance and explicitly controlling its level.
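
The idea of tracking load at several granularities and picking one for each migration can be illustrated with a small sketch. It is not the algorithm from the paper, which the abstract does not specify. The two granularities (single record and bucket of records), the names `Server`, `choose_migration`, and `migrate`, and the "closest to half the load gap" selection rule are all assumptions made for illustration.

```python
from collections import defaultdict


class Server:
    """Hypothetical workstation that counts accesses per record, grouped by bucket."""

    def __init__(self, name):
        self.name = name
        # bucket id -> {record key -> access count}
        self.buckets = defaultdict(dict)

    def access(self, bucket, key):
        records = self.buckets[bucket]
        records[key] = records.get(key, 0) + 1

    def load(self):
        # Server-level load is the sum of all record-level counters.
        return sum(sum(records.values()) for records in self.buckets.values())


def choose_migration(src, dst):
    """Pick the bucket or single record whose load is closest to half the load gap.

    Assumed rule: moving that much load from src to dst would equalize the pair.
    Returns (kind, bucket, key, load) or None if src is not the hotter server.
    """
    target = (src.load() - dst.load()) / 2
    if target <= 0:
        return None
    candidates = []
    for bucket, records in src.buckets.items():
        candidates.append(("bucket", bucket, None, sum(records.values())))
        for key, count in records.items():
            candidates.append(("record", bucket, key, count))
    if not candidates:
        return None
    return min(candidates, key=lambda c: abs(c[3] - target))


def migrate(src, dst, choice):
    """Move the chosen unit, together with its counters, from src to dst."""
    kind, bucket, key, _ = choice
    if kind == "bucket":
        dst.buckets[bucket].update(src.buckets.pop(bucket))
    else:
        dst.buckets[bucket][key] = src.buckets[bucket].pop(key)
```

Under this rule a skewed workload, where one hot record dominates its bucket, tends to trigger a record-level move, while a uniform workload tends to move a whole bucket. Finer-grained moves transfer less data but need more bookkeeping, which is the trade-off that motivates choosing the granularity dynamically.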