Impact of Failure on Interconnection Networks for Large Storage Systems

  • Authors:
  • Qin Xin;Ethan L. Miller;Thomas J. E. Schwarz;Darrell D. E. Long

  • Affiliations:
  • University of California, Santa Cruz;University of California, Santa Cruz;Santa Clara University;University of California, Santa Cruz

  • Venue:
  • MSST '05 Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies
  • Year:
  • 2005

Abstract

Recent advances in large-capacity, low-cost storage devices have led to active research in the design of large-scale storage systems built from commodity devices for supercomputing applications. Such storage systems, composed of thousands of storage devices, are required to provide high system bandwidth and petabyte-scale data storage. A robust network interconnection is essential to achieving high bandwidth, low latency, and reliable delivery during data transfers. However, failures such as temporary link outages and node crashes are inevitable. We discuss the impact of potential failures on network interconnections in very large-scale storage systems and analyze the trade-offs among several storage network topologies through simulation. Our results suggest that a good interconnect topology is essential to the fault tolerance of a petabyte-scale storage system.
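To make the simulation-based comparison concrete, the following is a minimal sketch of how link failures might be injected into two candidate topologies. It is not the authors' simulator. The abstract does not name the topologies studied, so the ring and 2D mesh used here, the independent per-link failure probability `p_fail`, and the metric (the fraction of nodes in the largest connected component) are all illustrative assumptions.

```python
import random
from collections import deque


def ring_links(n):
    """Links of a ring of n nodes (hypothetical example topology)."""
    return [(i, (i + 1) % n) for i in range(n)]


def mesh_links(side):
    """Links of a side x side 2D mesh without wraparound (hypothetical example topology)."""
    links = []
    for r in range(side):
        for c in range(side):
            node = r * side + c
            if c + 1 < side:
                links.append((node, node + 1))      # link to right neighbor
            if r + 1 < side:
                links.append((node, node + side))   # link to neighbor below
    return links


def largest_component_fraction(n, links):
    """Fraction of the n nodes that lie in the largest connected component."""
    adj = {i: [] for i in range(n)}
    for a, b in links:
        adj[a].append(b)
        adj[b].append(a)
    seen = set()
    best = 0
    for start in range(n):
        if start in seen:
            continue
        # Breadth-first search over the surviving links.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best / n


def simulate(n, links, p_fail, trials, rng):
    """Average largest-component fraction when each link fails independently
    with probability p_fail (an assumed failure model)."""
    total = 0.0
    for _ in range(trials):
        surviving = [link for link in links if rng.random() >= p_fail]
        total += largest_component_fraction(n, surviving)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    for name, links in (("ring", ring_links(16)), ("mesh", mesh_links(4))):
        print(name, simulate(16, links, p_fail=0.1, trials=1000, rng=rng))
```

Both example topologies have 16 nodes, but the mesh has 24 links to the ring's 16, so it offers more alternate paths when links fail. A fuller study would also need to model node crashes, repair times, and bandwidth under rerouting, which this sketch leaves out.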