Building and supporting a massive data infrastructure for the masses

  • Authors:
  • Anurag Shankar;Gustav Meglicki;Jeff Russ;Haichuan Yang;E. Chris Garrison

  • Affiliations:
  • Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN;Indiana University, Bloomington, IN

  • Venue:
  • SIGUCCS '02 Proceedings of the 30th annual ACM SIGUCCS conference on User services
  • Year:
  • 2002

Abstract

On a typical university campus, the words "massive data storage" (MDS) usually bring to mind technology used by high-end, high performance computing (HPC) users. This is because academic supercomputer sites have for decades provided their users with a tightly interwoven fabric of HPC and high performance MDS. However, a new paradigm in data storage is now emerging in which large, central, hierarchical storage management (HSM) services may play an increasingly important role in the general, non-HPC academic user environment. For this user segment, information technology (particularly storage) is beginning to play a vital role in research. Personal computers and departmental storage systems are no longer adequate to store or back up data. In fact, data volumes today are growing exponentially, without a concomitant increase in the human resources required to manage them. At Indiana University (IU), with around 100,000 users distributed over eight campuses throughout the state of Indiana, we have started tackling these issues head on. Considerable thought, planning, and recent experience lead us to conclude that extending a traditionally high-end MDS system to "the masses" is a viable model for a large research institution. Using the High Performance Storage System (HPSS) HSM software at IU, we have built a scalable storage infrastructure with a current capacity of 500 terabytes (TB hereafter; 1 TB = 1,000 gigabytes). Using a combination of a secure file system interface to HPSS and gateway software, we have extended the reach of this high-end HSM system over the network to all eight campuses of IU. It is now possible for a user at IU to easily and securely store and retrieve terabytes of data from a Unix, Linux, Windows, or Macintosh desktop, or via the Web from anywhere.
We have also developed a geographically distributed storage system that operates over a wide area network (WAN) between our Bloomington and Indianapolis campuses, located nearly fifty miles apart. We have recently implemented remote mirroring of data across this WAN for disaster recovery. In this paper, we provide an overview of Indiana University's distributed, massive data storage service infrastructure. We also discuss how it is being used and supported today at IU and how it is making an impact in research areas that have been poorly represented or unrepresented among high-end information technology (IT) users in the past.