Coordinating Simultaneous Caching of File Bundles from Tertiary Storage

Authors:
A. Shoshani;A. Sim;L. M. Bernardo;H. Nordberg
Venue:
SSDBM '00 Proceedings of the 12th International Conference on Scientific and Statistical Database Management
Year:
2000

References (2):

Storage Access Coordination Using CORBA
DOA '99 Proceedings of the International Symposium on Distributed Objects and Applications

Multidimensional Indexing and Query Coordination for Tertiary Storage Management
SSDBM '99 Proceedings of the 11th International Conference on Scientific and Statistical Database Management

Cited by (5):

The design of a retrieval technique for high-dimensional data on tertiary storage
ACM SIGMOD Record

A retrieval technique for high-dimensional data and partially specified queries
Data & Knowledge Engineering

Effective Management of Hierarchical Storage Using Two Levels of Data Clustering
MSS '03 Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03)

Impact of Admission and Cache Replacement Policies on Response Times of Jobs on Data Grids
Cluster Computing

File caching in data intensive scientific applications on data-grids
DMG 2005 Proceedings of the First VLDB conference on Data Management in Grids

Abstract

In a previous paper, we described a system called STACS (Storage Access Coordination System) for High Energy and Nuclear Physics (HENP) experiments. These experiments generate very large volumes of "event" data at a very high rate. The data volumes may reach hundreds of terabytes per year, so the data are stored on robotic tape systems managed by a mass storage system. The data are stored as files on tape in a predetermined order, usually the order in which they are generated. A major bottleneck is the retrieval of subsets of these large datasets during the analysis phase. STACS is designed to optimize the use of a disk cache and thus minimize the number of files read from tape.

In this paper, we describe an interesting problem of disk staging coordination that goes beyond the file-at-a-time requirement. The problem stems from the need to coordinate the simultaneous caching of groups of files that we refer to as "bundles of files". All files of a bundle must be in the disk cache at the same time for the analysis application to proceed. This is a radically different problem from the case where analysis applications need only one file at a time. We describe how file bundles are identified and how bundle caching is scheduled so that files shared between bundles are not removed from the cache unnecessarily. We also describe the methodology and the policies used to determine the order in which bundles of files are cached and the order in which files are removed from the cache when space is needed.
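The abstract does not spell out the actual STACS policies. The following Python sketch therefore illustrates the general idea of bundle-aware caching under assumed policies only. A bundle is admitted only when all of its files can be resident at once. Files of running bundles are pinned with a reference count. When space is needed, the sketch evicts unpinned files that no pending bundle needs before files that some pending bundle still needs, and least recently used files first within each group. The next bundle is chosen greedily as the one with the least data still on tape. All names (`BundleCache`, `try_admit`, `release`, `run`) and both policies are hypothetical illustrations, not the authors' method.

```python
from collections import Counter


class BundleCache:
    """Hypothetical bundle-aware disk cache (a sketch, not the STACS implementation).

    A bundle is a frozenset of file names that must all be resident at once.
    Files of running bundles are pinned and never evicted; among unpinned
    files, those still needed by a pending bundle are evicted last.
    """

    def __init__(self, capacity, sizes):
        self.capacity = capacity  # total cache space, arbitrary units
        self.sizes = sizes        # file name -> file size
        self.cached = {}          # resident file name -> tick of last use
        self.pins = Counter()     # file name -> number of running bundles using it
        self.tick = 0
        self.tape_reads = 0       # number of files staged from tape so far

    def used(self):
        return sum(self.sizes[f] for f in self.cached)

    def missing_size(self, bundle):
        """Amount of data in `bundle` that would still have to come from tape."""
        return sum(self.sizes[f] for f in bundle if f not in self.cached)

    def _victims(self, bundle, pending):
        """Evictable files: unpinned and not part of `bundle`.

        Files that no pending bundle needs come first; within each group,
        the least recently used file comes first.
        """
        needed = set().union(*pending)
        free = [f for f in self.cached if self.pins[f] == 0 and f not in bundle]
        return sorted(free, key=lambda f: (f in needed, self.cached[f]))

    def try_admit(self, bundle, pending):
        """Make every file of `bundle` resident and pin it; return False if it cannot fit."""
        need = self.missing_size(bundle)
        free_space = self.capacity - self.used()
        victims = self._victims(bundle, pending)
        if free_space + sum(self.sizes[f] for f in victims) < need:
            return False  # fitting would require evicting pinned files
        for f in victims:
            if free_space >= need:
                break
            free_space += self.sizes[f]
            del self.cached[f]  # evict from the disk cache
        self.tick += 1
        for f in bundle:
            if f not in self.cached:
                self.tape_reads += 1  # simulated stage from tape
            self.cached[f] = self.tick
            self.pins[f] += 1
        return True

    def release(self, bundle):
        """Unpin the files of a bundle whose analysis job has finished."""
        for f in bundle:
            self.pins[f] -= 1


def run(cache, bundles):
    """Serve bundles one at a time, always picking the pending bundle with the
    least data still on tape (an assumed greedy ordering policy)."""
    pending = list(bundles)
    order = []
    while pending:
        best = min(pending, key=cache.missing_size)
        pending.remove(best)
        if not cache.try_admit(best, pending):
            raise RuntimeError(f"bundle {sorted(best)} does not fit in the cache")
        order.append(sorted(best))
        cache.release(best)  # pretend the analysis job finished immediately
    return order


if __name__ == "__main__":
    sizes = {"a": 2, "b": 2, "c": 2, "d": 2}
    cache = BundleCache(capacity=6, sizes=sizes)
    bundles = [frozenset("ab"), frozenset("cd"), frozenset("bc")]
    print(run(cache, bundles))  # [['a', 'b'], ['b', 'c'], ['c', 'd']]
    print(cache.tape_reads)     # 4
```

In this sketch, two separate mechanisms protect shared files. The pin counts keep files of concurrently running bundles resident, so a bundle that would need them evicted is simply refused. The eviction key keeps files that a pending bundle will reuse on disk for as long as other unpinned files can be removed instead. Because `run` releases each bundle immediately, pinning only matters when several bundles are admitted before any is released.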