File grouping for scientific data management: lessons from experimenting with real traces
HPDC '08 Proceedings of the 17th international symposium on High performance distributed computing
The analysis of data usage in a large set of real traces from a high-energy physics collaboration revealed the existence of an emergent grouping of files that we coined "filecules". This paper presents the benefits of using this file grouping for prestaging data and compares it with previously proposed file grouping techniques along a range of performance metrics. Our experiments with real workloads demonstrate that filecule grouping is a reliable and useful abstraction for data management in science Grids; that preserving temporal locality when prestaging data is highly recommended; that reordering jobs with respect to data availability has a significant impact on throughput; and finally, that a relatively short trace history is a good predictor of filecule grouping. Our experimental results provide lessons for workload modeling and suggest design guidelines for data management in data-intensive resource-sharing environments.
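The abstract does not spell out how filecules are extracted from a trace. As a minimal sketch, assuming a filecule is a maximal set of files that are always requested together (that is, requested by exactly the same set of jobs), the grouping can be computed from a log of job-to-file accesses. The `(job_id, file_name)` trace format and the function name `identify_filecules` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def identify_filecules(trace: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group files requested by exactly the same set of jobs.

    Assumption: `trace` is a hypothetical sequence of (job_id, file_name)
    pairs, e.g. parsed from a data-access log.
    """
    # Map each file to the set of jobs that requested it.
    jobs_per_file: Dict[str, Set[str]] = defaultdict(set)
    for job_id, file_name in trace:
        jobs_per_file[file_name].add(job_id)

    # Files that share an identical job set belong to the same filecule.
    groups: Dict[FrozenSet[str], Set[str]] = defaultdict(set)
    for file_name, jobs in jobs_per_file.items():
        groups[frozenset(jobs)].add(file_name)

    # Sort filecules by their sorted members for deterministic output.
    return sorted(groups.values(), key=lambda g: sorted(g))


if __name__ == "__main__":
    trace = [("j1", "a"), ("j1", "b"), ("j2", "a"),
             ("j2", "b"), ("j2", "c"), ("j3", "d")]
    print(identify_filecules(trace) == [{"a", "b"}, {"c"}, {"d"}])  # True
```

Because every file maps to exactly one job set, each file lands in exactly one filecule, so the result is a partition of the files seen in the trace. Running the function over only a prefix of the trace is one way to probe how well a short history predicts the grouping.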