An analysis of data usage in a large set of real traces from a high-energy physics collaboration revealed an emergent grouping of files, which we term "filecules". This paper presents the benefits of using this file grouping for prestaging data and compares it with previously proposed file grouping techniques along a range of performance metrics. Our experiments with real workloads demonstrate that filecule grouping is a reliable and useful abstraction for data management in science Grids; that preserving time locality when prestaging data is highly recommended; that reordering jobs with respect to data availability has a significant impact on throughput; and, finally, that a relatively short history of traces is a good predictor of filecule grouping. Our experimental results provide lessons for workload modeling and suggest design guidelines for data management in data-intensive resource-sharing environments.
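
The grouping idea can be illustrated with a minimal sketch. The abstract does not give the formal definition of a filecule, so the sketch assumes one: a filecule is a set of files that were accessed by exactly the same set of jobs. The function name `find_filecules`, the trace format (a mapping from job ID to the files that job read), and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def find_filecules(job_traces):
    """Group files that were accessed by exactly the same set of jobs.

    Assumption (not from the abstract): a filecule is a maximal set of
    files that share an identical set of accessing jobs.
    job_traces maps a job ID to an iterable of file names it read.
    """
    # Step 1: for every file, collect the set of jobs that touched it.
    jobs_per_file = defaultdict(set)
    for job_id, files in job_traces.items():
        for name in files:
            jobs_per_file[name].add(job_id)

    # Step 2: files with the same job set fall into the same group.
    groups = defaultdict(list)
    for name, jobs in jobs_per_file.items():
        groups[frozenset(jobs)].append(name)

    # Sort for a deterministic result.
    return sorted(sorted(group) for group in groups.values())


# Hypothetical trace: three jobs and the files each one requested.
traces = {
    "job1": ["a", "b", "c"],
    "job2": ["a", "b"],
    "job3": ["c", "d"],
}
filecules = find_filecules(traces)
print(filecules)
```

Under this assumed definition, files `a` and `b` form one group because the same two jobs requested both, while `c` and `d` each stand alone. A prestaging policy built on such groups would fetch every member of a group when any one member is requested; whether this matches the paper's exact method is not stated in the abstract.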