Background data analysis due to virus scanning, backup, and desktop search is increasingly prevalent on client systems. As the number of tools and their resource requirements grow, their impact on foreground workloads can be prohibitive. This creates a tension between users' foreground work and the background work that makes information management possible. We present a system called SCAN-Lite that addresses this tension. SCAN-Lite exploits the fact that data in an enterprise is often replicated to efficiently schedule background data analyses. It uses content hashing to identify duplicate content, and scans each unique piece of content only once. It delays scheduling these scans to increase the likelihood that the content will be replicated on multiple machines, thus providing more choices for where to perform the scan. Furthermore, it prioritizes machines to maximize use of idle time and minimize the impact on foreground activities. We evaluate SCAN-Lite using measurements of enterprise replication behavior. We find that SCAN-Lite significantly improves scanning performance over the naive approach, and that it effectively exploits replication to reduce total work done and the impact on client foreground activity.
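The core idea above can be illustrated with a small sketch: identify content by hash, scan each unique hash only once, and assign the scan to the replica on the machine with the most idle time. This is a minimal illustration under assumed interfaces, not SCAN-Lite's actual implementation; the class and method names (`ScanScheduler`, `observe`, `schedule`) and the `idle_time` input are hypothetical.

```python
import hashlib
from collections import defaultdict


class ScanScheduler:
    """Minimal sketch of deduplicated scan scheduling in the SCAN-Lite style.

    Content is identified by hash so duplicates across machines are
    recognized; each unique hash is scanned once, on the replica whose
    machine currently has the most idle time. (Names are illustrative.)
    """

    def __init__(self):
        self.replicas = defaultdict(set)  # content hash -> machines holding a copy
        self.scanned = set()              # hashes already scanned somewhere

    def observe(self, machine, content):
        # Hash the content to detect replicas of the same data.
        digest = hashlib.sha256(content).hexdigest()
        self.replicas[digest].add(machine)
        return digest

    def schedule(self, idle_time):
        """Assign each unscanned hash to the idlest machine holding a copy.

        idle_time: dict mapping machine -> available idle seconds
        (assumed to come from some idleness monitor).
        Returns {machine: [hashes to scan on that machine]}.
        """
        plan = defaultdict(list)
        for digest, machines in self.replicas.items():
            if digest in self.scanned:
                continue  # duplicate content: already covered elsewhere
            target = max(machines, key=lambda m: idle_time.get(m, 0))
            plan[target].append(digest)
            self.scanned.add(digest)
        return dict(plan)
```

Deferring the `schedule` call (as the paper describes delaying scans) lets more replicas appear in `self.replicas` first, giving the scheduler more placement choices.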