SCAN-Lite: enterprise-wide analysis on the cheap

Authors:
Craig A.N. Soules;Kimberly Keeton;Charles B. Morrey, III
Affiliations:
HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA
Venue:
Proceedings of the 4th ACM European conference on Computer systems
Year:
2009

Abstract

Background data analysis due to virus scanning, backup, and desktop search is increasingly prevalent on client systems. As the number of tools and their resource requirements grow, their impact on foreground workloads can be prohibitive. This creates a tension between users' foreground work and the background work that makes information management possible. We present a system called SCAN-Lite that addresses this tension. SCAN-Lite exploits the fact that data in an enterprise is often replicated to efficiently schedule background data analyses. It uses content hashing to identify duplicate content, and scans each unique piece of content only once. It delays scheduling these scans to increase the likelihood that the content will be replicated on multiple machines, thus providing more choices for where to perform the scan. Furthermore, it prioritizes machines to maximize use of idle time and minimize the impact on foreground activities. We evaluate SCAN-Lite using measurements of enterprise replication behavior. We find that SCAN-Lite significantly improves scanning performance over the naive approach, and that it effectively exploits replication to reduce total work done and the impact on client foreground activity.
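
The core idea in the abstract (hash content, scan each unique piece once, and pick a lightly loaded machine among those holding a replica) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `content_hash` and `plan_scans`, the `idle_score` mapping, and the choice of SHA-256 are all assumptions made for illustration, and the delayed-scheduling step described above is left out.

```python
import hashlib
from collections import defaultdict


def content_hash(data: bytes) -> str:
    """Return a hex digest identifying the content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def plan_scans(files, idle_score):
    """Pick one (machine, path) replica to scan for each unique piece of content.

    files: iterable of (machine, path, data) tuples (hypothetical input shape).
    idle_score: dict mapping machine -> float, where higher means more idle
                (hypothetical stand-in for a real idleness estimate).
    """
    # Group replicas by content hash so duplicates collapse into one entry.
    replicas = defaultdict(list)
    for machine, path, data in files:
        replicas[content_hash(data)].append((machine, path))

    # For each unique content, choose the replica on the most idle machine.
    plan = {}
    for digest, holders in replicas.items():
        plan[digest] = max(holders, key=lambda h: idle_score.get(h[0], 0.0))
    return plan


if __name__ == "__main__":
    files = [
        ("a", "/x", b"hello"),
        ("b", "/y", b"hello"),  # duplicate of /x on machine a
        ("a", "/z", b"world"),
    ]
    idle = {"a": 0.1, "b": 0.9}
    for digest, (machine, path) in plan_scans(files, idle).items():
        print(digest[:8], machine, path)
```

In this sketch, three files yield only two scans, and the duplicated content is scanned on machine `b` because it has the higher assumed idle score. A fuller version would also hold back newly seen content for a while, as the abstract describes, so that more replicas (and therefore more scheduling choices) can appear before a machine is picked.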