SCAN-Lite: enterprise-wide analysis on the cheap

  • Authors: Craig A.N. Soules; Kimberly Keeton; Charles B. Morrey, III

  • Affiliations: HP Labs, Palo Alto, CA, USA (all three authors)

  • Venue: Proceedings of the 4th ACM European conference on Computer systems
  • Year: 2009

Abstract

Background data analysis, such as virus scanning, backup, and desktop search, is increasingly prevalent on client systems. As the number of these tools and their resource requirements grow, their impact on foreground workloads can become prohibitive. This creates a tension between users' foreground work and the background work that makes information management possible. We present SCAN-Lite, a system that addresses this tension. To schedule background data analyses efficiently, SCAN-Lite exploits the fact that data in an enterprise is often replicated. It uses content hashing to identify duplicate content and scans each unique piece of content only once. It delays scheduling these scans to increase the likelihood that the content will have been replicated on multiple machines, thus providing more choices for where to perform the scan. Furthermore, it prioritizes machines to maximize the use of idle time and minimize the impact on foreground activities. We evaluate SCAN-Lite using measurements of enterprise replication behavior. We find that SCAN-Lite significantly improves scanning performance over the naive approach, and that it effectively exploits replication to reduce both the total work done and the impact on client foreground activity.
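
The scheduling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hash function (SHA-256), the `Machine` record, the `idle_fraction` estimate, and the `plan_scans` function are all hypothetical names and choices introduced here for illustration. The sketch groups files by content hash, holds back content that was first seen more recently than a delay window, and assigns each remaining unique piece of content to the replica holder that appears most idle.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    idle_fraction: float  # hypothetical idleness estimate in [0, 1]


def content_hash(data: bytes) -> str:
    # SHA-256 is an assumption; the abstract only says "content hashing".
    return hashlib.sha256(data).hexdigest()


def plan_scans(files, machines, now, delay):
    """Pick one (machine, path) per unique content hash.

    files: iterable of (machine_name, path, data, seen_at) tuples.
    machines: dict mapping machine name to Machine.
    Content first seen less than `delay` time units ago is skipped,
    so that more replicas have a chance to appear before scheduling.
    """
    replicas = {}    # hash -> list of (machine_name, path)
    first_seen = {}  # hash -> earliest observation time
    for machine_name, path, data, seen_at in files:
        h = content_hash(data)
        replicas.setdefault(h, []).append((machine_name, path))
        first_seen[h] = min(first_seen.get(h, seen_at), seen_at)

    plan = {}
    for h, copies in replicas.items():
        if now - first_seen[h] < delay:
            continue  # still inside the delay window
        # Scan once, on the replica holder with the most idle time.
        plan[h] = max(copies, key=lambda c: machines[c[0]].idle_fraction)
    return plan


machines = {"a": Machine("a", 0.2), "b": Machine("b", 0.9)}
files = [
    ("a", "/x", b"hello", 0),
    ("b", "/y", b"hello", 5),   # duplicate of /x on a more idle machine
    ("a", "/z", b"new", 95),    # too recent, so it is held back
]
plan = plan_scans(files, machines, now=100, delay=10)
```

In this toy run the duplicated content is scheduled once, on machine `b`, and the recently created file is left unscheduled until its delay window expires. A real system would also need to handle machines going offline and files changing after they are hashed, which this sketch ignores.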