Large-scale malware indexing using function-call graphs

  • Authors:
  • Xin Hu;Tzi-cker Chiueh;Kang G. Shin

  • Affiliations:
  • University of Michigan, Ann Arbor, Ann Arbor, MI, USA;Stony Brook University, Stony Brook, NY, USA;University of Michigan, Ann Arbor, Ann Arbor, MI, USA

  • Venue:
  • Proceedings of the 16th ACM conference on Computer and communications security
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

A major challenge of the anti-virus (AV) industry is how to effectively process the huge influx of malware samples they receive every day. One possible solution to this problem is to quickly determine if a new malware sample is similar to any previously-seen malware program. In this paper, we design, implement and evaluate a malware database management system called SMIT (Symantec Malware Indexing Tree) that can efficiently make such determination based on malware's function-call graphs, which is a structural representation known to be less susceptible to instruction-level obfuscations commonly employed by malware writers to evade detection of AV software. Because each malware program is represented as a graph, the problem of searching for the most similar malware program in a database to a given malware sample is cast into a nearest-neighbor search problem in a graph database. To speed up this search, we have developed an efficient method to compute graph similarity that exploits structural and instruction-level information in the underlying malware programs, and a multi-resolution indexing scheme that uses a computationally economical feature vector for early pruning and resorts to a more accurate but computationally more expensive graph similarity function only when it needs to pinpoint the most similar neighbors. Results of a comprehensive performance study of the SMIT prototype using a database of more than 100,000 malware demonstrate the effective pruning power and scalability of its nearest neighbor search mechanisms.