Large-scale malware indexing using function-call graphs

Authors:
Xin Hu;Tzi-cker Chiueh;Kang G. Shin
Affiliations:
University of Michigan, Ann Arbor, Ann Arbor, MI, USA;Stony Brook University, Stony Brook, NY, USA;University of Michigan, Ann Arbor, Ann Arbor, MI, USA
Venue:
Proceedings of the 16th ACM conference on Computer and communications security
Year:
2009



Abstract

A major challenge for the anti-virus (AV) industry is how to effectively process the huge influx of malware samples it receives every day. One possible solution is to quickly determine whether a new malware sample is similar to any previously seen malware program. In this paper, we design, implement, and evaluate a malware database management system called SMIT (Symantec Malware Indexing Tree) that can efficiently make this determination based on malware's function-call graphs, a structural representation known to be less susceptible to the instruction-level obfuscations that malware writers commonly employ to evade detection by AV software. Because each malware program is represented as a graph, the problem of finding the database entry most similar to a given malware sample is cast as a nearest-neighbor search problem in a graph database. To speed up this search, we have developed an efficient method for computing graph similarity that exploits structural and instruction-level information in the underlying malware programs, and a multi-resolution indexing scheme that uses a computationally economical feature vector for early pruning and resorts to a more accurate but computationally more expensive graph similarity function only when it needs to pinpoint the most similar neighbors. Results of a comprehensive performance study of the SMIT prototype using a database of more than 100,000 malware samples demonstrate the effective pruning power and scalability of its nearest-neighbor search mechanisms.
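The two-stage idea in the abstract (a cheap feature vector for early pruning, then a costlier graph comparison only for the surviving candidates) can be illustrated with a minimal Python sketch. This is not SMIT's actual algorithm or index structure. Everything here is an assumption made for illustration: call graphs are taken to have uniquely named functions, the "expensive" distance is a simple symmetric-difference count over nodes and edges, and the feature vector is just the pair (node count, edge count). Under those assumptions the feature-vector difference never exceeds the full distance, so pruning on it cannot discard the true nearest neighbor.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: function name -> names of the functions it calls.
CallGraph = Dict[str, Set[str]]


def nodes_and_edges(g: CallGraph) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Collect every function (caller or callee) and every caller->callee edge."""
    nodes = set(g) | {callee for callees in g.values() for callee in callees}
    edges = {(caller, callee) for caller, callees in g.items() for callee in callees}
    return nodes, edges


def feature_vector(g: CallGraph) -> Tuple[int, int]:
    """Cheap summary of a graph: (number of nodes, number of edges)."""
    nodes, edges = nodes_and_edges(g)
    return len(nodes), len(edges)


def lower_bound(fa: Tuple[int, int], fb: Tuple[int, int]) -> int:
    """Lower bound on graph_distance, computed from feature vectors only.

    Holds because |A ^ B| >= abs(|A| - |B|) for any two sets A and B.
    """
    return abs(fa[0] - fb[0]) + abs(fa[1] - fb[1])


def graph_distance(a: CallGraph, b: CallGraph) -> int:
    """Stand-in for the expensive comparison: nodes and edges not shared."""
    nodes_a, edges_a = nodes_and_edges(a)
    nodes_b, edges_b = nodes_and_edges(b)
    return len(nodes_a ^ nodes_b) + len(edges_a ^ edges_b)


def nearest_neighbor(
    query: CallGraph, database: Dict[str, CallGraph]
) -> Tuple[Optional[str], float, int]:
    """Return (best name, best distance, number of expensive comparisons)."""
    qf = feature_vector(query)
    # A real index would store these vectors; they are recomputed here for brevity.
    candidates = sorted(
        (lower_bound(qf, feature_vector(g)), name) for name, g in database.items()
    )
    best_name: Optional[str] = None
    best_dist: float = float("inf")
    evaluated = 0
    for bound, name in candidates:
        if bound >= best_dist:
            break  # every remaining candidate has an equal or larger bound
        evaluated += 1
        dist = graph_distance(query, database[name])
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name, best_dist, evaluated
```

The third return value counts how many full graph comparisons were actually run, which gives a rough sense of how much work the cheap filter saved on a given query.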