A major challenge for the anti-virus (AV) industry is how to effectively process the huge influx of malware samples it receives every day. One possible solution is to quickly determine whether a new malware sample is similar to any previously seen malware program. In this paper, we design, implement and evaluate a malware database management system called SMIT (Symantec Malware Indexing Tree) that can efficiently make this determination based on malware's function-call graphs, a structural representation known to be less susceptible to the instruction-level obfuscations commonly employed by malware writers to evade detection by AV software. Because each malware program is represented as a graph, the problem of searching a database for the malware program most similar to a given sample is cast as a nearest-neighbor search problem in a graph database. To speed up this search, we have developed an efficient method to compute graph similarity that exploits structural and instruction-level information in the underlying malware programs, and a multi-resolution indexing scheme that uses a computationally economical feature vector for early pruning and resorts to a more accurate but computationally more expensive graph similarity function only when it needs to pinpoint the most similar neighbors. Results of a comprehensive performance study of the SMIT prototype using a database of more than 100,000 malware samples demonstrate the effective pruning power and scalability of its nearest-neighbor search mechanisms.
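The two-stage idea described above (a cheap feature vector for pruning, an expensive graph comparison only for the survivors) can be illustrated with a minimal filter-and-refine sketch. This is not SMIT's actual index or similarity function. Every name here is hypothetical: the feature vector (node count, edge count, maximum out-degree) is an assumed stand-in, and the "expensive" distance is a simple edge-set difference used as a placeholder for a real graph similarity measure. Because the cheap stage keeps only a fixed number of candidates, the search in this sketch is approximate.

```python
from math import dist


def edges(graph):
    """Return the set of (caller, callee) edges of a call graph."""
    return {(u, v) for u, callees in graph.items() for v in callees}


def feature_vector(graph):
    """Cheap summary of a graph (assumed features, not SMIT's):
    node count, edge count, maximum out-degree."""
    nodes = set(graph) | {v for callees in graph.values() for v in callees}
    max_out = max((len(callees) for callees in graph.values()), default=0)
    return (len(nodes), len(edges(graph)), max_out)


def expensive_distance(g1, g2):
    """Placeholder for a costly graph similarity function:
    number of edges present in exactly one of the two graphs."""
    return len(edges(g1) ^ edges(g2))


def nearest_neighbor(query, database, num_candidates=2):
    """Filter-and-refine search over a dict of name -> call graph."""
    qv = feature_vector(query)
    # Stage 1: rank all entries by the cheap feature-vector distance
    # and keep only the closest few (this is the pruning step).
    ranked = sorted(database, key=lambda n: dist(qv, feature_vector(database[n])))
    candidates = ranked[:num_candidates]
    # Stage 2: run the expensive comparison on the survivors only.
    return min(candidates, key=lambda n: expensive_distance(query, database[n]))


# Toy call graphs: function name -> set of functions it calls.
db = {
    "sample_a": {"main": {"decrypt", "connect"}, "decrypt": {"xor"}},
    "sample_b": {"main": {"decrypt", "connect"}, "decrypt": {"xor"},
                 "connect": {"send"}},
    "sample_c": {"start": {"open"}, "open": set()},
}
query = {"main": {"decrypt", "connect"}, "decrypt": {"xor"},
         "connect": {"recv"}}

print(nearest_neighbor(query, db))
```

The `num_candidates` parameter controls the trade-off in this sketch: a larger value runs more expensive comparisons but is less likely to prune away the true nearest neighbor, while a value of 1 reduces the search to the cheap stage alone.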