In this paper, building on the analysis of instruction frequency and function-based instruction sequences, we develop an Automatic Malware Categorization System (AMCS) that automatically groups malware samples into families sharing common characteristics. AMCS uses a cluster ensemble that aggregates the clustering solutions generated by different base clustering algorithms. We propose a principled cluster ensemble framework that combines individual clustering solutions based on the consensus partition. Domain knowledge in the form of sample-level constraints can be naturally incorporated into the ensemble framework. In addition, to account for the characteristics of the feature representations, we propose two algorithms for generating base clusterings: a hybrid hierarchical clustering algorithm, which combines the merits of hierarchical clustering and k-medoids algorithms, and a weighted subspace k-medoids algorithm. The categorization results of AMCS can be used to generate signatures for malware families that are useful for malware detection. Case studies on a large, real-world daily malware collection from Kingsoft Anti-Virus Lab demonstrate the effectiveness and efficiency of AMCS.
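
The general idea of a cluster ensemble with sample-level constraints can be illustrated with a small sketch. The code below is not the AMCS algorithm; it swaps in a simple co-association (evidence accumulation) scheme as a stand-in for the consensus step. All function names, the toy label vectors, and the 0.5 threshold are assumptions made for illustration only.

```python
from itertools import combinations


def co_association(labelings):
    """Fraction of base clusterings that place samples i and j together."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def apply_constraints(matrix, must_link=(), cannot_link=()):
    """Overwrite pairwise agreement with sample-level constraints (hypothetical helper)."""
    constrained = [row[:] for row in matrix]
    for i, j in must_link:
        constrained[i][j] = constrained[j][i] = 1.0
    for i, j in cannot_link:
        constrained[i][j] = constrained[j][i] = 0.0
    return constrained


def consensus_partition(matrix, threshold=0.5):
    """Link pairs whose agreement exceeds the threshold; return component labels.

    Assumption: connected components stand in for the consensus step. A
    cannot-link pair can still end up together through a chain of other samples.
    """
    n = len(matrix)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three toy base clusterings of six samples (made-up labels, not real data).
base = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
agreement = co_association(base)
print(consensus_partition(agreement))
print(consensus_partition(apply_constraints(agreement, cannot_link=[(3, 4), (3, 5)])))
```

In this toy setting the three base clusterings disagree on individual samples, yet the pairwise agreement matrix recovers two groups. A cannot-link constraint on sample 3 then splits it off from its group, which shows how analyst knowledge can override the ensemble vote.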