N-grams are the basic features commonly used in sequence-based malicious code detection methods in computer virology research. Empirical results from previous work suggest that short n-grams are easier to extract, but longer n-grams better represent the characteristics of the underlying executables. However, the feature space grows exponentially with n-gram length (there are 256^n possible byte n-grams), which demands considerable memory and computation. Feature selection has therefore become the most challenging step in building an accurate detection system based on byte n-grams. In this paper we propose an efficient feature extraction method that uses both adjacent and non-adjacent bigrams to capture more information. We also present a novel boosting feature selection method based on a genetic algorithm. Our experimental results indicate that the proposed detection system detects virus programs far more accurately than the best previously known methods.
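The following Python sketch illustrates the feature-space argument and the adjacent/non-adjacent bigram idea. It is not the authors' extraction method. The abstract does not define "non-adjacent bigram", so this sketch assumes it means a pair of bytes separated by a fixed number of skipped bytes (a `gap`). The functions `byte_ngram_space`, `gapped_bigrams` and `extract_features` are hypothetical names used only for illustration.

```python
from collections import Counter


def byte_ngram_space(n: int) -> int:
    """Number of distinct byte n-grams: each of the n positions takes one of 256 values."""
    return 256 ** n


def gapped_bigrams(data: bytes, gap: int = 0) -> Counter:
    """Count byte pairs (data[i], data[i + 1 + gap]).

    gap == 0 yields ordinary adjacent bigrams; gap > 0 yields non-adjacent
    bigrams whose two bytes are separated by `gap` skipped bytes
    (an assumed interpretation of "non-adjacent bigram").
    """
    if gap < 0:
        raise ValueError("gap must be non-negative")
    step = gap + 1
    # Indexing a bytes object returns ints in 0..255.
    return Counter((data[i], data[i + step]) for i in range(len(data) - step))


def extract_features(data: bytes, max_gap: int) -> Counter:
    """Merge bigram counts for gaps 0..max_gap, keyed by (gap, first_byte, second_byte)."""
    features: Counter = Counter()
    for gap in range(max_gap + 1):
        for (first, second), count in gapped_bigrams(data, gap).items():
            features[(gap, first, second)] += count
    return features


if __name__ == "__main__":
    sample = bytes([0x4D, 0x5A, 0x90, 0x00])  # first bytes of a typical PE header
    print(extract_features(sample, max_gap=1))
    print(byte_ngram_space(2), byte_ngram_space(4))
```

Under this assumption, keeping every gap from 0 to 3 gives at most 4 × 65,536 = 262,144 bigram features. That covers byte pairs up to four positions apart while staying far below the 256^4 ≈ 4.3 billion possible byte 4-grams. A feature selection step, such as the genetic-algorithm-based one the abstract describes, would then operate on this reduced space.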