To address the problem of malware detection, researchers have recently begun investigating machine-learning techniques as a basis for effective approaches. Several promising results have been reported in the literature, with many approaches assessed using the common "10-Fold cross validation" scheme. This paper revisits the purpose of malware detection to discuss whether the "10-Fold" scheme is adequate for validating techniques that may not perform well in reality. To this end, we have devised several machine-learning classifiers that rely on a novel set of features built from applications' control-flow graphs (CFGs). We use a sizeable dataset of over 50,000 Android applications collected from the same sources from which state-of-the-art approaches have selected their data. We show that our approach outperforms existing machine-learning-based approaches. However, this high performance on usual-size datasets does not translate into high performance in the wild.
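
For readers unfamiliar with the "10-Fold" scheme discussed above, the following is a minimal sketch of how such an evaluation is commonly organised. It is not the authors' implementation: the names `k_fold_indices`, `cross_validate`, `train_fn`, and `predict_fn` are hypothetical, and the classifier and features are left abstract. The sketch assumes the usual definition of the scheme, in which the dataset is shuffled once, split into ten folds, and each fold serves once as the test set while the other nine are used for training.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        # Training set: all samples from the other k - 1 folds.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


def cross_validate(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Return the per-fold accuracy of a classifier (hypothetical helper)."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        model = train_fn(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
        )
        hits = sum(predict_fn(model, features[i]) == labels[i] for i in test_idx)
        scores.append(hits / len(test_idx))
    return scores
```

Because every test fold is drawn from the same shuffled pool as the training folds, the scores produced this way describe performance on data that resembles the training set, which is one reason such scores may not carry over to applications encountered in the wild.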