In-execution dynamic malware analysis and detection by mining information in process control blocks of Linux OS

  • Authors:
  • Farrukh Shahzad;M. Shahzad;Muddassar Farooq

  • Affiliations:
  • Next Generation Intelligent Networks Research Center (nexGIN RC), FAST National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan;Next Generation Intelligent Networks Research Center (nexGIN RC), FAST National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan;Next Generation Intelligent Networks Research Center (nexGIN RC), FAST National University of Computer and Emerging Sciences (FAST-NU), Islamabad, Pakistan

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2013

Quantified Score

Hi-index 0.07

Abstract

The run-time behavior of processes running on an end host is being actively used to detect malware dynamically. Most of these detection schemes build a model of the run-time behavior of a process on the basis of its data flow and/or its sequence of system calls. These techniques have shown promising results, but an efficient and effective technique must meet the following requirements: (1) high detection accuracy, (2) low false alarm rate, (3) small detection time, and (4) resilience to run-time evasion attempts. To meet these challenges, a novel concept of a genetic footprint is proposed: by mining the information in the kernel process control block (PCB) of a process, a footprint is obtained that can be used to detect malicious processes at run time. The genetic footprint consists of selected parameters, maintained inside the PCB by the kernel for each running process, that define the semantics and behavior of an executing process. A systematic forensic study of the execution traces of benign and malware processes is performed to identify discriminatory parameters of the PCB (in Linux, the PCB is task_struct). As a result, 16 out of 118 task structure parameters are shortlisted using time series analysis. A statistical analysis is done to corroborate the features of the genetic footprint and to select suitable machine learning classifiers to detect malware. The scheme has been evaluated on a dataset that consists of 105 benign processes and 114 recently collected malware processes for Linux. The experimental results show that the presented scheme achieves a detection accuracy of 96% with a 0% false alarm rate in less than 100 ms of the start of a malicious activity. Last but not least, the presented technique uses the partial knowledge that is available at a given time while the process is still executing; as a result, the OS kernel can devise mitigation strategies. It is also shown that the presented technique is robust to well-known run-time evasion attempts.
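
As a rough user-space illustration of the idea described in the abstract (and not the authors' kernel-level implementation), the sketch below parses periodic snapshots of a process's `/proc/<pid>/stat` line, which exposes a handful of counters that the kernel tracks per task, and summarises each counter's time series by its mean and variance. The choice of fields, the function names `parse_stat_line`, `footprint` and `sample_process`, the sampling interval, and the use of `/proc` in place of direct access to task_struct are all assumptions made for illustration; the abstract does not list the 16 selected parameters.

```python
import time
from statistics import mean, pvariance

# 1-based field numbers from proc(5) for /proc/<pid>/stat. This selection is
# illustrative only and is NOT the paper's 16-parameter genetic footprint.
STAT_FIELDS = {"minflt": 10, "majflt": 12, "utime": 14, "stime": 15,
               "num_threads": 20, "vsize": 23, "rss": 24}


def parse_stat_line(line):
    """Extract the selected numeric fields from one /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so split after the last ')' instead of splitting the whole line.
    rest = line[line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field n lives at rest[n - 3].
    return {name: int(rest[n - 3]) for name, n in STAT_FIELDS.items()}


def footprint(snapshots):
    """Summarise a time series of snapshots as per-field mean and variance."""
    features = {}
    for name in STAT_FIELDS:
        series = [snap[name] for snap in snapshots]
        features[name + "_mean"] = mean(series)
        features[name + "_var"] = pvariance(series)
    return features


def sample_process(pid, n_samples=10, interval_s=0.01):
    """Poll /proc/<pid>/stat (Linux only) and return the parsed snapshots."""
    snapshots = []
    for _ in range(n_samples):
        with open(f"/proc/{pid}/stat") as fh:
            snapshots.append(parse_stat_line(fh.read()))
        time.sleep(interval_s)
    return snapshots
```

On a Linux host, `footprint(sample_process(pid))` would yield one feature vector per observed process; such vectors could then be passed to whichever classifier is chosen, which this sketch deliberately leaves out.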