Visualization of shared system call sequence relationships in large malware corpora

  • Authors:
  • Josh Saxe;David Mentis;Chris Greamo

  • Affiliations:
  • Invincea Labs;Invincea Labs;Invincea Labs

  • Venue:
  • Proceedings of the Ninth International Symposium on Visualization for Cyber Security
  • Year:
  • 2012



Abstract

We present a novel system for automatically discovering and interactively visualizing shared system call sequence relationships within large malware datasets. Our system's pipeline begins by applying a novel heuristic algorithm that extracts variable-length, semantically meaningful system call sequences from malware system call behavior logs. Then, based on which of these semantic sequences occur in each sample, we construct a Boolean vector representation of the malware sample corpus. Finally, we compute Jaccard indices pairwise over the sample vectors to obtain a sample similarity matrix. Our graphical user interface links two visualizations within an interactive display. The first view is a map-like visualization of similarity among the samples, based on a reduced-dimensional projection of the similarity matrix. The second view shows the similarities and differences between selected malware samples in terms of the system call sequences they share. We also provide a set of interactive filters based on malicious behavioral traits. Integrating these views into an interactive, linked display allows users to comprehend the overall similarity structure of a malware corpus, inspect how behavioral traits are distributed across the corpus, and drill down into local similarities and differences between samples.
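The Boolean-vector and pairwise-Jaccard steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each sample's semantic sequences have already been extracted, since the abstract does not specify the heuristic extraction algorithm. The function names (`build_boolean_vectors`, `jaccard`, `similarity_matrix`), the system call names in the usage example, and the convention that two empty vectors have similarity 1.0 are all hypothetical choices made for this sketch.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Tuple

# A semantic system call sequence, e.g. ("NtOpenFile", "NtReadFile").
SyscallSeq = Tuple[str, ...]


def build_boolean_vectors(
    samples: Dict[str, FrozenSet[SyscallSeq]],
) -> Tuple[List[SyscallSeq], Dict[str, List[bool]]]:
    """Map each sample to a Boolean vector over the corpus-wide sequence vocabulary.

    Hypothetical helper: assumes sequences were already extracted per sample.
    """
    vocabulary = sorted(set().union(*samples.values()))
    vectors = {
        name: [seq in seqs for seq in vocabulary] for name, seqs in samples.items()
    }
    return vocabulary, vectors


def jaccard(u: Sequence[bool], v: Sequence[bool]) -> float:
    """Jaccard index |u AND v| / |u OR v| of two equal-length Boolean vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Assumed convention (not from the source): two all-False vectors count as identical.
    return both / either if either else 1.0


def similarity_matrix(
    vectors: Dict[str, List[bool]],
) -> Tuple[List[str], List[List[float]]]:
    """Compute the symmetric pairwise Jaccard similarity matrix over all samples."""
    names = sorted(vectors)
    n = len(names)
    sim = [[1.0] * n for _ in range(n)]  # each sample is fully similar to itself
    for i, j in combinations(range(n), 2):
        s = jaccard(vectors[names[i]], vectors[names[j]])
        sim[i][j] = sim[j][i] = s
    return names, sim


if __name__ == "__main__":
    # Illustrative toy corpus with hypothetical samples "a", "b", "c".
    corpus = {
        "a": frozenset({("NtOpenFile", "NtReadFile"), ("NtCreateKey", "NtSetValueKey")}),
        "b": frozenset({("NtOpenFile", "NtReadFile")}),
        "c": frozenset({("NtCreateFile", "NtWriteFile")}),
    }
    _, vecs = build_boolean_vectors(corpus)
    labels, matrix = similarity_matrix(vecs)
    for label, row in zip(labels, matrix):
        print(label, row)
```

For the toy corpus, samples "a" and "b" share one of the two sequences that either contains, so their similarity is 0.5, while "c" shares nothing with either of them. The abstract does not name the dimensionality reduction behind the map-like view. A technique such as multidimensional scaling applied to `1 - sim` is one plausible choice, but this is an assumption, not the paper's stated method.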