Locating executable fragments with Concordia, a scalable, semantics-based architecture

Authors:
Jason M. Carter
Affiliations:
Cyber Warfare Research Team, Oak Ridge National Laboratory, Oak Ridge, TN
Venue:
Proceedings of the Eighth Annual Cyber Security and Information Intelligence Research Workshop
Year:
2013

Abstract

The amount of digital evidence that forensic tools and analysts must process is growing rapidly. This makes automated analysis critical and continuous improvement of that analysis essential. Concordia is a platform for investigating code semantics. One of its functions is identifying unknown code fragments, and our ultimate goal is to elucidate the possible objectives and origin of this type of evidence. Here we provide a synopsis of a method that identifies and locates code fragments using n-gram and semantics-based features and a k-nearest-neighbors classifier. Our objective is to identify a set of candidate files that may contain the unknown fragment and to supply additional details that isolate it within this set. To accomplish this, Concordia uses the MapReduce model to process a large set of invariants, giving forensic experts a more efficient, automated way to produce solid intelligence about a growing body of evidence.
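The pipeline the abstract outlines (n-gram features, a MapReduce pass over a corpus, k-nearest-neighbor ranking of candidate files, then isolating the fragment inside a candidate) can be illustrated with a minimal Python sketch. It is not Concordia's implementation. It assumes byte-level n-grams with a hypothetical length `N = 4`, cosine similarity as the neighbor measure, an in-process imitation of MapReduce's map, shuffle and reduce phases, and a simple offset-voting heuristic for localization. The semantics-based features and invariants mentioned in the abstract are not modeled, and all function names (`map_file`, `build_index`, `knn_candidates`, `locate`) are hypothetical.

```python
# Illustrative sketch only; not Concordia's actual implementation.
from collections import Counter, defaultdict
from itertools import chain
from math import sqrt

# Hypothetical n-gram length; the abstract does not state the value Concordia uses.
N = 4


def ngrams(data: bytes, n: int = N) -> Counter:
    """Count overlapping byte n-grams in a file or code fragment."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def map_file(file_id, data, n=N):
    """Map step: emit one (n-gram, (file_id, count)) pair per distinct n-gram."""
    for gram, count in ngrams(data, n).items():
        yield gram, (file_id, count)


def shuffle(pairs):
    """Group mapper output by key, as a MapReduce framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_postings(gram, postings):
    """Reduce step: one sorted posting list per n-gram (an inverted-index entry)."""
    return gram, sorted(postings)


def build_index(corpus, n=N):
    """Run map, shuffle and reduce in-process over a {file_id: bytes} corpus."""
    mapped = chain.from_iterable(map_file(fid, data, n) for fid, data in corpus.items())
    return dict(reduce_postings(g, p) for g, p in shuffle(mapped).items())


def knn_candidates(fragment, index, k=3, n=N):
    """Return the k files most similar to the fragment (cosine of n-gram counts)."""
    q = ngrams(fragment, n)
    dot, norms = Counter(), Counter()
    for gram, postings in index.items():
        for file_id, count in postings:
            norms[file_id] += count * count
            if gram in q:
                dot[file_id] += q[gram] * count
    q_norm = sqrt(sum(c * c for c in q.values()))
    # Files sharing no n-gram with the fragment never enter `dot`, so they are not ranked.
    scores = {f: d / (sqrt(norms[f]) * q_norm) for f, d in dot.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def locate(fragment, data, n=N):
    """Vote for the most likely start offset of the fragment inside one candidate file."""
    positions = defaultdict(list)
    for j in range(len(data) - n + 1):
        positions[data[j:j + n]].append(j)
    votes = Counter()
    for i in range(len(fragment) - n + 1):
        # Each shared n-gram votes for the offset that would align it.
        for j in positions.get(fragment[i:i + n], ()):
            votes[j - i] += 1
    return votes.most_common(1)[0][0] if votes else None


# Toy corpus: ASCII text stands in for machine code purely for readability.
corpus = {
    "alpha": b"push ebp; mov ebp, esp; sub esp, 16; leave; ret",
    "beta": b"xor eax, eax; inc eax; int 0x80",
    "gamma": b"nop; nop; push ebp; mov ebp, esp; call foo; ret",
}
fragment = b"mov ebp, esp; sub esp"

index = build_index(corpus)
for file_id, score in knn_candidates(fragment, index):
    print(file_id, round(score, 3), locate(fragment, corpus[file_id]))
```

Keying the index by n-gram mirrors how a MapReduce job could partition the work: each reducer would own a slice of the n-gram space, so building the index over a large corpus parallelizes naturally, while the nearest-neighbor ranking and localization steps only touch the postings for the fragment's own n-grams.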