Applying randomized projection to aid prediction algorithms in detecting high-dimensional rogue applications

  • Authors: Travis Atkison
  • Affiliations: Mississippi State University, Starkville, MS
  • Venue: Proceedings of the 47th Annual Southeast Regional Conference
  • Year: 2009

Abstract

This paper describes a research effort to improve the use of cosine similarity, an information retrieval technique, for detecting unknown rogue software, known rogue software, and variants of known rogue software by applying the feature extraction technique of randomized projection. Document similarity techniques such as cosine similarity have been used with great success in several document retrieval applications. Following a standard information retrieval methodology, software in machine-readable format can be regarded as the documents in a corpus. These "documents" may or may not have known rogue functionality. The query is software, again in machine-readable format, that contains a certain type of rogue software. This methodology makes it possible to search the corpus with a query and retrieve or identify potentially rogue software, as well as other instances of the same type of vulnerability. Retrieval is based on the similarity of the query to each document in the corpus. To overcome the "curse of dimensionality" that can arise with this type of information retrieval technique, randomized projections are used to create a low-order embedding of the high-dimensional data. For our experiment, we obtain Microsoft Windows applications, infect a subset of them with several common Trojans, and apply our dimensionality reduction and prediction methodology. Preliminary results show promise in both prediction speed and space efficiency when randomized projections are applied to cosine similarity, compared with using cosine similarity alone.
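
The retrieval pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature representation, the projection matrix, or the target dimension, so everything concrete here is an assumption made for illustration: byte-frequency vectors stand in for the features extracted from machine-readable software, the projection matrix is filled with Gaussian entries scaled by 1/sqrt(k), and the helper names (`byte_histogram`, `make_projection`, `project`, `rank_corpus`) are hypothetical. It is not the author's implementation.

```python
import math
import random


def byte_histogram(data: bytes) -> list[float]:
    """Turn a binary into a 256-dimensional byte-count vector.

    Assumption: this feature choice is illustrative only; the abstract
    does not say which features are extracted from the software.
    """
    counts = [0.0] * 256
    for b in data:
        counts[b] += 1.0
    return counts


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def make_projection(d: int, k: int, seed: int = 0) -> list[list[float]]:
    """Build a k x d random matrix with N(0, 1) entries scaled by 1/sqrt(k).

    Assumption: a Gaussian matrix is one common choice for randomized
    projection; the paper may use a different distribution.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """Map a d-dimensional vector to k dimensions (matrix-vector product)."""
    return [sum(r * x for r, x in zip(row, vec)) for row in matrix]


def rank_corpus(query, corpus, matrix):
    """Rank corpus documents by cosine similarity to the query,
    computed in the projected (low-dimensional) space."""
    q = project(matrix, query)
    scored = [
        (name, cosine_similarity(q, project(matrix, vec)))
        for name, vec in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy byte strings standing in for real executables (hypothetical data).
    corpus = {
        "app_a": byte_histogram(b"MZ" + b"\x90" * 40 + b"benign code"),
        "app_b": byte_histogram(b"MZ" + b"\xcc" * 40 + b"trojan stub"),
    }
    query = byte_histogram(b"MZ" + b"\xcc" * 38 + b"trojan stub!")
    matrix = make_projection(d=256, k=16)
    for name, score in rank_corpus(query, corpus, matrix):
        print(name, round(score, 3))
```

In this sketch each similarity computation costs k multiplications instead of d, and each stored document vector shrinks from d to k values, which is the kind of speed and space trade-off the abstract refers to. In practice the corpus vectors would be projected once and cached rather than re-projected per query.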