Function matching-based binary-level software similarity calculation

  • Authors: Yeo Reum Lee; BooJoong Kang; Eul Gyu Im

  • Affiliations: Hanyang University, Seoul, Korea (all authors)

  • Venue: Proceedings of the 2013 Research in Adaptive and Convergent Systems
  • Year: 2013

Abstract

This paper proposes a method for calculating the similarity of software without any source code information. The method can be used in applications such as detecting source code theft and copyright infringement, as well as locating the updated parts of software, including malware. To determine the similarity of two programs, we match the similar functions they contain. Our function-based matching process consists of two steps. In step 1, structural information from the call graph of the binary file is used to match functions; functions matched in this step are excluded from step 2, which reduces the number of detailed comparisons. In step 2, the remaining functions are matched by N-gram similarity over their instruction mnemonics. With the structural matching proposed in this paper, which uses four-tuple matching, matching performance improves by about 30% and the false positive rate is lower than in previous studies. Further experiments on real software samples showed that the similarity computed by our method differs by only about 3% from that of source code-based approaches. We therefore argue that the proposed method contributes to the field of binary-based software similarity calculation.
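
Step 2 can be illustrated with a minimal sketch. The abstract does not state the N-gram size, the similarity measure, or the matching threshold, so the choices below are assumptions made only for illustration: trigrams (`n=3`), multiset Jaccard similarity, a threshold of `0.7`, and a greedy one-to-one pairing. The function names `ngrams`, `ngram_similarity`, and `match_functions` are hypothetical and not taken from the paper.

```python
from collections import Counter


def ngrams(mnemonics, n=3):
    """Return the multiset of n-grams over a sequence of instruction mnemonics."""
    return Counter(
        tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)
    )


def ngram_similarity(seq_a, seq_b, n=3):
    """Multiset Jaccard similarity of two mnemonic sequences.

    Assumption: the paper's exact similarity measure is not given in the
    abstract; Jaccard is used here as a common stand-in.
    """
    grams_a, grams_b = ngrams(seq_a, n), ngrams(seq_b, n)
    union = sum((grams_a | grams_b).values())
    if union == 0:
        # Both sequences are shorter than n, so there is nothing to compare.
        return 0.0
    return sum((grams_a & grams_b).values()) / union


def match_functions(funcs_a, funcs_b, threshold=0.7, n=3):
    """Greedily pair each function in funcs_a with its most similar,
    still-unmatched function in funcs_b.

    funcs_a and funcs_b map function names to mnemonic lists. In the
    two-step process they would hold only the functions left unmatched
    after the structural step. The threshold value is an assumption.
    """
    matches = {}
    used = set()
    for name_a, seq_a in funcs_a.items():
        best_name, best_score = None, -1.0
        for name_b, seq_b in funcs_b.items():
            if name_b in used:
                continue
            score = ngram_similarity(seq_a, seq_b, n)
            if score > best_score:
                best_name, best_score = name_b, score
        if best_name is not None and best_score >= threshold:
            matches[name_a] = (best_name, best_score)
            used.add(best_name)
    return matches
```

Calling `match_functions` on two dictionaries of mnemonic sequences returns, for each function in the first binary, the name of its best match in the second together with the similarity score; functions whose best score falls below the threshold are left unmatched.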