Finding software license violations through binary code clone detection

  • Authors:
  • Armijn Hemel;Karl Trygve Kalleberg;Rob Vermaas;Eelco Dolstra

  • Affiliations:
  • gpl-violations.org, Netherlands;KolibriFX, Norway;Delft University of Technology, Netherlands;Delft University of Technology, Netherlands

  • Venue:
  • Proceedings of the 8th Working Conference on Mining Software Repositories
  • Year:
  • 2011

Abstract

Software released in binary form frequently uses third-party packages without respecting their licensing terms. For instance, many consumer devices have firmware containing the Linux kernel, without the suppliers following the requirements of the GNU General Public License. Such license violations are often accidental, e.g., when vendors receive binary code from their suppliers with no indication of its provenance. To help find such violations, we have developed the Binary Analysis Tool (BAT), a system for code clone detection in binaries. Given a binary, such as a firmware image, it attempts to detect cloning of code from repositories of packages in source and binary form. We evaluate and compare the effectiveness of three of BAT's clone detection techniques: scanning for string literals, detecting similarity through data compression, and detecting similarity by computing binary deltas.
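
As a rough illustration of two of the techniques named in the abstract, the sketch below scans a binary for string literals and estimates similarity through data compression. It is not taken from BAT: the function names (`extract_strings`, `string_match_ratio`, `ncd`), the minimum string length of 5, and the use of zlib with the normalized compression distance are all assumptions made for this example. The third technique, binary deltas, is not covered here.

```python
from __future__ import annotations

import re
import zlib


def extract_strings(blob: bytes, min_len: int = 5) -> set[bytes]:
    """Return runs of printable ASCII at least min_len bytes long.

    This mimics the Unix `strings` utility; min_len=5 is an assumed default.
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return set(pattern.findall(blob))


def string_match_ratio(binary: bytes, package_strings: set[bytes]) -> float:
    """Fraction of a package's known string literals that occur in the binary."""
    if not package_strings:
        return 0.0
    found = extract_strings(binary)
    return len(found & package_strings) / len(package_strings)


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance (assumed metric, computed with zlib).

    Values near 0 suggest shared content; values near 1 suggest unrelated data.
    """
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical firmware fragment and package string set, for illustration only.
    firmware = b"\x00\x01hello world\x00ab\x00version 1.2\xff"
    known = {b"hello world", b"missing string"}
    print(string_match_ratio(firmware, known))
    print(ncd(firmware, firmware))
```

A high string match ratio or a low compression distance would only flag a candidate for manual review; neither score alone establishes that code was cloned.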