A sentence-matching method for automatic license identification of source code files

  • Authors:
  • Daniel M. German;Yuki Manabe;Katsuro Inoue

  • Affiliations:
  • University of Victoria, Victoria, BC, Canada;Osaka University, Osaka, Japan;Osaka University, Osaka, Japan

  • Venue:
  • Proceedings of the IEEE/ACM international conference on Automated software engineering
  • Year:
  • 2010

Abstract

The reuse of free and open source software (FOSS) components is becoming more prevalent. One of the major challenges in finding the right component is finding one that has a license that is adequate for its intended use. The license of a FOSS component is determined by the licenses of its source code files. In this paper, we describe the challenges of identifying the license under which source code is made available, and propose a sentence-based matching algorithm to do so automatically. We demonstrate the feasibility of our approach by implementing a tool named Ninka. We performed an evaluation that shows that Ninka outperforms other methods of license identification in precision and speed. We also performed an empirical study on 0.8 million source code files of Debian that highlights interesting facts about the manner in which licenses are used by FOSS.
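
The abstract names sentence-based matching but gives no detail, so the following is only a minimal illustrative sketch of the general idea and not Ninka's actual implementation. It assumes a hypothetical, very small table of known license sentences (`KNOWN_SENTENCES`), strips common comment markers, splits the header into sentences, and reports a license when all of its listed sentences are found. The function names and the license labels are assumptions made for this example.

```python
import re

# Hypothetical table of characteristic license sentences (assumption for
# illustration only; a real tool would need a much larger, curated set).
KNOWN_SENTENCES = {
    "GPL-2.0-or-later": [
        r"either version 2 of the license, or \(at your option\) any later version",
    ],
    "MIT": [
        r"permission is hereby granted, free of charge, to any person obtaining a copy",
    ],
}


def normalize(text):
    """Strip leading comment markers, collapse whitespace, and lowercase."""
    text = re.sub(r"^[ \t]*(?:/\*+|\*+/|\*|//|#)", " ", text, flags=re.MULTILINE)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?;])\s+", text) if s.strip()]


def identify_licenses(header):
    """Return the licenses whose known sentences all occur in the header."""
    sentences = split_sentences(normalize(header))
    found = []
    for name, patterns in KNOWN_SENTENCES.items():
        # Every pattern of a license must match at least one sentence.
        if all(any(re.search(p, s) for s in sentences) for p in patterns):
            found.append(name)
    return found or ["UNKNOWN"]


if __name__ == "__main__":
    sample = """/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */"""
    print(identify_licenses(sample))
```

Matching at the sentence level, rather than comparing the whole header against full license texts, is what lets a sketch like this tolerate differences in line wrapping and comment style between files.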