A sentence-matching method for automatic license identification of source code files

Authors:
Daniel M. German;Yuki Manabe;Katsuro Inoue
Affiliations:
University of Victoria, Victoria, BC, Canada;Osaka University, Osaka, Japan;Osaka University, Osaka, Japan
Venue:
Proceedings of the IEEE/ACM international conference on Automated software engineering
Year:
2010

Abstract

The reuse of free and open source software (FOSS) components is becoming more prevalent. One of the major challenges in finding the right component is finding one whose license is suitable for its intended use. The license of a FOSS component is determined by the licenses of its source code files. In this paper, we describe the challenges of identifying the license under which source code is made available, and propose a sentence-based matching algorithm to do so automatically. We demonstrate the feasibility of our approach by implementing a tool named Ninka. We performed an evaluation that shows that Ninka outperforms other methods of license identification in precision and speed. We also performed an empirical study on 0.8 million source code files of Debian that highlights interesting facts about the manner in which licenses are used by FOSS.
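
To make the idea of sentence-based matching concrete, the following is a minimal sketch and not Ninka's actual implementation. It assumes a license statement sits in the leading comment of a file, splits that comment into sentences, normalizes each one, and compares it against a small table of known license sentences. The rule table, the function names, the SPDX-style labels, and the "UNKNOWN" fallback are all illustrative assumptions.

```python
import re

# Hypothetical rule table: (label, pattern over a normalized sentence).
# These three patterns are illustrative; a usable tool would need a far
# larger, curated set of license sentences.
SENTENCE_RULES = [
    ("GPL-2.0-or-later", re.compile(
        r"gnu general public license .* either version 2 .* any later version")),
    ("MIT", re.compile(
        r"permission is hereby granted free of charge to any person obtaining a copy")),
    ("Apache-2.0", re.compile(
        r"licensed under the apache license version 2 0")),
]


def extract_header_comment(source: str) -> str:
    """Join the leading comment lines of a file, with comment markers removed."""
    lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines inside the header
        if stripped.startswith(("/*", "*", "//", "#")):
            # Drop the leading comment marker and one following space.
            lines.append(re.sub(r"^(/\*+|\*+/|\*+|//+|#+)\s?", "", stripped))
        else:
            break  # first line of real code ends the header
    return " ".join(lines)


def split_sentences(text: str) -> list[str]:
    """Split on sentence-ending punctuation that is followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]


def normalize(sentence: str) -> str:
    """Lowercase and collapse every run of non-alphanumerics to one space."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", sentence.lower()).split())


def identify_licenses(source: str) -> list[str]:
    """Return labels of all rules matched by some sentence, else ['UNKNOWN']."""
    found = []
    for sentence in split_sentences(extract_header_comment(source)):
        normalized = normalize(sentence)
        for label, pattern in SENTENCE_RULES:
            if pattern.search(normalized) and label not in found:
                found.append(label)
    return found or ["UNKNOWN"]


if __name__ == "__main__":
    sample = """/*
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */
int main(void) { return 0; }
"""
    print(identify_licenses(sample))
```

Matching whole normalized sentences, rather than searching for isolated keywords, is what lets a sketch like this separate "version 2 or any later version" from a bare mention of the GPL. Reporting an explicit unknown result when no sentence matches avoids guessing a license.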