An unabridged source code dataset for research in software reuse

  • Authors:
  • Werner Janjic; Oliver Hummel; Marcus Schumacher; Colin Atkinson

  • Affiliations:
  • University of Mannheim, Germany; KIT, Germany; University of Mannheim, Germany; University of Mannheim, Germany

  • Venue:
  • Proceedings of the 10th Working Conference on Mining Software Repositories
  • Year:
  • 2013

Abstract

This paper describes a large, unabridged dataset of Java source code gathered and shared as part of the Merobase Component Finder project of the Software-Engineering Group at the University of Mannheim. It consists of the complete index used to drive the search engine www.merobase.com, the vast majority of the source code modules accessible through it, and a tool that enables researchers to browse the collected data efficiently. We describe the techniques used to collect, format and store the dataset, as well as the core capabilities of the Merobase search engine, including classic keyword-based, interface-based and test-driven search. This dataset, which represents one of the largest searchable collections of source and binary modules available online, has recently been made available for download and use in further research projects. All files are available at http://merobase.informatik.uni-mannheim.de/sources/