Online sharing and integration of results from mining software repositories

  • Authors: Iman Keivanloo
  • Affiliations: Concordia University, Canada
  • Venue: Proceedings of the 34th International Conference on Software Engineering
  • Year: 2012

Abstract

Mining software repositories involves extracting both basic and value-added information from existing software repositories. Depending on the stakeholders (e.g., researchers, management), these repositories are mined repeatedly for different purposes. To avoid unnecessary pre-processing steps and improve productivity, the extracted facts and results need to be shared and integrated. The motivation of this research is to introduce a novel collaborative sharing platform for software datasets that supports on-the-fly inter-dataset integration. We want to facilitate and promote a paradigm shift in the source code analysis domain, similar to the one brought about by Wikipedia in the knowledge-sharing domain. In this paper, we present the SeCold project, which is the first online, publicly available software ecosystem Linked Data dataset. As part of this research, we provide not only the theoretical background on how to publish such datasets but also the actual dataset. SeCold contains about two billion facts, such as source code statements, software licenses, and code clones, from over 18,000 software projects. SeCold is also an official member of the Linked Data cloud and one of the eight largest online Linked Data datasets available in the cloud.