How to build repeatable experiments

Authors:
Gregory Gay;Tim Menzies;Bojan Cukic;Burak Turhan
Affiliations:
WVU, Morgantown, WV;WVU, Morgantown, WV;WVU, Morgantown, WV;NRC Institute for Information Technology, Ottawa, Canada
Venue:
PROMISE '09 Proceedings of the 5th International Conference on Predictor Models in Software Engineering
Year:
2009

Abstract

The mantra of the PROMISE series is "repeatable, improvable, maybe refutable" software engineering experiments. This community has successfully created a library of reusable software engineering data sets. The next challenge for the PROMISE community is to share not only data but also experiments. Our experience with existing data mining environments is that these tools are not suitable for publishing or sharing repeatable experiments. OURMINE is an environment for the development of data mining experiments. OURMINE offers a succinct notation for describing experiments. Adding new tools to OURMINE, in a variety of languages, is a rapid and simple process. This makes it a useful research tool. Complicated graphical interfaces have been eschewed in favor of simple command-line prompts. This eases the learning curve for data mining novices. The simplicity also encourages large-scale modification of, and experimentation with, the code. In this paper, we show the OURMINE code required to reproduce a recent experiment checking how defect predictors learned from one site apply to another. This is an important result for the PROMISE community since it shows that our shared repository is not just a useful academic resource. Rather, it is a valuable resource for industry: companies that lack the local data required to build defect predictors can use PROMISE data to build them.