Data stream mining for predicting software build outcomes using source code metrics

Authors: Jacqui Finlay, Russel Pears, Andy M. Connor
Venue: Information and Software Technology
Year: 2014


Abstract

Context: Software development projects use a wide range of tools to produce a software artifact. Software repositories such as source control systems have become a focus of research because they hold rich information about development projects. Mining such repositories is increasingly common as a way of gaining a deeper understanding of the development process.

Objective: This paper explores the concept of representing a software development project as a process that produces a data stream. It also describes the extraction of metrics from the Jazz repository and the application of data stream mining techniques to identify metrics that are useful for predicting build success or failure.

Method: This research is a systematic study that applies the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift, using the Massive Online Analysis (MOA) tool.

Results: The results indicate that only a relatively small number of the available measures are significant for predicting the outcome of a build over time. These significant measures are identified and the implications of the results are discussed, particularly the relative difficulty of predicting failed builds. The Hoeffding Tree approach is shown to produce a more stable and robust model than traditional data mining approaches.

Conclusion: Overall prediction accuracies of 75% were achieved using the Hoeffding Tree classification method. Despite this high overall accuracy, failure is harder to predict than success. The emergence of a stable classification tree is limited by the lack of data, but overall the approach shows promise for informing software development activities in order to minimize the chance of failure.
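The abstract's remark that a stable tree is limited by the amount of data can be illustrated with the Hoeffding bound that gives the Hoeffding Tree its name. The following is a minimal sketch, not the paper's or MOA's implementation: the function names, the default confidence `delta`, the tie threshold, and the example gain values are all illustrative assumptions. It shows that the same gap between the two best candidate split attributes may be too small to justify a split after a few hundred builds, yet sufficient after a few thousand.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability 1 - delta, the true mean of
    a variable with the given range lies within epsilon of the mean observed
    over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 delta: float = 1e-7, n_classes: int = 2,
                 tie_threshold: float = 0.05) -> bool:
    """Decide whether a leaf that has seen n instances should split.

    Hypothetical helper: delta and tie_threshold are illustrative defaults,
    not values reported in the paper.
    """
    # Information gain for a c-class problem lies in [0, log2(c)].
    value_range = math.log2(n_classes)
    eps = hoeffding_bound(value_range, delta, n)
    # Split when the best attribute is reliably better than the runner-up,
    # or when the bound is so tight that the two are effectively tied.
    return (best_gain - second_gain) > eps or eps < tie_threshold


if __name__ == "__main__":
    # Made-up gains for two candidate metrics on a pass/fail build stream.
    for n in (200, 2000):
        eps = hoeffding_bound(1.0, 1e-7, n)
        print(n, round(eps, 4), should_split(0.30, 0.20, n))
```

With these assumed values the bound shrinks as more builds are observed, so a leaf that cannot yet commit to a split on a small stream can do so once enough instances have arrived.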