Context: Software development projects use a wide range of tools to produce a software artifact. Software repositories such as source control systems have become a focus of emerging research because they are a rich source of information about software development projects. Mining such repositories is becoming increasingly common as a way to gain a deeper understanding of the development process.

Objective: This paper explores the concept of representing a software development project as a process that creates a data stream. It also describes the extraction of metrics from the Jazz repository and the application of data stream mining techniques to identify metrics that are useful for predicting build success or failure.

Method: This research is a systematic study that applies the Hoeffding Tree classification method in conjunction with the Adaptive Sliding Window (ADWIN) method for detecting concept drift, using the Massive Online Analysis (MOA) tool.

Results: The results indicate that only a relatively small number of the available measures have any significance for predicting the outcome of a build over time. These significant measures are identified and the implications of the results are discussed, particularly the relative difficulty of predicting failed builds. The Hoeffding Tree approach is shown to produce a more stable and robust model than traditional data mining approaches.

Conclusion: Overall prediction accuracies of 75% have been achieved with the Hoeffding Tree classification method. Despite this high overall accuracy, failure is harder to predict than success. The emergence of a stable classification tree is limited by the lack of data, but overall the approach shows promise for informing software development activities in order to minimize the chance of failure.
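As background to the Method, a Hoeffding Tree decides when to split a leaf by comparing the gap between the two best candidate attributes against the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The following is a minimal sketch of that test only, not the MOA implementation used in the study; the function names, the confidence parameter value, and the example gain values are assumptions chosen for illustration.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range: range R of the split criterion (1.0 for information
                 gain on a two-class problem such as build pass/fail).
    delta:       allowed probability that the bound does not hold.
    n:           number of examples seen at the leaf so far.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical gains for two candidate metrics at one leaf.
    for seen in (50, 200, 1000):
        eps = hoeffding_bound(1.0, 1e-7, seen)
        print(seen, round(eps, 4), should_split(0.30, 0.05, 1.0, 1e-7, seen))
```

Because epsilon shrinks as more examples arrive, a leaf with few observations rarely splits, which is consistent with the abstract's remark that a stable tree is slow to emerge when data is scarce.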