Application of K-Medoids with Kd-Tree for Software Fault Prediction

Authors:
P. S. Bishnu; V. Bhattacherjee
Affiliations:
Birla Institute of Technology, Ranchi, India; Birla Institute of Technology, Ranchi, India
Venue:
ACM SIGSOFT Software Engineering Notes
Year:
2011

Abstract

Software fault prediction often suffers from the unavailability of fault data, which makes supervised techniques difficult to apply. In such cases, unsupervised approaches such as clustering are helpful. In this paper, the K-Medoids clustering approach is applied to software fault prediction. To overcome the inherent computational complexity of the K-Medoids algorithm, a data structure called a Kd-Tree is used to identify data agents in the datasets. Partitioning Around Medoids (PAM) is applied to these data agents, which yields a set of medoids. All remaining data points are then assigned to the nearest of these medoids to obtain the final clusters. Error analysis of the fault predictions shows that, on one real dataset, our approach outperforms all the other unsupervised approaches and gives the best values for the evaluation parameters. On the other real datasets, our results are comparable to those of other techniques. We also evaluate the performance of our technique against other techniques. The results show that it drastically reduces the total number of distance calculations, since the number of data agents is much smaller than the number of data points.
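The three steps described in the abstract (find data agents with a Kd-Tree, run PAM on the agents, assign every point to its nearest medoid) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how a data agent is chosen, so the sketch assumes one agent per Kd-Tree leaf bucket, taken as the real point closest to the bucket centroid. The function names (`kd_leaf_agents`, `pam`, `kmedoids_with_kdtree`), the `leaf_size` parameter, the median split on the widest dimension, the Euclidean distance, and the naive swap-based PAM with arbitrary initial medoids are all assumptions made for illustration.

```python
import numpy as np

def kd_leaf_agents(X, leaf_size=8):
    """Split X kd-tree style and return the index of one agent per leaf bucket.

    Assumption: the agent is the real point closest to the bucket centroid.
    """
    agents = []
    stack = [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        if len(idx) <= leaf_size:
            pts = X[idx]
            centroid = pts.mean(axis=0)
            nearest = np.argmin(((pts - centroid) ** 2).sum(axis=1))
            agents.append(idx[nearest])
            continue
        # Median split along the dimension with the largest spread.
        dim = np.ptp(X[idx], axis=0).argmax()
        order = idx[np.argsort(X[idx, dim])]
        mid = len(order) // 2
        stack.extend([order[:mid], order[mid:]])
    return np.array(agents)

def pam(D, k, max_iter=100):
    """Naive swap-based PAM on a precomputed distance matrix D.

    Returns the indices of the k medoids. Initial medoids are simply the
    first k points (an arbitrary choice made for this sketch).
    """
    n = len(D)
    medoids = list(range(k))
    cost = D[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        best = (cost, None, None)
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = h
                c = D[:, trial].min(axis=1).sum()
                if c < best[0]:
                    best = (c, i, h)
        if best[1] is None:  # no swap lowers the total cost
            break
        cost = best[0]
        medoids[best[1]] = best[2]
    return np.array(medoids)

def kmedoids_with_kdtree(X, k, leaf_size=8):
    """Run PAM on kd-tree agents only, then label every point in X."""
    agent_idx = kd_leaf_agents(X, leaf_size)
    A = X[agent_idx]
    # Pairwise distances among agents only, not among all data points.
    D = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2)
    medoids = X[agent_idx[pam(D, k)]]
    dist_to_medoids = np.linalg.norm(X[:, None, :] - medoids[None, :, :], axis=2)
    return medoids, dist_to_medoids.argmin(axis=1)
```

In this sketch the expensive PAM swap search runs over `len(agent_idx)` points instead of `len(X)`, which is the source of the saving in distance calculations that the abstract reports. The final assignment step still needs one distance per point per medoid. The sketch assumes `leaf_size >= 1` and that the tree yields at least `k` agents.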