Inter cluster distance management model with optimal centroid estimation for K-means clustering algorithm

  • Authors:
  • M. Vijayakumar;S. Prakash;R. M. S. Parvathi

  • Affiliations:
  • Department of Computer Science and Engineering, Sasurie College of Engineering, Tamilnadu, India;Department of Information Technology, Sasurie College of Engineering, Tamilnadu, India;Department of Computer Science and Engineering, Sengunthar College of Engineering for Women, Tamilnadu, India

  • Venue:
  • WSEAS TRANSACTIONS on COMMUNICATIONS
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Clustering techniques are used to group up the transactions based on the relevancy. Cluster analysis is one of the primary data analysis method. The clustering process can be done in two ways such that Hierarchical clusters and partition clustering. Hierarchical clustering technique uses the structure and data values. The partition clustering technique uses the data similarity factors. Transactions are partitioned into small groups. K-means clustering algorithm is one of the widely used clustering algorithms. Local cluster accuracy is high in the K-means clustering algorithm. Inter cluster relationship is not concentrated in the K-means algorithm. K-means clustering algorithm requires the cluster count as the major input. The system chooses random transactions are initial centroid for each cluster. Cluster accuracy is associated with the initial centroid estimation process. The random transaction based centroid selection model may choose similar transactions. In this case the cluster accuracy is limited with respect to the distance between the centroid values. The proposed system is designed to improve the K-means clustering algorithm with efficient centroid estimation models. Three centroid estimation models are proposed system. They are random selection with distance management, mean distance model and inter cluster distance model. Cosine distance measure and Euclidean distance measure are used to estimate similarity between the transactions. Three centroid estimation models are tested with two distance measure schemes. Precision and recall and fitness measure are used to test the cluster accuracy levels. Java language and Oracle database are selected for the system development.