Inter cluster distance management model with optimal centroid estimation for K-means clustering algorithm

Authors:
M. Vijayakumar;S. Prakash;R. M. S. Parvathi
Affiliations:
Department of Computer Science and Engineering, Sasurie College of Engineering, Tamilnadu, India;Department of Information Technology, Sasurie College of Engineering, Tamilnadu, India;Department of Computer Science and Engineering, Sengunthar College of Engineering for Women, Tamilnadu, India
Venue:
WSEAS TRANSACTIONS on COMMUNICATIONS
Year:
2011

Abstract

Clustering techniques group transactions by relevance, and cluster analysis is one of the primary data analysis methods. Clustering can be performed in two ways: hierarchical clustering and partition clustering. Hierarchical clustering uses the structure and the data values, whereas partition clustering uses data similarity factors to partition transactions into small groups. K-means is one of the most widely used clustering algorithms. It achieves high local cluster accuracy but does not take inter-cluster relationships into account. K-means requires the cluster count as its major input, and the system chooses random transactions as the initial centroid for each cluster. Cluster accuracy depends on this initial centroid estimation process. Random transaction-based centroid selection may choose similar transactions as centroids; in that case, cluster accuracy is limited by the distance between the centroid values. The proposed system is designed to improve the K-means clustering algorithm with efficient centroid estimation models. Three centroid estimation models are proposed: random selection with distance management, the mean distance model, and the inter-cluster distance model. The cosine and Euclidean distance measures are used to estimate similarity between transactions, and each of the three centroid estimation models is tested with both distance measures. Precision, recall, and fitness measure are used to test cluster accuracy levels. The system is developed in Java with an Oracle database.
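The abstract does not spell out how the centroid estimation models work, so the following is only a minimal sketch of one possible reading of "random selection with distance management": draw random transactions as candidate centroids and reject any candidate that lies too close to a centroid already chosen. It is written in Python for brevity, not in the Java used by the described system. The function names, the `min_distance` threshold, the `max_tries` limit, and the zero-vector convention in `cosine_distance` are all assumptions made for illustration and do not come from the paper.

```python
import math
import random


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 minus the cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: a zero vector is maximally dissimilar.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def pick_separated_centroids(points, k, min_distance,
                             distance=euclidean_distance,
                             seed=0, max_tries=1000):
    """Randomly pick k initial centroids that are pairwise at least
    min_distance apart (hypothetical reading of the first model)."""
    rng = random.Random(seed)
    centroids = []
    tries = 0
    while len(centroids) < k:
        if tries >= max_tries:
            raise ValueError("could not find k sufficiently separated centroids")
        tries += 1
        candidate = rng.choice(points)
        # Accept the candidate only if it is far enough from every
        # centroid accepted so far.
        if all(distance(candidate, c) >= min_distance for c in centroids):
            centroids.append(candidate)
    return centroids


if __name__ == "__main__":
    transactions = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0),
                    (5.1, 5.0), (10.0, 0.0), (10.0, 0.1)]
    print(pick_separated_centroids(transactions, k=3, min_distance=3.0))
```

Passing `distance=cosine_distance` swaps the similarity measure without changing the selection logic, which mirrors the abstract's setup of testing each centroid model under both distance measures. The accepted centroids would then seed an ordinary K-means run.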