Identification of optimal cluster centroid of multi-variable functions for clustering concept-drift categorical data

Authors:
K. Reddy Madhavi;A. Vinaya Babu;A. Anand Rao;S. V. N. Raju
Affiliations:
JNTUA, Ananthpur;JNTUHCE, Hyderabad;JNTUACE, Ananthpur;JNTUH, Hyderabad
Venue:
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
Year:
2012

Citing 7

Knowledge discovery in databases terminology
Advances in knowledge discovery and data mining

Data clustering: a review
ACM Computing Surveys (CSUR)

Data mining: concepts and techniques
Data mining: concepts and techniques

An Efficient k-Means Clustering Algorithm: Analysis and Implementation
IEEE Transactions on Pattern Analysis and Machine Intelligence

Automated Variable Weighting in k-Means Type Clustering
IEEE Transactions on Pattern Analysis and Machine Intelligence

Catching the Trend: A Framework for Clustering Concept-Drifting Categorical Data
IEEE Transactions on Knowledge and Data Engineering

Optimization and Improvement Based on K-Means Cluster Algorithm
KAM '09 Proceedings of the 2009 Second International Symposium on Knowledge Acquisition and Modeling - Volume 03

Cited 1

Graphical method to find optimal cluster centroid for two-variable linear functions of concept-drift categorical data
Proceedings of the Second International Conference on Computational Science, Engineering and Information Technology

Abstract

Identifying useful clusters in large datasets has attracted considerable interest in clustering research. Because data on the World Wide Web is growing exponentially, which affects clustering accuracy and decision making, the concept underlying the clusters changes over time; this change is called concept drift. Newly added, time-based data must be assigned (labeled) to the clusters that have already been generated. For this data labeling to perform well, the clusters themselves must be effective. The selection of the initial cluster center (centroid) is a key factor that strongly influences the quality of the generated clusters. The existing clustering methods select the centroid randomly, and different centroids result in different clusters. To avoid this random selection, we propose methods that select the centroid by analyzing the properties of the data, since real-world data exhibits different properties. Our previous work concentrated on identifying the centroid for single-variable and two-variable functions. This paper proposes methods for finding the optimal cluster centroid for multi-variable functions, after which any existing clustering algorithm can be applied to generate clusters using a suitable distance measure.
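The final step described in the abstract, running an existing clustering algorithm from chosen rather than random centroids, can be sketched as follows. This is only an illustration and not the method of the paper: the abstract does not specify how the centroids are selected, so the sketch takes them as an input. The k-modes-style loop, the simple matching (Hamming) distance, and the names `hamming`, `k_modes`, and `init_centroids` are all assumptions made for the example.

```python
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, init_centroids, max_iter=20):
    """Cluster categorical records starting from caller-supplied centroids.

    Hypothetical helper: a k-modes-style loop standing in for "any existing
    clustering algorithm"; the centroid-selection step itself is not shown.
    """
    centroids = [tuple(c) for c in init_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: label each record with its nearest centroid
        # (ties go to the lowest centroid index).
        labels = [
            min(range(len(centroids)), key=lambda j: hamming(r, centroids[j]))
            for r in records
        ]
        # Update step: each centroid becomes the per-attribute mode
        # of the records currently assigned to it.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if not members:
                new_centroids.append(old)  # keep the centroid of an empty cluster
                continue
            new_centroids.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_centroids == centroids:
            break  # centroids are stable, so the labels are final
        centroids = new_centroids
    return labels, centroids


# Toy categorical data (made up for illustration only).
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]
labels, centroids = k_modes(records, init_centroids=[records[0], records[2]])
print(labels)  # [0, 0, 1, 1]
```

Because the starting centroids are passed in explicitly, the run is deterministic; replacing `init_centroids` with randomly drawn records is what makes the outcome vary between runs.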