A fuzzy threshold based modified clustering algorithm for natural data exploration

Authors:
Binu Thomas;G. Raju
Affiliations:
Dept. of Computer Applications, Marian College, Kuttikkanam, Kerala, India;Department of Information Technology, Kannur University, Kannur, Kerala, India
Venue:
PAISI'10 Proceedings of the 2010 Pacific Asia conference on Intelligence and Security Informatics
Year:
2010

Abstract

Traditional clustering methods require the user to specify the number of clusters before any data exploration begins, and the data engineer must also select the initial cluster seeds. In the c-means clustering method, the performance of the algorithm depends mainly on this initial choice of the number of clusters and the cluster seeds. With real-world data, choosing the initial cluster count and centroids becomes a tedious task. In this paper we propose a modified clustering algorithm that works on the principles of fuzzy clustering. The proposed method uses a modified form of the popular fuzzy c-means algorithm to calculate memberships. The algorithm begins by treating every data point as an initial centroid. Clusters are then repeatedly merged according to a threshold value until the optimum number of clusters is reached. The algorithm can also detect outliers. It was tested on data from the Gross National Happiness (GNH) program of Bhutan and found to be highly efficient in segmenting natural data sets.
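The abstract does not give the modified membership formula or the exact merge rule. The following Python sketch therefore only illustrates the overall flow under stated assumptions. Every point starts as its own centroid. The two closest clusters are then merged repeatedly while a closeness degree 1 - d/d_max (d_max being the largest pairwise distance in the data) stays at or above a threshold, which is plain centroid-linkage agglomeration. Clusters smaller than min_size are reported as outliers. Finally, standard fuzzy c-means memberships u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)) are computed against the surviving centroids. The closeness measure, the merge step, the outlier rule, the fuzzifier m = 2, and the names threshold_cluster, fcm_memberships and min_size are all hypothetical choices, not the authors' method.

```python
import math
from itertools import combinations


def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def fcm_memberships(x, centroids, m=2.0):
    """Standard fuzzy c-means membership of point x in each centroid:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).
    If x coincides with one or more centroids, they share full membership."""
    d = [math.dist(x, c) for c in centroids]
    n_zero = sum(1 for di in d if di == 0.0)
    if n_zero:
        return [1.0 / n_zero if di == 0.0 else 0.0 for di in d]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** p for dk in d) for di in d]


def threshold_cluster(data, threshold=0.8, min_size=2, m=2.0):
    """Hypothetical sketch: every point starts as its own centroid, and the two
    closest clusters are merged while their closeness 1 - d / d_max >= threshold."""
    # Largest pairwise distance normalises closeness into [0, 1].
    d_max = max((math.dist(a, b) for a, b in combinations(data, 2)), default=0.0) or 1.0
    clusters = [[i] for i in range(len(data))]  # member indices of each cluster
    centroids = list(data)                       # initially, every point is a centroid
    while len(clusters) > 1:
        # Pair of clusters with the highest closeness degree (first one wins ties).
        (a, b), s = max(
            (((i, k), 1.0 - math.dist(centroids[i], centroids[k]) / d_max)
             for i, k in combinations(range(len(clusters)), 2)),
            key=lambda t: t[1],
        )
        if s < threshold:
            break  # no remaining pair is close enough to merge
        clusters[a] += clusters[b]
        centroids[a] = mean([data[j] for j in clusters[a]])
        del clusters[b], centroids[b]  # safe: combinations yields a < b
    # Hypothetical outlier rule: members of clusters smaller than min_size.
    outliers = sorted(j for c in clusters if len(c) < min_size for j in c)
    kept = [centroids[i] for i, c in enumerate(clusters) if len(c) >= min_size]
    # Fuzzy partition of all points (outliers included) over the kept centroids.
    memberships = [fcm_memberships(x, kept, m) for x in data] if kept else []
    return kept, memberships, outliers


if __name__ == "__main__":
    # Two tight groups plus one isolated point (index 6).
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (30, 0)]
    centroids, u, outliers = threshold_cluster(data, threshold=0.8)
    print(centroids)
    print(outliers)  # [6]
```

Raising threshold makes merging stricter and leaves more, smaller clusters, while lowering it merges more aggressively. In this sketch the memberships describe the final partition rather than driving the merge decisions. That is a simplification compared with a fully fuzzy merge criterion such as the one the abstract describes.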