CRUDAW: a novel fuzzy technique for clustering records following user defined attribute weights

Authors:
Md Anisur Rahman;Md Zahidul Islam
Affiliations:
Charles Sturt University, Bathurst, NSW, Australia;Charles Sturt University, Bathurst, NSW, Australia
Venue:
AusDM '12 Proceedings of the Tenth Australasian Data Mining Conference - Volume 134
Year:
2012

Abstract

We present a novel fuzzy clustering technique called CRUDAW that allows a data miner to assign weights to the attributes of a data set according to their importance (to the data miner) for clustering. The technique uses a novel approach to select initial seeds deterministically (not randomly), based on the density of the records in the data set. CRUDAW also selects the initial fuzzy membership degrees deterministically. Moreover, it uses a novel distance measure that takes the user-defined attribute weights into account. When measuring the distance between two values of a categorical attribute, the technique considers the similarity of the values instead of treating the distance as either 0 or 1. The complete algorithm for CRUDAW is presented in the paper. We experimentally compare our technique with three existing techniques (SABC, GFCM, and KL-FCM-GM) using four evaluation criteria: silhouette coefficient, F-measure, purity, and entropy. We also use a t-test, a confidence interval test, and time complexity to evaluate the performance of our technique. Four data sets from the UCI Machine Learning Repository are used in the experiments. Our experimental results indicate that CRUDAW performs significantly better than the existing techniques in producing high-quality clusters.
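The abstract does not give the distance formula, so the following Python sketch only illustrates the general idea of a distance that respects user-defined attribute weights and grades categorical mismatches by similarity. It is not the CRUDAW measure itself. The function name `weighted_distance`, the `cat_similarity` lookup table, the weighted-average form, and the assumption that numerical values are pre-normalised to [0, 1] are all hypothetical choices made for illustration.

```python
from typing import Dict, Sequence, Tuple, Union

Value = Union[float, str]


def weighted_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    weights: Sequence[float],
    cat_similarity: Dict[int, Dict[Tuple[str, str], float]],
) -> float:
    """Weighted average of per-attribute distances, each in [0, 1].

    Assumptions (not taken from the paper):
    - numerical attributes are already normalised to [0, 1];
    - cat_similarity[j][(u, v)] is a similarity in [0, 1] between two
      values of categorical attribute j; unlisted pairs default to 0,
      which reproduces the usual 0/1 mismatch distance.
    """
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one attribute weight must be positive")

    total = 0.0
    for j, (x, y, w) in enumerate(zip(a, b, weights)):
        if isinstance(x, str):
            if x == y:
                d = 0.0
            else:
                table = cat_similarity.get(j, {})
                # Look the pair up in either order.
                sim = table.get((x, y), table.get((y, x), 0.0))
                d = 1.0 - sim
        else:
            d = abs(float(x) - float(y))
        total += w * d
    return total / total_weight


# Two records: (normalised age, occupation). Occupation is weighted
# three times as heavily as age by the (hypothetical) data miner.
r1 = (0.2, "nurse")
r2 = (0.5, "doctor")
weights = (1.0, 3.0)
similarity = {1: {("nurse", "doctor"): 0.8}}

print(weighted_distance(r1, r2, weights, similarity))
print(weighted_distance(r1, r2, weights, {}))  # plain 0/1 categorical distance
```

With the similarity table the distance is about 0.225; with a strict 0/1 mismatch it rises to about 0.825, which shows how much a graded categorical distance can change the outcome when the categorical attribute carries a high weight.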