Seed-detective: a novel clustering technique using high quality seed for K-means on categorical and numerical attributes

Authors: Anisur Rahman; Zahidul Islam
Affiliations: Charles Sturt University, Wagga Wagga, Australia (both authors)
Venue: AusDM '11: Proceedings of the Ninth Australasian Data Mining Conference, Volume 121
Year: 2011

Abstract

In this paper we present a novel clustering technique called Seed-Detective. It combines modified versions of two existing techniques, namely Ex-Detective and Simple K-Means. Seed-Detective first discovers a set of preliminary clusters using our modified Ex-Detective, which allows a data miner to assign different weights (importance levels) to all attributes, both numerical and categorical. The centers of these preliminary clusters are then used as the initial seeds for the modified Simple K-Means, which, unlike the existing Simple K-Means, does not select its initial seeds randomly. Centers of the preliminary clusters are naturally expected to be better-quality seeds than randomly chosen ones, and given better-quality initial seeds as input, the modified Simple K-Means is expected to produce better-quality clusters. We compare Seed-Detective with several existing techniques, including Ex-Detective, Simple K-Means, Basic Farthest Point Heuristic (BFPH) and New Farthest Point Heuristic (NFPH), on two publicly available natural data sets. BFPH and NFPH have been shown in the literature to outperform Simple K-Means; nevertheless, our initial experimental results indicate that Seed-Detective produces better clusters than the other techniques on several evaluation criteria, including F-measure, entropy and purity. Another contribution of this paper is an experimental evaluation of Ex-Detective, which had not previously been tested.
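The core idea of the second stage, running K-means from supplied seeds rather than random ones over weighted numerical and categorical attributes, can be illustrated with a small sketch. The abstract does not specify Ex-Detective's internals or the exact distance and center definitions, so everything below is an assumption for illustration only: numerical differences are scaled by the attribute range, categorical attributes use a 0/1 mismatch, cluster centers take the mean (numerical) or mode (categorical) in a k-prototypes-like fashion, and the seeds are hard-coded stand-ins for the centers of preliminary clusters. The function names (`weighted_distance`, `center`, `seeded_kmeans`, `purity`) are hypothetical and do not reflect the authors' implementation.

```python
from collections import Counter
from typing import List, Sequence, Union

Value = Union[float, str]
Record = List[Value]


def weighted_distance(a: Sequence[Value], b: Sequence[Value],
                      weights: Sequence[float], is_numeric: Sequence[bool],
                      ranges: Sequence[float]) -> float:
    """Assumed mixed-attribute distance: range-scaled |x - y| for numerical
    attributes, 0/1 mismatch for categorical ones, each times its weight."""
    total = 0.0
    for x, y, w, num, r in zip(a, b, weights, is_numeric, ranges):
        if num:
            d = abs(float(x) - float(y)) / r if r > 0 else 0.0
        else:
            d = 0.0 if x == y else 1.0
        total += w * d
    return total


def center(records: List[Record], is_numeric: Sequence[bool]) -> Record:
    """Assumed center: mean for numerical attributes, mode for categorical."""
    result: Record = []
    for j, num in enumerate(is_numeric):
        column = [r[j] for r in records]
        if num:
            result.append(sum(float(v) for v in column) / len(column))
        else:
            result.append(Counter(column).most_common(1)[0][0])
    return result


def seeded_kmeans(data: List[Record], seeds: List[Record],
                  weights: Sequence[float], is_numeric: Sequence[bool],
                  max_iter: int = 100) -> List[int]:
    """K-means-style loop that starts from the given seeds, not random ones."""
    ranges = []
    for j, num in enumerate(is_numeric):
        if num:
            col = [float(r[j]) for r in data]
            ranges.append(max(col) - min(col))
        else:
            ranges.append(0.0)  # unused for categorical attributes
    centers = [list(s) for s in seeds]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assign each record to its nearest current center.
        new_labels = [
            min(range(len(centers)),
                key=lambda k: weighted_distance(rec, centers[k], weights,
                                                is_numeric, ranges))
            for rec in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Recompute centers; keep the old center if a cluster becomes empty.
        for k in range(len(centers)):
            members = [rec for rec, lab in zip(data, labels) if lab == k]
            if members:
                centers[k] = center(members, is_numeric)
    return labels


def purity(labels: Sequence[int], classes: Sequence[str]) -> float:
    """Fraction of records belonging to the majority class of their cluster."""
    clusters: dict = {}
    for lab, cls in zip(labels, classes):
        clusters.setdefault(lab, []).append(cls)
    majority = sum(Counter(m).most_common(1)[0][1] for m in clusters.values())
    return majority / len(labels)


# Toy usage: one numerical and one categorical attribute, equal weights.
data = [[1.0, "red"], [1.2, "red"], [0.9, "red"],
        [8.0, "blue"], [8.3, "blue"], [7.9, "green"]]
classes = ["A", "A", "A", "B", "B", "B"]
# Hypothetical seeds standing in for centers of preliminary clusters.
seeds = [[1.0, "red"], [8.0, "blue"]]
labels = seeded_kmeans(data, seeds, weights=[1.0, 1.0], is_numeric=[True, False])
print(labels)                    # [0, 0, 0, 1, 1, 1]
print(purity(labels, classes))   # 1.0
```

Because the loop is deterministic once the seeds are fixed, the quality of the final clustering depends directly on where the seeds start; this is the dependence that seeding from preliminary cluster centers, rather than choosing seeds at random, is meant to exploit. Raising an attribute's weight in `weights` increases its influence on both assignment and convergence.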