Quality Scheme Assessment in the Clustering Process

Authors:
Maria Halkidi; Michalis Vazirgiannis; Yannis Batistakis
Venue:
PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2000


Abstract

Clustering is mostly an unsupervised procedure, and most clustering algorithms depend on assumptions and initial guesses in order to define the subgroups present in a data set. As a consequence, in most applications the final clusters require some sort of evaluation. The evaluation procedure has to tackle difficult problems, which can be qualitatively expressed as: (i) the quality of the clusters, (ii) the degree to which a clustering scheme fits a specific data set, and (iii) the optimal number of clusters in a partitioning. In this paper we present a scheme for finding the optimal partitioning of a data set during the clustering process, regardless of the clustering algorithm used. More specifically, we present an approach for evaluating clustering schemes (partitions) so as to find the best number of clusters for a specific data set. A clustering algorithm produces different partitions for different values of its input parameters. The proposed approach selects the best clustering scheme (i.e., the scheme with the most compact and well-separated clusters) according to a quality index we define. We verified our approach using two popular clustering algorithms on synthetic and real data sets in order to evaluate its reliability. Moreover, we study the influence of different clustering parameters on the proposed quality index.
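
The selection procedure the abstract describes (run a clustering algorithm for several parameter values, score each partition, keep the best-scoring one) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the formula of the quality index, so `quality_index` below is a stand-in (mean distance to the cluster center divided by the smallest distance between centers, lower is better). The k-means routine, its farthest-point initialization, the toy data, and all function names are likewise assumptions made for the example.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def assign(points, centers):
    """Group points by their nearest center (ties go to the lower index)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, max_iters=100):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    for _ in range(max_iters):
        clusters = assign(points, centers)
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)

def quality_index(centers, clusters):
    """Stand-in index (assumption): compactness / separation, lower is better."""
    n = sum(len(c) for c in clusters)
    scatter = sum(dist(p, centers[i]) for i, c in enumerate(clusters) for p in c) / n
    separation = min(dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:])
    return scatter / separation

# Toy data (assumption): three small, well-separated groups in the plane.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# Score one partition per candidate number of clusters and keep the best.
scores = {k: quality_index(*kmeans(points, k)) for k in range(2, 6)}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

On this toy data the three-cluster partition gets the lowest score, since merging groups inflates the scatter term and splitting a group shrinks the separation term. Any other compactness/separation index could be substituted for `quality_index` without changing the selection loop.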