Applied multivariate techniques
Applied multivariate techniques
Advances in knowledge discovery and data mining
Advances in knowledge discovery and data mining
CURE: an efficient clustering algorithm for large databases
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A new cluster validity index for the fuzzy c-mean
Pattern Recognition Letters
ACM Computing Surveys (CSUR)
Data Mining Techniques: For Marketing, Sales, and Customer Support
Data Mining Techniques: For Marketing, Sales, and Customer Support
Quality Scheme Assessment in the Clustering Process
PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Incremental Clustering for Mining in a Data Warehousing Environment
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
WaveCluster: A Multi-Resolution Clustering Approach for Very Large Spatial Databases
VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Efficient and Effective Clustering Methods for Spatial Data Mining
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases
ROCK: A Robust Clustering Algorithm for Categorical Attributes
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Solving the missing node problem using structure and attribute information
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Hi-index | 0.00 |
In the last years the availability of huge transactional and experimental data sets and the arising requirements for data mining created needs for clustering algorithms that scale and can be applied in diverse domains. Thus, a variety of algorithms have been proposed which have application in different fields and may result in different partitioning of a data set, depending on the specific clustering criterion used. Moreover, since clustering is an unsupervised process, most of the algorithms are based on assumptions in order to define a partitioning of a data set. It is then obvious that in most applications the final clustering scheme requires some sort of evaluation. In this paper we present a clustering validity procedure, which taking in account the inherent features of a data set evaluates the results of different clustering algorithms applied to it. A validity index, S_Dbw, is defined according to well-known clustering criteria so as to enable the selection of the algorithm providing the best partitioning of a data set. We evaluate the reliability of our approach both theoretically and experimentally, considering three representative clustering algorithms ran on synthetic and real data sets. It performed favorably in all studies, giving an indication of the algorithm that is suitable for the considered application.