Traditional quality indexes (Inertia, DB, …) are known to be method-dependent and, in several cases, cannot properly estimate the quality of a clustering, notably for complex data such as textual data. We therefore propose an alternative approach to clustering quality evaluation based on unsupervised measures of Recall, Precision and F-measure that exploit the descriptors of the data associated with the obtained clusters. Two categories of index are proposed: Macro indexes and Micro indexes. This paper also focuses on the construction of a new cumulative Micro-Precision index that makes it possible to evaluate the overall quality of a clustering result while clearly distinguishing between homogeneous results and heterogeneous or degenerate ones. The behavior of the classical indexes is experimentally compared with that of our new approach on a polythematic dataset of bibliographical references drawn from the PASCAL database.
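To make the idea of descriptor-based, unsupervised Recall and Precision more concrete, here is a minimal sketch of one plausible Macro variant. It rests on several assumptions that the abstract does not state: each document is described by a set of binary descriptors (e.g. index terms); the recall of a descriptor p in cluster c is the share of all documents carrying p that fall in c; its precision is the share of c's members carrying p; and both are averaged first over the descriptors present in a cluster, then over clusters. The function name `macro_recall_precision` is hypothetical. The Micro and cumulative Micro-Precision indexes are not reproduced here, and the paper's exact weighting may differ.

```python
from collections import Counter
from typing import Sequence, Set, Tuple


def macro_recall_precision(
    docs: Sequence[Set[str]],
    clusters: Sequence[Sequence[int]],
) -> Tuple[float, float, float]:
    """Return (macro_recall, macro_precision, macro_f) for a clustering.

    Hypothetical sketch: descriptors are treated as binary (present/absent);
    the original paper may use weighted descriptors and different averaging.
    """
    # Number of documents in the whole dataset carrying each descriptor.
    global_freq = Counter(p for d in docs for p in d)

    recalls, precisions = [], []
    for members in clusters:
        if not members:
            continue  # empty clusters carry no descriptors
        # Number of cluster members carrying each descriptor.
        local_freq = Counter(p for i in members for p in docs[i])
        if not local_freq:
            continue  # cluster whose documents have no descriptors at all
        size = len(members)
        n_props = len(local_freq)  # descriptors present in this cluster
        # Recall of p: fraction of p's documents that landed in this cluster.
        recalls.append(sum(local_freq[p] / global_freq[p] for p in local_freq) / n_props)
        # Precision of p: fraction of this cluster's members that carry p.
        precisions.append(sum(local_freq[p] / size for p in local_freq) / n_props)

    if not recalls:
        return 0.0, 0.0, 0.0
    r = sum(recalls) / len(recalls)
    p = sum(precisions) / len(precisions)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return r, p, f


if __name__ == "__main__":
    docs = [{"a", "b"}, {"a"}, {"c"}, {"c", "d"}]
    # Two thematically coherent clusters vs. one degenerate catch-all cluster.
    print(macro_recall_precision(docs, [[0, 1], [2, 3]]))
    print(macro_recall_precision(docs, [[0, 1, 2, 3]]))
```

On this toy data both clusterings reach a recall of 1.0, but precision falls from 0.75 for the two coherent clusters to 0.375 for the single catch-all cluster, so the F-measure drops as well. This mirrors, in a very reduced form, the abstract's point that descriptor-based precision can expose degenerate results.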