Clustering large data sets based on data compression technique and weighted quality measures

Authors:
M. Sassi; A. Grissa
Affiliations:
National School of Engineering of Tunis, TIC Department, Tunis, Tunisia; National School of Engineering of Tunis, TIC Department, Tunis, Tunisia
Venue:
FUZZ-IEEE'09 Proceedings of the 18th international conference on Fuzzy Systems
Year:
2009

Abstract

Various algorithms have been proposed for clustering large data sets in both the hard and fuzzy cases, but much less work has been done on automatic clustering approaches, in which the number of clusters is unknown to the user. These approaches need measures, called validity functions, to evaluate the clustering result and to give the user the optimal number of clusters. To obtain this number, three conditions are necessary: (a) a good compression technique for data reduction within the limited memory allocated, (b) good measures for evaluating the goodness of clusters for a varying number of clusters, and (c) a good clustering algorithm that can automatically produce the number of clusters and takes into account the compression technique used. In this paper, we propose new clustering approaches that deal with a new compression technique based on quality measures.