Pairwise clustering methods have shown great promise in many real-world applications, but their computational demands make them impractical for large data sets. This paper contributes a simple but efficient method, called eSPEC, that makes pairwise clustering feasible for large data sets. Our solution adopts a "sampling, clustering, plus extension" strategy. First, a small number of representative samples are selected from the relational pairwise data using a selective sampling scheme. Next, the chosen samples are grouped using a pairwise clustering algorithm combined with local scaling. Finally, label assignment for the remaining instances is treated as a classification problem in a low-dimensional space, which is explicitly learned from the labeled samples using a cluster-preserving graph embedding technique. Extensive experiments on several synthetic and real-world data sets demonstrate that the method makes approximate clustering of large data sets feasible and accelerates clustering of loadable data sets.
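The three-stage strategy can be illustrated with a minimal sketch. This is not the eSPEC implementation: it assumes vector data rather than relational pairwise data, replaces selective sampling with uniform random sampling, uses a generic spectral clustering with local scaling (each point's scale is the distance to its k-th nearest neighbor) on the sample, and replaces the learned cluster-preserving embedding with plain 1-nearest-neighbor label extension in the input space. All function names and parameters here are hypothetical.

```python
import numpy as np

def local_scaling_affinity(S, k=7):
    # Gaussian affinity where each point's scale is the distance
    # to its k-th nearest neighbor (an assumed form of local scaling).
    d2 = ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    d = np.sqrt(d2)
    sigma = np.sort(d, axis=1)[:, min(k, len(S) - 1)]
    A = np.exp(-d2 / (sigma[:, None] * sigma[None, :] + 1e-12))
    np.fill_diagonal(A, 0.0)
    return A

def spectral_embedding(A, n_clusters):
    # Top eigenvectors of the symmetrically normalized affinity matrix,
    # with rows rescaled to unit length.
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(1) + 1e-12)
    M = inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)

def kmeans_labels(U, n_clusters, n_iter=20):
    # Small k-means with deterministic farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((U - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(n_clusters):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(0)
    return labels

def sample_cluster_extend(X, n_clusters, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    # 1) Sampling: uniform random here (stand-in for selective sampling).
    idx = rng.choice(len(X), size=n_samples, replace=False)
    S = X[idx]
    # 2) Clustering: spectral clustering with local scaling on the sample.
    U = spectral_embedding(local_scaling_affinity(S), n_clusters)
    sample_labels = kmeans_labels(U, n_clusters)
    # 3) Extension: every point takes the label of its nearest sample
    #    (stand-in for classification in a learned embedding).
    d2 = ((X[:, None, :] - S[None, :, :]) ** 2).sum(-1)
    return sample_labels[d2.argmin(1)]

# Toy usage: two well-separated Gaussian blobs of 200 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)),
               rng.normal(10.0, 0.5, (200, 2))])
labels = sample_cluster_extend(X, n_clusters=2, n_samples=40)
```

The costly eigendecomposition runs only on the 40 sampled points, while the extension step is a single pass over the full data; that split is the source of the speedup this kind of strategy aims for.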