Spectral clustering algorithms have been shown to find clusters more effectively than some traditional algorithms, such as k-means. However, spectral clustering scales poorly in both memory use and computation time as the data set grows. To cluster large data sets, we investigate two representative ways of approximating the dense similarity matrix: sparsifying the matrix and the Nyström method. After comparing the two, we adopt sparsification by retaining nearest neighbors and investigate its parallelization, distributing both memory use and computation across machines. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863 instances, we show that our parallel algorithm can effectively handle large problems.
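
The nearest-neighbor sparsification mentioned above can be illustrated with a minimal single-machine sketch. It is not the paper's parallel implementation: the function name `knn_sparse_similarity`, the Gaussian similarity, the `sigma` parameter, and the symmetrization rule (keep an entry if either point lists the other as a neighbor) are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import math


def knn_sparse_similarity(points, t, sigma=1.0):
    """Build a symmetric sparse similarity matrix as a dict of dicts.

    Only the t nearest neighbors of each point are retained.
    The Gaussian kernel and sigma are illustrative assumptions.
    """
    n = len(points)
    sparse = {i: {} for i in range(n)}
    for i in range(n):
        # Squared Euclidean distance from point i to every other point.
        dists = []
        for j in range(n):
            if j != i:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                dists.append((d2, j))
        dists.sort()
        for d2, j in dists[:t]:
            s = math.exp(-d2 / (2.0 * sigma ** 2))
            # Symmetrize: keep the entry if either point selects the other.
            sparse[i][j] = s
            sparse[j][i] = s
    return sparse


# Two well-separated groups of three points each (toy data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
S = knn_sparse_similarity(points, t=2)

# Count stored entries versus the dense off-diagonal count n * (n - 1).
nonzeros = sum(len(row) for row in S.values())
dense_entries = len(points) * (len(points) - 1)
print(nonzeros, "stored entries instead of", dense_entries)
```

The sketch still computes all pairwise distances, so it saves memory but not the quadratic distance computation. Storage grows roughly with n times t rather than n squared, which is the memory saving that makes sparsification attractive for large data sets.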