Clustering data in high dimensions is believed to be a hard problem in general. A number of efficient clustering algorithms developed in recent years address this problem by projecting the data into a lower-dimensional subspace, e.g., via Principal Components Analysis (PCA) or random projections, before clustering. Here, we consider constructing such projections using multiple views of the data, via Canonical Correlation Analysis (CCA). Under the assumption that the views are uncorrelated given the cluster label, we show that the separation conditions required for the algorithm to succeed are significantly weaker than those in prior results in the literature. We provide results for mixtures of Gaussians and mixtures of log-concave distributions. We also provide empirical support from audio-visual speaker clustering (where we want the clusters to correspond to speaker ID) and from hierarchical Wikipedia document clustering (where one view is the words in the document and the other is the link structure).
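The project-then-cluster idea described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the algorithm or experimental setup analyzed in the paper: the function names `cca_projection`, `kmeans`, and `demo`, the ridge term `reg`, the synthetic two-view Gaussian data, the choice of k - 1 projection dimensions, and the use of plain k-means as the final clustering step are all hypothetical choices made here.

```python
import numpy as np


def cca_projection(X1, X2, dim, reg=1e-6):
    """Return (A, corrs): A maps centered view-1 data onto its top `dim`
    canonical directions; corrs are the matching canonical correlations.
    `reg` is a small ridge term (an assumption, added for numerical stability)."""
    X1 = X1 - X1.mean(axis=0)
    X2 = X2 - X2.mean(axis=0)
    n = X1.shape[0]
    C11 = X1.T @ X1 / n + reg * np.eye(X1.shape[1])
    C22 = X2.T @ X2 / n + reg * np.eye(X2.shape[1])
    C12 = X1.T @ X2 / n

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive definite matrix.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    W1, W2 = inv_sqrt(C11), inv_sqrt(C22)
    # Singular vectors of the whitened cross-covariance give the canonical directions.
    U, s, _ = np.linalg.svd(W1 @ C12 @ W2)
    return W1 @ U[:, :dim], s[:dim]


def kmeans(Z, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; stands in for whatever clustering step is preferred."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels


def demo(seed=0, n=2000, d=20, k=2):
    """Synthetic two-view data: given the cluster label, the two views have
    independent noise, mirroring the abstract's conditional-uncorrelatedness
    assumption. Returns clustering accuracy up to a swap of the two labels."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, k, size=n)
    sign = 2.0 * y - 1.0                       # map labels {0, 1} to {-1, +1}
    mu1 = np.zeros(d); mu1[0] = 3.0            # hypothetical cluster mean, view 1
    mu2 = np.zeros(d); mu2[1] = 3.0            # hypothetical cluster mean, view 2
    X1 = sign[:, None] * mu1 + rng.standard_normal((n, d))
    X2 = sign[:, None] * mu2 + rng.standard_normal((n, d))

    A, _ = cca_projection(X1, X2, dim=k - 1)   # project view 1 to k - 1 dimensions
    Z = (X1 - X1.mean(axis=0)) @ A
    labels = kmeans(Z, k)
    agree = np.mean(labels == y)
    return max(agree, 1.0 - agree)
```

Calling `demo()` returns the fraction of points whose recovered cluster matches the generating label, up to relabeling. The second view is used only to choose the projection; the clustering itself runs on the projected first view, which is one of several reasonable designs.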