Learning multiple nonredundant clusterings

Authors:
Ying Cui;Xiaoli Z. Fern;Jennifer G. Dy
Affiliations:
Yahoo! Labs, Sunnyvale, CA;Oregon State University, Corvallis, OR;Northeastern University, Boston, MA
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2010

Citing 21
Cited 0

Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Data clustering: a review

ACM Computing Surveys (CSUR)
Unsupervised Learning of Finite Mixture Models

IEEE Transactions on Pattern Analysis and Machine Intelligence
X-means: Extending K-means with Efficient Estimation of the Number of Clusters

ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data

Machine Learning
Adaptive dimension reduction for clustering high dimensional data

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
Cluster ensembles --- a knowledge reuse framework for combining multiple partitions

The Journal of Machine Learning Research
Subspace clustering for high dimensional data: a review

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Solving cluster ensemble problems by bipartite graph partitioning

ICML '04 Proceedings of the twenty-first international conference on Machine learning
Feature Selection for Unsupervised Learning

The Journal of Machine Learning Research
Simultaneous Feature Selection and Clustering Using Mixture Models

IEEE Transactions on Pattern Analysis and Machine Intelligence
Non-Redundant Data Clustering

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Combining Multiple Clusterings Using Evidence Accumulation

IEEE Transactions on Pattern Analysis and Machine Intelligence
Non-redundant clustering with conditional ensembles

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Non-redundant clustering

Non-redundant clustering
Learning Pairwise Similarity for Data Clustering

ICPR '06 Proceedings of the 18th International Conference on Pattern Recognition - Volume 01
COALA: A Novel Approach for the Extraction of an Alternate Clustering of High Quality and High Dissimilarity

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Meta Clustering

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Locally adaptive metrics for clustering high dimensional data

Data Mining and Knowledge Discovery
Non-redundant Multi-view Clustering via Orthogonalization

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Weighted cluster ensembles: Methods and analysis

ACM Transactions on Knowledge Discovery from Data (TKDD)

Quantified Score

Hi-index	0.00

Visualization

Abstract

Real-world applications often involve complex data that can be interpreted in many different ways. When clustering such data, there may exist multiple groupings that are reasonable and interesting from different perspectives. This is especially true for high-dimensional data, where different feature subspaces may reveal different structures of the data. However, traditional clustering is restricted to finding only one single clustering of the data. In this article, we propose a new clustering paradigm for exploratory data analysis: find all non-redundant clustering solutions of the data, where data points in the same cluster in one solution can belong to different clusters in other partitioning solutions. We present a framework to solve this problem and suggest two approaches within this framework: (1) orthogonal clustering, and (2) clustering in orthogonal subspaces. In essence, both approaches find alternative ways to partition the data by projecting it to a space that is orthogonal to the current solution. The first approach seeks orthogonality in the cluster space, while the second approach seeks orthogonality in the feature space. We study the relationship between the two approaches. We also combine our framework with techniques for automatically finding the number of clusters in the different solutions, and study stopping criteria for determining when all meaningful solutions are discovered. We test our framework on both synthetic and high-dimensional benchmark data sets, and the results show that indeed our approaches were able to discover varied clustering solutions that are interesting and meaningful.