We propose a hierarchical, model-based co-clustering framework for handling high-dimensional datasets. The technique views the dataset as a joint probability distribution over row and column variables. Our approach starts by clustering tuples in a dataset, where each cluster is characterized by a different probability distribution. Subsequently, the conditional distribution of attributes over tuples is exploited to discover natural co-clusters in the data. An intensive empirical evaluation highlights the effectiveness of our approach.
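The pipeline outlined in the abstract can be illustrated with a small sketch. It is not the authors' hierarchical, model-based procedure. It is a simplified stand-in under stated assumptions: the data is a non-negative count matrix, rows are grouped greedily by the total variation distance between their conditional distributions p(column | row), and a column is attached to a row cluster when its share of that cluster exceeds its overall marginal. The function names, the `threshold` parameter, and the toy matrix are all hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def joint_distribution(counts: Matrix) -> Matrix:
    """Normalize a non-negative matrix so that all entries sum to 1."""
    total = sum(sum(row) for row in counts)
    if total <= 0:
        raise ValueError("matrix must contain positive mass")
    return [[v / total for v in row] for row in counts]


def conditional_given_row(joint: Matrix) -> Matrix:
    """Return p(column | row) for each row (all zeros for an empty row)."""
    cond = []
    for row in joint:
        mass = sum(row)
        cond.append([v / mass for v in row] if mass > 0 else [0.0] * len(row))
    return cond


def total_variation(p: List[float], q: List[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cluster_rows(cond: Matrix, threshold: float) -> List[List[int]]:
    """Greedy grouping (an assumption, not the paper's model): a row joins
    the first cluster whose first member lies within `threshold`."""
    clusters: List[List[int]] = []
    for i, p in enumerate(cond):
        for members in clusters:
            if total_variation(p, cond[members[0]]) <= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def co_clusters(counts: Matrix, threshold: float = 0.3) -> List[Tuple[List[int], List[int]]]:
    """Pair each row cluster with the columns over-represented in it."""
    joint = joint_distribution(counts)
    cond = conditional_given_row(joint)
    n_cols = len(joint[0])
    col_marginal = [sum(row[j] for row in joint) for j in range(n_cols)]
    result = []
    for members in cluster_rows(cond, threshold):
        mass = sum(sum(joint[i]) for i in members)
        cols = []
        if mass > 0:
            # Keep columns whose within-cluster share beats the global marginal.
            cols = [j for j in range(n_cols)
                    if sum(joint[i][j] for i in members) / mass > col_marginal[j]]
        result.append((members, cols))
    return result


# Toy matrix with two visible blocks (hypothetical data).
data = [
    [5, 4, 0, 0],
    [4, 5, 0, 0],
    [0, 0, 3, 6],
    [0, 1, 4, 5],
]
result = co_clusters(data)
print(result)  # [([0, 1], [0, 1]), ([2, 3], [2, 3])]
```

On this toy matrix the sketch recovers the two blocks: rows 0 and 1 pair with columns 0 and 1, and rows 2 and 3 pair with columns 2 and 3. A model-based approach such as the one described would replace the greedy threshold step with fitted per-cluster probability distributions.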