Non-Redundant Data Clustering

Authors:
David Gondek; Thomas Hofmann
Affiliations:
Brown University, Providence, RI; Brown University, Providence, RI
Venue:
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Year:
2004

Abstract

Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this discovery process should avoid redundancy with existing knowledge about class structures or groupings and instead reveal novel, previously unknown aspects of the data. To address this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that works with different types of numeric and non-numeric attributes. We present experimental results for applications in text mining and computer vision.
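To make the "conditional mutual information score" concrete, the following minimal sketch computes I(C;Y|Z) for a toy distribution. It is an illustration only, not the authors' algorithm: the symbols C (cluster assignment), Y (observed feature), and Z (already-known grouping), the function names, and the toy data are all assumptions introduced here, and the abstract's constraints and alternating optimization are not modeled.

```python
import math
from collections import defaultdict


def conditional_mutual_information(joint):
    """Return I(C;Y|Z) in nats.

    `joint` maps (c, y, z) tuples to probabilities that sum to 1.
    The names c, y, z are illustrative assumptions, not from the paper.
    """
    p_z = defaultdict(float)
    p_cz = defaultdict(float)
    p_yz = defaultdict(float)
    # Accumulate the marginals needed for the conditional score.
    for (c, y, z), p in joint.items():
        p_z[z] += p
        p_cz[(c, z)] += p
        p_yz[(y, z)] += p
    total = 0.0
    for (c, y, z), p in joint.items():
        if p > 0.0:
            # p(c,y,z) * log[ p(c,y,z) p(z) / (p(c,z) p(y,z)) ]
            total += p * math.log(p * p_z[z] / (p_cz[(c, z)] * p_yz[(y, z)]))
    return total


def toy_joint(cluster_of):
    """Hypothetical toy data: the feature y = (z, w) has a known part z
    and an independent, previously unknown part w, both uniform bits."""
    joint = {}
    for z in (0, 1):
        for w in (0, 1):
            joint[(cluster_of(z, w), (z, w), z)] = 0.25
    return joint


# A clustering that merely copies the known grouping z.
redundant = conditional_mutual_information(toy_joint(lambda z, w: z))
# A clustering that captures the unknown structure w instead.
novel = conditional_mutual_information(toy_joint(lambda z, w: w))
```

In this toy setting the clustering that copies the known grouping scores zero, while the one that follows the independent structure scores log 2 nats, which is the sense in which a conditional score rewards non-redundant solutions.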