Non-redundant data clustering

Authors:
David Gondek;Thomas Hofmann
Affiliations:
Brown University, Department of Computer Science, Providence, RI, USA;Brown University, Department of Computer Science, Providence, RI, USA
Venue:
Knowledge and Information Systems
Year:
2007

Abstract

Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns. In practice, this discovery process should avoid redundancies with existing knowledge about class structures or groupings, and reveal novel, previously unknown aspects of the data. In order to deal with this problem, we present an extension of the information bottleneck framework, called coordinated conditional information bottleneck, which takes negative relevance information into account by maximizing a conditional mutual information score subject to constraints. Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes. We discuss extensions of the technique to the tasks of semi-supervised classification and enumeration of successive non-redundant clusterings. We present experimental results for applications in text mining and computer vision.
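
To make the "conditional mutual information score" concrete, the following minimal sketch computes I(C; Y | Z) for a small discrete joint distribution. The symbols are our own assumptions, since the abstract does not name its variables: C is a candidate clustering, Y the observed features, and Z the already known grouping. The function name and the toy distributions are hypothetical, and the sketch shows only the score itself, not the constraints or the alternating optimization scheme of the paper.

```python
from collections import defaultdict
from math import log


def conditional_mutual_information(joint):
    """Return I(C; Y | Z) in nats.

    `joint` maps (c, y, z) tuples to probabilities that sum to 1.
    The variable roles (C = new clustering, Y = features, Z = known
    grouping) are an assumption made for this illustration.
    """
    p_z = defaultdict(float)
    p_cz = defaultdict(float)
    p_yz = defaultdict(float)
    # Accumulate the marginals p(z), p(c, z), and p(y, z).
    for (c, y, z), p in joint.items():
        p_z[z] += p
        p_cz[(c, z)] += p
        p_yz[(y, z)] += p
    # Sum p(c,y,z) * log[ p(c,y,z) p(z) / (p(c,z) p(y,z)) ].
    total = 0.0
    for (c, y, z), p in joint.items():
        if p > 0.0:
            total += p * log(p * p_z[z] / (p_cz[(c, z)] * p_yz[(y, z)]))
    return total


# Toy case 1: the clustering simply copies the known grouping (C = Z = Y).
redundant = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

# Toy case 2: the clustering is independent of Z and explains Y (Y = C).
novel = {(c, c, z): 0.25 for c in (0, 1) for z in (0, 1)}

score_redundant = conditional_mutual_information(redundant)
score_novel = conditional_mutual_information(novel)
```

In the first toy case the score is zero, because the clustering tells us nothing about Y beyond what Z already does; in the second it equals log 2, the full entropy of a balanced two-way clustering. This is the sense in which conditioning on known structure rewards non-redundant groupings.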