The goal of semisupervised clustering/mixture modeling is to learn the underlying groups in a given data set when some form of instance-level supervision is also available, usually as class labels or pairwise sample constraints. Most prior work with constraints assumes the number of classes is known, with each learned cluster taken to be a class and hence subject to the given class constraints. When the number of classes is unknown, or when the one-cluster-per-class assumption does not hold, imposing constraints may actually be deleterious to learning the ground-truth data groups. We address this by (1) allowing multiple mixture components to be allocated to a single class and (2) estimating both the number of components and the number of classes. We also address new class discovery, treating components devoid of constraints as putative unknown classes. On both real-world and synthetic data, our method is shown to accurately estimate the number of classes and to compare favorably with the recent approach of Shental, Bar-Hillel, Hertz, and Weinshall (2003).
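The idea of fitting more mixture components than classes, selecting the model order, and flagging unsupervised components as putative new classes can be illustrated with a minimal sketch. The sketch below is not the paper's algorithm: it uses a basic 1-D Gaussian mixture fit by EM, selects the number of components with BIC, and then assigns each component to a class by the labeled points it best explains, leaving unclaimed components as candidate new classes. All data, labels, and thresholds here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D data: class A is bimodal (two components), class B unimodal,
# so the correct number of components (3) exceeds the number of classes (2).
xa = np.concatenate([rng.normal(-4, 0.5, 150), rng.normal(0, 0.5, 150)])
xb = rng.normal(4, 0.5, 150)
x = np.concatenate([xa, xb])

def em_gmm(x, k, iters=200):
    """Basic EM for a 1-D Gaussian mixture; returns parameters and log-likelihood."""
    n = x.size
    mu = np.quantile(x, np.linspace(0.1, 0.9, k))   # spread initial means over the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities of each component for each point
        dens = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    ll = np.log(dens.sum(axis=1)).sum()
    return (w, mu, var), ll

def bic(ll, k, n):
    p = 3 * k - 1                      # (k-1) weights + k means + k variances
    return p * np.log(n) - 2 * ll

# Model-order selection: smallest BIC over candidate component counts.
scores = {k: bic(em_gmm(x, k)[1], k, x.size) for k in range(1, 5)}
best_k = min(scores, key=scores.get)
print("selected number of components:", best_k)

# A few labeled samples (hypothetical supervision): class A near -4, class B near 4.
(w, mu, var), _ = em_gmm(x, best_k)
labeled_x = np.array([-4.1, -3.9, 4.0, 4.1])
labeled_y = np.array([0, 0, 1, 1])
dens = w * np.exp(-(labeled_x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
owner = {}
for c, y in zip(dens.argmax(axis=1), labeled_y):
    owner[c] = y                       # component -> class of its labeled points

# Components claimed by no labeled point are treated as putative new classes.
for j in range(best_k):
    print(f"component at mu={mu[j]:+.2f} -> {owner.get(j, 'putative new class')}")
```

On this toy data BIC typically selects three components; the component near 0 attracts no labeled points and is therefore flagged as a putative new class, mirroring the new-class-discovery behavior described above, though with BIC standing in for whatever model-selection criterion an actual method might use.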