We introduce new inductive, generative semisupervised mixture models with more finely grained class-label generation mechanisms than in previous work. Our models combine the advantages of semisupervised mixtures, which achieve label extrapolation over an entire component, and nearest-neighbor (NN)/nearest-prototype (NP) classification, which achieves accurate classification in the vicinity of labeled samples or prototypes. For our NN-based method, we propose a novel two-stage stochastic data generation: all samples are first generated by a standard finite mixture, and then all class labels are generated, conditioned on the samples and their components of origin. This mechanism entails an underlying Markov random field, specific to each mixture component or cluster. We invoke the pseudo-likelihood formulation, which forms the basis of an approximate generalized expectation-maximization (GEM) model learning algorithm. Our NP-based model overcomes a problem with the NN-based model that manifests at very low labeled fractions. Both models are advantageous when the within-component class proportions are not constant over the feature-space region "owned by" a component. The practicality of this scenario is borne out by experiments on UC Irvine data sets, which demonstrate significant gains in classification accuracy over previous semisupervised mixtures, as well as overall gains over KNN classification. Moreover, at very small labeled fractions, our methods overall outperform supervised linear and nonlinear-kernel support vector machines.
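The two-stage generation described above can be sketched roughly as follows. This is a minimal illustrative toy, not the paper's actual model: it assumes a 1-D two-component Gaussian mixture with made-up parameters, and a simple nearest-labeled-seed rule within each component stands in for the paper's component-specific random-field label mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Stage 1: draw all samples from a standard finite Gaussian mixture ---
# (illustrative parameters; the paper does not specify these)
weights = np.array([0.5, 0.5])      # mixing proportions
means = np.array([[-2.0], [2.0]])   # component means
n = 200

components = rng.choice(len(weights), size=n, p=weights)  # components of origin
samples = means[components] + rng.normal(size=(n, 1))     # x_i ~ N(mu_j, 1)

# --- Stage 2: generate all class labels, conditioned on the samples and ---
# --- their components of origin                                          ---
# Within each component, labels of nearby samples are coupled (an MRF-like
# dependence), so class proportions can vary across the region owned by
# the component rather than being a single fixed per-component proportion.
labels = np.full(n, -1)
for j in range(len(weights)):
    idx = np.flatnonzero(components == j)
    # seed a few "anchor" labels inside the component
    seeds = rng.choice(idx, size=2, replace=False)
    seed_labels = rng.integers(0, 2, size=len(seeds))
    labels[seeds] = seed_labels
    # every other sample in the component copies its nearest seed's label
    for i in idx:
        if labels[i] == -1:
            d = np.abs(samples[i] - samples[seeds]).ravel()
            labels[i] = seed_labels[np.argmin(d)]
```

Under this kind of mechanism a single component can own sub-regions with different dominant classes, which is precisely the scenario where a standard one-label-distribution-per-component semisupervised mixture is at a disadvantage.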