Several authors have shown that, when labeled data are scarce, improved classifiers can be built by augmenting the training set with a large set of unlabeled examples and then performing suitable learning. These works assume each unlabeled sample originates from one of the (known) classes. Here, we assume each unlabeled sample comes from either a known class or a heretofore undiscovered class. We propose a novel mixture model which treats as observed data not only the feature vector and the class label, but also the fact of label presence/absence for each sample. Two types of mixture components are posited. "Predefined" components generate data from known classes and assume class labels are missing at random. "Nonpredefined" components only generate unlabeled data, i.e., they capture exclusively unlabeled subsets, consistent with an outlier distribution or new classes. The predefined/nonpredefined natures are data-driven, learned along with the other parameters via an extension of the EM algorithm. Our modeling framework addresses problems involving both the known and unknown classes: 1) robust classifier design, 2) classification with rejections, and 3) identification of the unlabeled samples (and their components) from unknown classes. Problem 3 is a step toward new class discovery. Experiments are reported for each application, including topic discovery for the Reuters domain. Experiments also demonstrate the value of label presence/absence data in learning accurate mixtures.
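The idea of treating label presence/absence as observed data can be illustrated with a small sketch. This is not the model from the paper: it is a simplified soft variant, assuming spherical Gaussian components and a per-component probability `q[k]` that a sample carries a label, so that components with `q[k]` near zero play the role of "nonpredefined" components. The function name `fit_label_aware_mixture`, its parameters, and the smoothing constants are all hypothetical.

```python
import numpy as np

def fit_label_aware_mixture(X, y, n_classes, n_components, n_iter=50, seed=0):
    """EM sketch (assumed simplification, not the paper's algorithm).

    X: (n, d) features. y: (n,) class indices, with -1 meaning "unlabeled".
    Each component k has weight pi[k], mean mu[k], spherical variance var[k],
    label-presence probability q[k], and class distribution beta[k].
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    labeled = y >= 0
    onehot = np.zeros((n, n_classes))
    onehot[np.flatnonzero(labeled), y[labeled]] = 1.0

    mu = X[rng.choice(n, n_components, replace=False)].astype(float)
    var = np.full(n_components, X.var() + 1e-6)
    pi = np.full(n_components, 1.0 / n_components)
    q = np.full(n_components, 0.5)
    beta = np.full((n_components, n_classes), 1.0 / n_classes)

    for _ in range(n_iter):
        # E-step: joint log-probability of features and label evidence.
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        log_p = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * sq / var
        # Labeled sample: label present (q) and observed class (beta).
        log_lab = np.log(q + 1e-12) + np.log(onehot @ beta.T + 1e-12)
        # Unlabeled sample: label absent (1 - q).
        log_unl = np.log(1.0 - q + 1e-12)
        log_p = log_p + np.where(labeled[:, None], log_lab, log_unl)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: closed-form updates from the responsibilities r.
        nk = r.sum(axis=0) + 1e-12
        pi = nk / nk.sum()
        mu = (r.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        var = (r * sq).sum(axis=0) / (d * nk) + 1e-6
        nk_lab = r[labeled].sum(axis=0)
        q = np.clip(nk_lab / nk, 0.0, 1.0)
        beta = (r[labeled].T @ onehot[labeled] + 1e-3) / (
            nk_lab[:, None] + 1e-3 * n_classes
        )

    return {"pi": pi, "mu": mu, "var": var, "q": q, "beta": beta, "resp": r}
```

In this sketch, a component whose fitted `q[k]` is close to zero has absorbed almost exclusively unlabeled samples, which is the kind of evidence the abstract associates with outliers or unknown classes. The paper instead learns a predefined/nonpredefined nature for each component, which this soft parameterization only approximates.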