The expectation maximization (EM) algorithm is a popular approach to learning Gaussian mixture models from unlabeled data. In many applications, additional sources of information, such as a priori knowledge of the mixing proportions, are available alongside the unlabeled data. We present a weakly supervised approach, in the form of a penalized expectation maximization algorithm, that uses this a priori knowledge to guide model training. The algorithm penalizes models whose predicted mixing proportions diverge strongly from the a priori mixing proportions. We also present an extension that incorporates both labeled and unlabeled data in a semi-supervised setting. Systematic evaluations on several publicly available datasets show that the proposed algorithms outperform the standard expectation maximization algorithm. The performance gains are particularly significant when the amount of unlabeled data is limited and in the presence of noise.
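The abstract does not specify the exact form of the penalty, but one natural reading is a KL-divergence term between the prior mixing proportions and the model's estimated proportions, which admits a closed-form M-step update for the weights. The Python sketch below illustrates that reading; the penalty direction KL(pi0 || pi), the strength parameter lam, and the function name penalized_em are illustrative assumptions, not the paper's stated formulation.

    import numpy as np
    from scipy.stats import multivariate_normal

    def penalized_em(X, pi0, lam=10.0, n_iter=100, seed=0):
        """Fit a Gaussian mixture to X while shrinking the mixing
        proportions toward the prior pi0 (hypothetical interface)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        pi0 = np.asarray(pi0, dtype=float)
        k = len(pi0)
        pi = pi0.copy()  # start the weights at the prior proportions
        mu = X[rng.choice(n, size=k, replace=False)].astype(float)
        cov = np.stack([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
        for _ in range(n_iter):
            # E-step: responsibilities r[i, j] are proportional to
            # pi[j] * N(x_i | mu[j], cov[j]), normalized over components.
            r = np.column_stack([pi[j] * multivariate_normal.pdf(X, mu[j], cov[j])
                                 for j in range(k)])
            r /= r.sum(axis=1, keepdims=True)
            nk = r.sum(axis=0)
            # Penalized M-step for the weights: a KL(pi0 || pi) penalty of
            # strength lam yields pi[j] = (nk[j] + lam * pi0[j]) / (n + lam),
            # which reduces to the standard EM update nk[j] / n when lam = 0.
            pi = (nk + lam * pi0) / (n + lam)
            # Standard M-step for the means and covariances.
            for j in range(k):
                mu[j] = r[:, j] @ X / nk[j]
                diff = X - mu[j]
                cov[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
        return pi, mu, cov

Setting lam = 0 recovers standard EM, while a very large lam pins the estimated proportions to pi0; intermediate values trade data fit against divergence from the prior, which is the behavior the abstract describes.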