Learning from label proportions by optimizing cluster model selection

Authors:
Marco Stolpe; Katharina Morik
Affiliations:
Technical University of Dortmund, Artificial Intelligence Group, Dortmund, Germany (both authors)
Venue:
ECML PKDD'11: Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part III
Year:
2011

References cited (14):
Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, Special issue: Symbolic problem solving in noisy and novel task environments.
The nature of statistical learning theory.
Fast algorithms for projected clustering. SIGMOD '99: Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data.
Machine Learning.
Random Forests. Machine Learning.
Induction of Decision Trees. Machine Learning.
Kernel k-means: spectral clustering and normalized cuts. Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Statistical Comparisons of Classifiers over Multiple Data Sets. The Journal of Machine Learning Research.
Supervised Learning by Training on Aggregate Outputs. ICDM '07: Proceedings of the 2007 Seventh IEEE International Conference on Data Mining.
Kernel K-means Based Framework for Aggregate Outputs Classification. ICDMW '09: Proceedings of the 2009 IEEE International Conference on Data Mining Workshops.
Estimating Labels from Label Proportions. The Journal of Machine Learning Research.
Semi-Supervised Learning.
Data Mining: Practical Machine Learning Tools and Techniques.
Estimating continuous distributions in Bayesian classifiers. UAI'95: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence.

Cited by (2):
Estimation based on RBM from label proportions in large group case. IScIDE'12: Proceedings of the Third Sino-Foreign-Interchange Conference on Intelligent Science and Intelligent Data Engineering.
Learning Bayesian network classifiers from label proportions. Pattern Recognition.

Abstract

In a supervised learning scenario, we learn a mapping from input to output values based on labeled examples. Can we also learn such a mapping from groups of unlabeled observations, knowing only, for each group, the proportion of observations with a particular label? Solutions to this problem have real-world applications. Here, we consider groups of steel sticks as samples in quality control. Since the steel sticks cannot be marked individually, for each group of sticks it is only known how many sticks of high (low) quality it contains. We want to predict the achieved quality of each stick before it reaches the final production station and quality control, in order to save resources. We define the problem of learning from label proportions and present a solution based on clustering. Empirically, our method achieves better prediction performance than recent approaches based on probabilistic SVMs, kernel k-means, or conditional exponential models.
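To make the problem setting concrete, the following is a minimal Python sketch of one naive clustering-based approach to learning from label proportions. It is not the authors' method (the abstract does not describe how their cluster model selection works); it only illustrates the setup under stated assumptions: binary labels (1 = high quality), plain k-means with a deterministic farthest-first seeding, and an exhaustive search over all 2^k assignments of labels to clusters that minimizes the squared difference between predicted and known group proportions. The function names `kmeans`, `fit_llp` and `predict` and the toy data are hypothetical.

```python
from itertools import product
from math import dist


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with farthest-first seeding (an illustrative choice)."""
    # Seed with the first point, then repeatedly add the point farthest from all seeds.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_assign == assign:
            break  # assignments unchanged: converged
        assign = new_assign
        # Update step: move each non-empty cluster's centroid to its members' mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assign


def fit_llp(groups, proportions, k):
    """Cluster all observations, then label clusters to match the group proportions.

    groups: list of groups, each a list of feature tuples (individual labels unknown).
    proportions: known fraction of label-1 observations in each group.
    """
    points = [x for g in groups for x in g]
    centroids, assign = kmeans(points, k)
    # Split the flat cluster assignment back into one slice per group.
    group_clusters, start = [], 0
    for g in groups:
        group_clusters.append(assign[start:start + len(g)])
        start += len(g)
    best_labels, best_err = None, float("inf")
    # Exhaustive search over all 2**k labelings; feasible only for small k.
    for labels in product((0, 1), repeat=k):
        err = sum(
            (sum(labels[c] for c in gc) / len(gc) - p) ** 2
            for gc, p in zip(group_clusters, proportions)
        )
        if err < best_err:
            best_labels, best_err = labels, err
    return centroids, best_labels


def predict(centroids, labels, x):
    """Give a new observation the label of its nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda c: dist(x, centroids[c]))
    return labels[nearest]


if __name__ == "__main__":
    # Toy 1-D data: three groups with known shares of high-quality (label 1) items.
    groups = [
        [(0.0,), (0.2,), (5.0,), (5.1,)],
        [(0.1,), (0.3,), (0.25,), (4.9,)],
        [(5.2,), (4.8,), (5.3,), (0.15,)],
    ]
    proportions = [0.5, 0.25, 0.75]
    centroids, labels = fit_llp(groups, proportions, k=2)
    print(labels)                               # (0, 1)
    print(predict(centroids, labels, (4.95,)))  # 1
```

The point of the sketch is that no individual label is ever observed: the only supervision is the per-group proportion, and the clustering supplies the structure that lets those proportions be turned into per-observation predictions. The brute-force labeling step grows exponentially with k, so a practical method would need a smarter search.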