Supervised Learning by Training on Aggregate Outputs

Authors:
David R. Musicant;Janara M. Christensen;Jamie F. Olson
Venue:
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Year:
2007

Citing 0
Cited 8

Estimating labels from label proportions. Proceedings of the 25th International Conference on Machine Learning
Privacy-Preserving Data Publishing. Foundations and Trends in Databases
Instance- and bag-level manifold regularization for aggregate outputs classification. Proceedings of the 18th ACM Conference on Information and Knowledge Management
Multiple instance learning via margin maximization. Applied Numerical Mathematics
Estimating Labels from Label Proportions. The Journal of Machine Learning Research
Learning from label proportions by optimizing cluster model selection. ECML PKDD'11 Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part III
Learning naive Bayes models for multiple-instance learning with label proportions. CAEPIA'11 Proceedings of the 14th International Conference on Advances in Artificial Intelligence: Spanish Association for Artificial Intelligence
Learning Bayesian network classifiers from label proportions. Pattern Recognition

Abstract

Supervised learning is a classic data mining problem where one wishes to be able to predict an output value associated with a particular input vector. We present a new twist on this classic problem where, instead of the training set containing an individual output value for each input vector, the output values in the training set are given only in aggregate over a number of input vectors. This new problem arose from a particular need in learning on mass spectrometry data, but could easily apply to situations in which data has been aggregated in order to maintain privacy. We provide a formal description of this new problem for both classification and regression. We then examine how k-nearest neighbor, neural networks, and support vector machines can be adapted for this problem.
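
To make the problem setting concrete, here is a minimal sketch of regression on aggregate outputs. It is not the authors' method: the abstract names k-nearest neighbor, neural networks, and support vector machines, while this sketch substitutes ordinary linear least squares because it is the simplest model to show. It also assumes that the aggregate given for each group ("bag") of input vectors is the mean of the individual outputs. The function name `fit_linear_on_aggregates`, the variable names, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_linear_on_aggregates(bags, bag_means):
    """Fit y ~ w.x + b when only the mean output of each bag is known.

    bags: list of arrays, each of shape (n_i, d), the input vectors per bag.
    bag_means: sequence of length len(bags), the mean output of each bag
               (assumption: the aggregate is a mean, not a sum or a count).
    """
    # For a linear model, the mean prediction over a bag equals the
    # prediction at the bag's mean input vector, so the aggregate problem
    # reduces to ordinary least squares on the bag centroids.
    centroids = np.array([np.mean(bag, axis=0) for bag in bags])
    design = np.hstack([centroids, np.ones((len(bags), 1))])
    targets = np.asarray(bag_means, dtype=float)
    coef, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return coef[:-1], coef[-1]


def predict(w, b, X):
    """Predict an individual output for each row of X."""
    return np.asarray(X) @ w + b


# Synthetic, noise-free data: individual outputs exist but are hidden
# from the learner, which sees only one mean per bag.
rng = np.random.default_rng(0)
true_w, true_b = np.array([2.0, -1.0]), 0.5
bags = [rng.normal(size=(5, 2)) for _ in range(20)]
bag_means = np.array([np.mean(bag @ true_w + true_b) for bag in bags])

w, b = fit_linear_on_aggregates(bags, bag_means)
individual_predictions = predict(w, b, bags[0])
```

The fitted model is trained only on per-bag means, yet `predict` returns one output per input vector, which is the goal the abstract describes. The reduction to bag centroids holds only for models that are linear in the inputs; nonlinear learners need a loss defined on the aggregate of their per-instance predictions.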