Improving predictions using aggregate information

Authors:
Amit Dhurandhar
Affiliations:
IBM TJ Watson, Yorktown Heights, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

In domains such as consumer products and manufacturing, many problems call for predicting a continuous target. Besides the usual explanatory attributes, we may also have exact (or approximate) estimates of aggregated targets, that is, sums over disjoint sets of the individual targets we are trying to predict. The question then becomes: can these aggregated targets, which are a coarser piece of information, be used to improve the predictions of the individual targets? In this paper, we provide a simple yet provable way of accomplishing this. In particular, given predictions on the test data from any regression model of the target, we describe a method that provably improves these predictions in terms of mean squared error, given exact (or accurate enough) information about the aggregated targets. These estimates of the aggregated targets may be readily available or obtained through multilevel regression at different levels of granularity. Based on the proof of our method, we suggest a criterion for choosing the appropriate level. Moreover, if in addition to estimates of the aggregated targets we have exact (or approximate) estimates of the mean and variance of the target distribution, then, based on our general strategy, we provide an optimal way of incorporating this information to further improve the predictions of the individual targets. We validate the results and our claims through experiments on synthetic data and on real industrial data obtained from diverse domains.
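
The abstract does not spell out the correction itself, so the following is only a minimal sketch of one natural way to use exact aggregates, not necessarily the paper's method. It assumes each test point belongs to exactly one group and that the true sum of targets per group is known, then shifts every prediction in a group by the same amount so the group's predictions add up to that sum. This is the Euclidean projection onto the sum constraint, so with exact aggregates it cannot increase the squared error. The function name `adjust_to_aggregates`, the helper `mse`, and the toy numbers are all hypothetical.

```python
from collections import defaultdict


def adjust_to_aggregates(preds, groups, aggregates):
    """Shift predictions equally within each group to match its known total.

    preds      : list of float, predictions from any regression model
    groups     : list of hashable group labels, one per prediction
    aggregates : dict mapping group label -> known sum of true targets
    (All names are illustrative assumptions, not taken from the paper.)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for p, g in zip(preds, groups):
        sums[g] += p
        counts[g] += 1
    # Per-group shift: spread the aggregate residual evenly over the members.
    shift = {
        g: (aggregates[g] - sums[g]) / counts[g]
        for g in counts
        if g in aggregates
    }
    # Groups without a known aggregate are left unchanged.
    return [p + shift.get(g, 0.0) for p, g in zip(preds, groups)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy data (made up for illustration): two disjoint groups.
y_true = [10.0, 12.0, 8.0, 20.0, 22.0]
y_pred = [11.0, 14.0, 9.0, 18.0, 21.0]
groups = [0, 0, 0, 1, 1]
aggregates = {0: 30.0, 1: 42.0}  # exact sums of y_true per group

y_adj = adjust_to_aggregates(y_pred, groups, aggregates)
mse_before = mse(y_pred, y_true)
mse_after = mse(y_adj, y_true)
print(mse_before, mse_after)
```

When the aggregates are only approximate, this guarantee no longer holds automatically, which is why the abstract qualifies the claim with "accurate enough" information.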