Pooled ANOVA

  • Authors:
  • Michael Last; Gheorghe Luta; Alex Orso; Adam Porter; Stan Young

  • Affiliations:
  • National Institute of Statistical Sciences, PO Box 14006, Research Triangle Park, NC, 27709, United States; National Institute of Statistical Sciences, PO Box 14006, Research Triangle Park, NC, 27709, United States; Georgia Institute of Technology, Atlanta, GA, United States; University of Maryland, College Park, MD, United States; National Institute of Statistical Sciences, PO Box 14006, Research Triangle Park, NC, 27709, United States

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2008

Abstract

We introduce Pooled ANOVA, a greedy algorithm that sequentially selects the rare important factors from a large set of factors. Problems such as computer simulations and software performance tuning involve a large number of factors, few of which have an important effect on the outcome or performance measure. We pool multiple factors together and test the pool for significance. If the pool has a significant effect, we retain its factors for deconfounding. If not, we either declare that none of its factors are important or retain them for follow-up decoding, depending on our assumptions and the stage of testing. The sparser the important factors are, the bigger the savings. Pooled ANOVA requires fewer assumptions than similar methods such as sequential bifurcation; for example, it does not require all important effects to have the same sign. We demonstrate savings of 25% to 35% compared with a conventional ANOVA, and show that Pooled ANOVA works in a setting where sequential bifurcation fails.
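The abstract describes pooling only at a high level, so the following Python sketch illustrates generic group screening by recursive halving under stated assumptions; it is not the authors' Pooled ANOVA procedure. All names (`pooled_screen`, `variance_test`, `signed_sum_test`) are hypothetical. The pool test is a noise-free stand-in: `variance_test` flags a pool when the sum of squared main effects of its members is nonzero, mimicking how an ANOVA sum of squares ignores effect signs, while `signed_sum_test` mimics a sequential-bifurcation-style test that switches all pooled factors together, so opposite-signed effects can cancel. The sketch implements only the "declare unimportant" branch, not follow-up decoding.

```python
"""Toy sketch of group screening by pooling and splitting (hypothetical).

Assumption, not from the paper: each factor has a known linear main effect,
and a pool's significance is decided by a noise-free rule on those effects.
"""
from typing import Callable, List, Sequence, Tuple

PoolTest = Callable[[Sequence[int]], bool]


def pooled_screen(factors: Sequence[int], is_significant: PoolTest) -> Tuple[List[int], int]:
    """Return (important factors, number of pool tests) via recursive halving."""
    important: List[int] = []
    tests = 0
    stack = [list(factors)]
    while stack:
        pool = stack.pop()
        tests += 1
        if not is_significant(pool):
            continue  # declare every factor in this pool unimportant
        if len(pool) == 1:
            important.append(pool[0])  # a significant single factor is kept
        else:
            mid = len(pool) // 2
            stack.append(pool[mid:])  # split the significant pool in two
            stack.append(pool[:mid])
    return sorted(important), tests


def variance_test(effects: Sequence[float], threshold: float = 1e-9) -> PoolTest:
    # Sign-blind: a pool is significant if its squared effects sum above threshold.
    return lambda pool: sum(effects[i] ** 2 for i in pool) > threshold


def signed_sum_test(effects: Sequence[float], threshold: float = 1e-9) -> PoolTest:
    # Bifurcation-style: all pooled factors move together, so signed effects add.
    return lambda pool: abs(sum(effects[i] for i in pool)) > threshold


if __name__ == "__main__":
    effects = [0.0] * 64
    effects[5] = 2.0    # important factor with a positive effect
    effects[40] = -2.0  # important factor with an equal negative effect
    print(pooled_screen(range(64), variance_test(effects)))    # ([5, 40], 23)
    print(pooled_screen(range(64), signed_sum_test(effects)))  # ([], 1)
```

With 64 factors and two important ones of opposite sign, the sign-blind test recovers both in 23 pool tests versus 64 for a one-test-per-factor baseline (a toy cost model, not the paper's run counts), while the signed-sum test sees the two effects cancel in the first pool and finds nothing.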