Correcting MM estimates for "fat" data sets

  • Authors:
  • Ricardo A. Maronna; Victor J. Yohai

  • Affiliations:
  • Department of Mathematics, School of Exact Sciences, Universidad Nacional de La Plata and C.I.C.B.A., Argentina; Department of Mathematics, School of Exact and Natural Sciences, Universidad de Buenos Aires and CONICET, Argentina

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2010

Abstract

Regression MM estimates require estimating the error scale and determining a constant that controls the efficiency. Both steps rely on asymptotic results derived under the assumption that the number of predictors p remains fixed while the number of observations n tends to infinity, which amounts to assuming that the ratio p/n is "small". However, many high-dimensional data sets have a "large" value of p/n (say, ≥0.2). It is shown that the standard asymptotic results do not hold if p/n is large; namely, (a) the estimated scale underestimates the true error scale, and (b) even if the scale is correctly estimated, the actual efficiency can be much lower than the nominal one. To overcome these drawbacks, simple corrections for the scale and for the efficiency-controlling constant are proposed, and it is demonstrated that these corrections improve the estimate's performance under both normal and contaminated data.
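
Point (a) can be illustrated with a small simulation. The sketch below is not the paper's MM or S-scale procedure, nor its proposed correction: as a simplifying assumption it fits ordinary least squares and measures the residual scale with the normalized median absolute deviation (MAD). The function name `residual_scale_ratio` and all parameter values are hypothetical choices made for illustration. For least squares the average residual variance is σ²(1 − p/n), so the residual-based scale should shrink as p/n grows; the abstract reports the same kind of downward bias for the MM scale.

```python
import numpy as np


def residual_scale_ratio(n, p, sigma=1.0, n_rep=200, seed=0):
    """Average ratio (estimated scale / true scale) over simulated data sets.

    Illustrative assumptions: Gaussian predictors and errors, true
    coefficients equal to zero, ordinary least-squares fit, and the
    normalized MAD of the residuals as the scale estimate.
    """
    rng = np.random.default_rng(seed)
    ratios = np.empty(n_rep)
    for i in range(n_rep):
        X = rng.standard_normal((n, p))
        y = sigma * rng.standard_normal(n)  # pure noise response
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        # Normalized MAD about zero; 0.6745 makes it consistent for
        # the standard deviation under normal errors.
        scale_hat = np.median(np.abs(resid)) / 0.6745
        ratios[i] = scale_hat / sigma
    return float(ratios.mean())


if __name__ == "__main__":
    n = 200
    for p in (4, 20, 40, 60):
        ratio = residual_scale_ratio(n, p)
        print(f"p/n = {p / n:.2f}  mean(scale_hat / sigma) = {ratio:.3f}")
```

Running the script prints the mean ratio for several values of p/n; under these assumptions the ratio should stay close to 1 when p/n is small and fall visibly below 1 once p/n reaches 0.2 or more.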