Identifying outliers with sequential fences

  • Authors:
  • Neil C. Schwertman;Rapti de Silva

  • Affiliations:
  • Department of Mathematics and Statistics, California State University Chico, Chico, CA 95929-0525, USA (both authors)

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2007

Abstract

The identification of contaminated observations, or outliers, is an important part of data analysis, since such observations can have a profound influence and distort the analysis. One simple graphical method, based on the box-plot, consists of constructing fences. The fences method is appealing not only for its simplicity but, more importantly, because it does not use the extreme potential outliers, which can inflate a computed measure of dispersion and hence lessen the sensitivity for identifying outliers. The commonly used fences procedure may be too liberal in some situations and too conservative in others. That is, it sets one criterion for all scenarios and does not give the data analyst the flexibility to specify the probability or criterion for designating an observation as an outlier in a variety of circumstances. Furthermore, the most commonly used procedure is "one size fits all" and does not incorporate sample size. Unfortunately, a value that is extreme in a small data set might be expected in a much larger sample. In this paper, a method is proposed to include sample size in constructing fences, as well as a sequential procedure to identify multiple outliers at a specified probability.
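
For orientation, the conventional box-plot fences that the abstract refers to can be sketched as below. This is only an illustration of the usual fixed-multiplier rule, not the sample-size-adjusted or sequential procedure the paper proposes, whose details the abstract does not give. The multiplier k = 1.5, the "inclusive" quartile definition, the function names, and the sample data are all assumptions made for this example.

```python
import statistics


def box_plot_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the conventional multiplier (an assumption here); the
    quartile definition is also a choice and varies between packages.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range, unaffected by the most extreme values
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(data, k=1.5):
    """Return the observations that fall outside the fences."""
    lower, upper = box_plot_fences(data, k)
    return [x for x in data if x < lower or x > upper]


# Hypothetical sample: eight similar values and one extreme value.
sample = [2.1, 2.4, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 9.6]
print(box_plot_fences(sample))
print(flag_outliers(sample))  # only 9.6 lies outside the fences
```

Because the fences are built from the quartiles, the extreme value 9.6 does not widen them, which is the robustness property the abstract highlights. Note that k is the same whatever the sample size, which is the "one size fits all" limitation the paper sets out to address.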