Outlier Detection and Data Cleaning in Multivariate Non-Normal Samples: The PAELLA Algorithm

  • Authors:
  • Manuel Castejón Limas;Joaquín B. Ordieres Meré;Francisco J. Martínez De Pisón Ascacibar;Eliseo P. Vergara González

  • Affiliations:
  • Dept. Ingeniería Eléctrica, Universidad de León, León, Spain;Dept. Ingeniería Mecánica, Universidad de La Rioja, Logroño, Spain. joaquin.ordieres@dim.unirioja.es;Dept. Ingeniería Mecánica, Universidad de La Rioja, Logroño, Spain;Dept. Ingeniería Mecánica, Universidad de La Rioja, Logroño, Spain

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2004

Abstract

A new method of outlier detection and data cleaning for both normal and non-normal multivariate data sets is proposed. It is based on an iterated local fit without a priori metric assumptions. The approach is supported by finite mixture clustering, which provides good results with large data sets. A multi-step structure consisting of three phases is developed. The importance of outlier detection in industrial modeling for open-loop control prediction is also described. The algorithm gives good results both in simulation runs with artificial data sets and with experimental data sets recorded in a rubber factory. Finally, the methodology is discussed.
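
The abstract does not spell out the three phases, so the following is not the PAELLA algorithm. It is only a minimal sketch of the general idea of a local fit per cluster: assuming cluster labels have already been produced by some mixture clustering step, each cluster gets its own Gaussian fit, and a point is flagged when it is far (in squared Mahalanobis distance) from every cluster. The function names, the restriction to two dimensions, and the threshold 9.21 (the 0.99 quantile of a chi-square distribution with 2 degrees of freedom) are assumptions made for this illustration.

```python
def fit_gaussian(points):
    """Sample mean and covariance (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_outliers(points, labels, threshold_sq=9.21):
    """Flag points that lie far from every locally fitted cluster.

    `labels` are assumed to come from a prior clustering step
    (hypothetical here; any mixture clustering could supply them).
    """
    clusters = {}
    for p, k in zip(points, labels):
        clusters.setdefault(k, []).append(p)
    fits = [fit_gaussian(pts) for pts in clusters.values()]
    return [
        min(mahalanobis_sq(p, mean, cov) for mean, cov in fits) > threshold_sq
        for p in points
    ]


# Two synthetic clusters on 5x5 grids, plus one stray point given label 0.
grid = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]
points = grid + [(x + 20, y + 20) for x, y in grid] + [(6, -6)]
labels = [0] * 25 + [1] * 25 + [0]
flags = flag_outliers(points, labels)
```

Because the stray point is included in the fit of its own cluster, it inflates that cluster's covariance and partly masks itself; this is one reason a single pass is fragile, and it is consistent with the abstract's emphasis on an iterated fit.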