Computing LTS Regression for Large Data Sets

Authors:
Peter J. Rousseeuw; Katrien Van Driessen
Affiliations:
Department of Mathematics and Computer Science, Universiteit Antwerpen, Antwerpen, Belgium B-2020; Faculty of Applied Economics, Universiteit Antwerpen, Antwerpen, Belgium B-2000
Venue:
Data Mining and Knowledge Discovery
Year:
2006

References:

Algorithms and complexity for least median of squares regression. Discrete Applied Mathematics.
Robust regression and outlier detection.
Robust regression methods for computer vision: a review. International Journal of Computer Vision.
Computing the exact least median of squares estimate and stability diagnostics in multiple linear regression. SIAM Journal on Scientific Computing.
The feasible solution algorithm for least trimmed squares regression. Computational Statistics & Data Analysis.
Robust regression applied to optical-fiber dimensional quality control. Technometrics.
Improved feasible solution algorithms for high breakdown estimation. Computational Statistics & Data Analysis.
A fast algorithm for the minimum covariance determinant estimator. Technometrics.
BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery.
Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery.
Efficient and Effective Clustering Methods for Spatial Data Mining. VLDB '94: Proceedings of the 20th International Conference on Very Large Data Bases.

Abstract

Data mining aims to extract previously unknown patterns or substructures from large databases. In statistics, this is what methods of robust estimation and outlier detection were constructed for; see, e.g., Rousseeuw and Leroy (1987). Here we focus on least trimmed squares (LTS) regression, which is based on the subset of h cases (out of n) whose least squares fit has the smallest sum of squared residuals. The coverage h may be set between n/2 and n. The computation time of existing LTS algorithms grows too quickly with the size of the data set, which precludes their use for data mining. In this paper we develop a new algorithm called FAST-LTS. Its basic ideas are an inequality involving order statistics and sums of squared residuals, and techniques that we call 'selective iteration' and 'nested extensions'. We also use an intercept adjustment technique to improve precision. For small data sets FAST-LTS typically finds the exact LTS, whereas for larger data sets it gives more accurate results than existing LTS algorithms and is faster by orders of magnitude. This allows us to apply FAST-LTS to large databases.
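For reference, the LTS criterion and one standard statement of the concentration-step (C-step) inequality can be written as below. The notation (r_i(θ), H_1, H_2, θ_1, θ_2, Q_1, Q_2) is our own, and reading this as the abstract's "inequality involving order statistics and sums of squared residuals" is our interpretation rather than a quotation from the paper.

```latex
% Residuals r_i(\theta) = y_i - x_i^\top \theta; the ordered squared residuals are
% (r^2)_{1:n}(\theta) \le \dots \le (r^2)_{n:n}(\theta).
\hat{\theta}_{\mathrm{LTS}} = \arg\min_{\theta} \sum_{i=1}^{h} (r^2)_{i:n}(\theta),
\qquad \frac{n}{2} \le h \le n .

% C-step: |H_1| = h, \theta_1 is the LS fit on H_1, Q_1 = \sum_{i \in H_1} r_i^2(\theta_1).
% H_2 holds the h cases with the smallest |r_i(\theta_1)|; \theta_2 is the LS fit on H_2.
Q_2 = \sum_{i \in H_2} r_i^2(\theta_2)
    \le \sum_{i \in H_2} r_i^2(\theta_1)   % \theta_2 minimizes the LS criterion on H_2
    \le \sum_{i \in H_1} r_i^2(\theta_1)   % H_2 collects the h smallest squared residuals under \theta_1
    = Q_1 .
```

Because there are only finitely many h-subsets, the sequence Q_1 ≥ Q_2 ≥ ... produced by repeating this step must stop decreasing after finitely many steps.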
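A minimal sketch, not the authors' FAST-LTS implementation, of how C-steps and an intercept adjustment might fit together for simple regression y = a + b*x. It assumes random two-case starts, iterates C-steps until the h-subset stops changing, and then resets the intercept to the exact univariate LTS location of y_i - b*x_i. Selective iteration and nested extensions are omitted. All function names (`ls_fit`, `lts_objective`, `c_step`, `univariate_lts_location`, `lts_line`) are hypothetical, and pure Python is used so the sketch has no dependencies.

```python
import random

# Hypothetical sketch: function names and design choices are illustrative
# assumptions, not the FAST-LTS reference implementation.


def ls_fit(xs, ys):
    """Ordinary least squares for the line y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((xi - mx) ** 2 for xi in xs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0  # degenerate subset (all x equal): flat line
    return my - b * mx, b


def lts_objective(x, y, a, b, h):
    """LTS criterion: sum of the h smallest squared residuals over all n cases."""
    sq = sorted((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return sum(sq[:h])


def c_step(x, y, subset, h):
    """Concentration step: fit LS on `subset`, then return (sorted) indices of
    the h cases with the smallest absolute residuals under that fit."""
    a, b = ls_fit([x[i] for i in subset], [y[i] for i in subset])
    order = sorted(range(len(x)), key=lambda i: abs(y[i] - a - b * x[i]))
    return sorted(order[:h])


def univariate_lts_location(z, h):
    """Exact univariate LTS location. The optimal h-subset is a contiguous run
    of the sorted values, so scanning all n - h + 1 windows suffices
    (done naively here in O(n*h) time for clarity)."""
    zs = sorted(z)
    best_ss, best_mean = float("inf"), None
    for j in range(len(zs) - h + 1):
        window = zs[j:j + h]
        m = sum(window) / h
        ss = sum((v - m) ** 2 for v in window)
        if ss < best_ss:
            best_ss, best_mean = ss, m
    return best_mean


def lts_line(x, y, h, n_starts=50, max_steps=100, seed=0):
    """Approximate LTS fit of y = a + b*x from random starts refined by C-steps,
    with an intercept adjustment per start. Returns (a, b, objective)."""
    rng = random.Random(seed)
    n = len(x)
    best = (None, None, float("inf"))
    for _start in range(n_starts):
        subset = sorted(rng.sample(range(n), 2))  # elemental start: 2 cases fix a line
        for _step in range(max_steps):
            new_subset = c_step(x, y, subset, h)
            if new_subset == subset:  # fixed point: further C-steps change nothing
                break
            subset = new_subset
        a, b = ls_fit([x[i] for i in subset], [y[i] for i in subset])
        # Intercept adjustment: with the slope b held fixed, the intercept that
        # minimizes the LTS criterion is the univariate LTS location of y_i - b*x_i.
        a = univariate_lts_location([yi - b * xi for xi, yi in zip(x, y)], h)
        obj = lts_objective(x, y, a, b, h)
        if obj < best[2]:
            best = (a, b, obj)
    return best


if __name__ == "__main__":
    x = [float(i) for i in range(50)]
    y = [2.0 + 3.0 * xi for xi in x[:40]] + [-100.0] * 10  # last 10 cases are outliers
    a, b, obj = lts_line(x, y, h=30)
    print(f"intercept={a:.4f} slope={b:.4f} objective={obj:.2e}")
```

On the example data (40 cases on y = 2 + 3x plus 10 gross outliers, h = 30) the sketch should recover an intercept of 2 and a slope of 3, whereas ordinary least squares on all 50 cases would be pulled toward the outliers. Applying the intercept adjustment once per start is a simplification; the paper may apply it at a different point in the algorithm.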