A New Method for Mining Regression Classes in Large Data Sets

Authors:
Yee Leung;Jiang-Hong Ma;Wen-Xiu Zhang
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2001


Abstract

Extracting patterns and models of interest from large databases is attracting much attention in a variety of disciplines. Knowledge discovery in databases (KDD) and data mining (DM) are areas of common interest to researchers in machine learning, pattern recognition, statistics, artificial intelligence, and high-performance computing. This paper proposes an effective and robust method, termed the regression-class mixture decomposition (RCMD) method, for mining regression classes in large data sets, especially those contaminated by noise. A new concept, the "regression class," defined as a subset of the data set that is subject to a regression model, is proposed as the basic building block of the mining process. A large data set is treated as a mixture population containing many such regression classes, together with other data not accounted for by the regression models. Iterative and genetic-based algorithms are also constructed for optimizing the objective function of the RCMD method. It is demonstrated that the RCMD method can resist a very large proportion of noisy data, identify each regression class, assign an inlier set of data points supporting each identified regression class, and determine the a priori unknown number of statistically valid models in the data set. Although the models are extracted sequentially, the final result is almost independent of the extraction order because a novel dynamic classification strategy is employed to handle overlapping regression classes. The effectiveness and robustness of the RCMD method are substantiated by a set of simulation experiments and a real-life application, which show how it can be used to fit mixed data to linear regression classes and nonlinear structures in various situations.
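
The idea of pulling regression classes out of a noisy mixture one at a time can be illustrated with a small sketch. This is not the RCMD objective function or the paper's iterative or genetic optimizers, none of which are given in the abstract. It substitutes a plain random-sampling consensus search for straight lines, purely to show sequential extraction, inlier-set assignment, and stopping when no remaining model has enough support. All names (`line_through`, `least_squares`, `extract_lines`, `demo_data`) and all parameter values (`tol`, `min_support`, `trials`) are assumptions made for illustration.

```python
import random


def line_through(p, q):
    """Return (slope, intercept) of the line through p and q, or None if vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1


def least_squares(points):
    """Ordinary least-squares fit y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def extract_lines(points, tol=0.5, min_support=20, trials=300, seed=0):
    """Sequentially extract lines; stand-in consensus search, not the RCMD objective.

    Returns (classes, remaining): classes is a list of ((slope, intercept),
    inlier_indices); remaining holds indices not assigned to any line.
    """
    rng = random.Random(seed)
    remaining = list(range(len(points)))
    classes = []
    while len(remaining) >= min_support:
        best = []
        for _ in range(trials):
            i, j = rng.sample(remaining, 2)
            model = line_through(points[i], points[j])
            if model is None:
                continue
            a, b = model
            # Inlier set: remaining points whose residual is within the tolerance.
            support = [k for k in remaining
                       if abs(points[k][1] - (a * points[k][0] + b)) <= tol]
            if len(support) > len(best):
                best = support
        if len(best) < min_support:
            break  # nothing left with enough support: treat the rest as noise
        chosen = set(best)
        # Refit on the inliers, then remove them before searching for the next line.
        classes.append((least_squares([points[k] for k in best]), best))
        remaining = [k for k in remaining if k not in chosen]
    return classes, remaining


def demo_data(seed=1):
    """Synthetic mixture (assumed for the demo): two noisy lines plus scattered points."""
    rng = random.Random(seed)
    pts = []
    for _ in range(60):
        x = rng.uniform(0, 10)
        pts.append((x, 2.0 * x + 1.0 + rng.gauss(0, 0.1)))
    for _ in range(60):
        x = rng.uniform(0, 10)
        pts.append((x, -1.0 * x + 10.0 + rng.gauss(0, 0.1)))
    for _ in range(40):
        pts.append((rng.uniform(0, 10), rng.uniform(-20, 40)))
    return pts
```

Calling `extract_lines(demo_data())` returns the fitted lines with their supporting index sets, plus the indices left unexplained. Unlike the dynamic classification strategy described in the abstract, this sketch assigns each point to the first line that claims it, so its result can depend on extraction order where lines overlap.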