Data mining tasks and methods: Classification: regression

  • Authors:
  • Robert Henery

  • Affiliations:
  • Statistics Department, Strathclyde University, Glasgow, United Kingdom

  • Venue:
  • Handbook of data mining and knowledge discovery
  • Year:
  • 2002

Abstract

Regression methods for classification are based on population means. Usually, though by no means always, a linear combination of attributes is the starting point for predicting the population means, even if the linear combination is subsequently subjected to a nonlinear transformation, as in logistic regression. In this sense, neural networks are extensions to logistic regression, but they are treated in Chapter 16.1.8 of this handbook. We begin by describing the classical approach to statistical discrimination, which is based on multivariate normal distributions. In logistic regression, although the underlying method is identical, an alternative justification is given for choosing the coefficients, and this leads to more efficient solutions in some circumstances. When the basic assumptions of linear or logistic regression are not satisfied, for example when there are many outliers, more robust and flexible approaches are required, and some modern statistical approaches have been developed to deal with this problem. Finally, Fisher's canonical discriminants may be used to display multivariate data to good effect, either to investigate relationships between the population means or to investigate the assumptions of multivariate normality.
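
As an illustration only, and not taken from the chapter itself, the following sketch shows one way the classical two-class discriminant described above might be computed: class means and a pooled covariance give a linear combination of the attributes, which a logistic transformation turns into a posterior probability. The function names `fit_linear_discriminant` and `posterior_class1` are hypothetical, and the sketch assumes multivariate normal classes with a shared covariance matrix and at least two attributes.

```python
import numpy as np

def fit_linear_discriminant(X0, X1):
    """Two-class linear discriminant (hypothetical helper).

    Assumes both classes are multivariate normal with a common covariance.
    X0, X1: arrays of shape (n_samples, n_attributes), n_attributes >= 2.
    Returns the coefficient vector w and the intercept b.
    """
    n0, n1 = len(X0), len(X1)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate
    S = ((n0 - 1) * np.cov(X0, rowvar=False)
         + (n1 - 1) * np.cov(X1, rowvar=False)) / (n0 + n1 - 2)
    # Coefficients of the linear combination of attributes
    w = np.linalg.solve(S, m1 - m0)
    # Intercept: midpoint term plus log prior odds estimated from class sizes
    b = -0.5 * w @ (m0 + m1) + np.log(n1 / n0)
    return w, b

def posterior_class1(X, w, b):
    """Logistic transformation of the linear combination X @ w + b."""
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))
```

Logistic regression would use the same functional form for the posterior but choose `w` and `b` by maximizing the conditional likelihood rather than from the class means and pooled covariance.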