Detecting and identifying ambiguities in regression problems: An approach using a modified mountain method

  • Authors:
  • Arup Kumar Nandi; Frank Klawonn

  • Affiliations:
  • Central Mechanical Engineering Research Institute, M.G. Avenue, Durgapur-713209, WB, India. Tel.: +91 343 2546826, Fax: +91 343 25467845, E-mail: nandiarup@yahoo.com; Department of Computer Science, University of Applied Sciences BS/WF, Salzdahlumer Str. 46/48, 38302 Wolfenbuettel, Germany, E-mail: f.klawonn@fh-wolfenbuettel.de

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2005

Abstract

Regression problems occur in many data analysis applications. The aim of regression is to approximate a function from which measurements were taken. When considering a regression problem, we have to take a number of aspects into account: how noisy the data are, whether they sufficiently cover the domain in which we want to find the regression function, and what kind of regression function we should choose. However, the underlying assumption is always that the data actually are (noisy) samples of a single function. In some cases, this might not be true. For instance, when we consider data from a technical process that is controlled by human operators, these operators might use different strategies to reach a particular goal. Even a single operator might not stick to the same strategy all the time. Thus, a dataset containing a mixture of samples from different strategies does not represent (noisy) samples of a single function. Therefore, there is an ambiguity in selecting data from a large dataset to fit a single regression model. In this paper, we suggest an approach using a modified mountain method (MMM) to select data from a large, jumbled set of samples that come from different functions, in order to cope with the ambiguities in the underlying regression problem. The proposed method may also serve to identify the best local (approximation) function(s). These are determined using a weighted regression analysis method. The proposed methodology is explained with a one-dimensional problem, a single-input single-output system; the performance of the proposed approach is then analysed with artificial data from a two-dimensional case study.
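The abstract does not spell out the MMM or the exact weighting scheme, so the following is only a minimal sketch of the ambiguity it describes and of weighted regression in general, not the authors' method. It fits a straight line by weighted least squares (Python/NumPy) to a hypothetical mixture of samples from two linear functions on the same inputs. The function name `weighted_linear_fit`, the two functions, and the hand-set weights are illustrative assumptions; in the proposed approach, the choice of which samples to emphasise would come from the MMM.

```python
import numpy as np

def weighted_linear_fit(x, y, w):
    """Hypothetical helper: fit y ~ a*x + b by minimising sum_i w_i * (y_i - a*x_i - b)**2."""
    X = np.column_stack([x, np.ones_like(x)])  # design matrix with columns [x, 1]
    sw = np.sqrt(w)                             # scaling rows by sqrt(w_i) turns WLS into ordinary LS
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]                     # (slope a, intercept b)

# Hypothetical mixture: two "strategies", y = 2x and y = -x + 3, sampled on the same inputs.
x = np.linspace(0.0, 1.0, 11)
x_all = np.concatenate([x, x])
y_all = np.concatenate([2.0 * x, -x + 3.0])

# Unweighted fit: it averages the two functions, giving y = 0.5x + 1.5, which matches neither.
a0, b0 = weighted_linear_fit(x_all, y_all, np.ones_like(x_all))

# Weights set by hand to favour the first function (assumed here, not derived from the MMM).
w = np.concatenate([np.ones_like(x), np.full_like(x, 1e-6)])
a1, b1 = weighted_linear_fit(x_all, y_all, w)

print(f"unweighted: y = {a0:.3f}x + {b0:.3f}")
print(f"weighted:   y = {a1:.3f}x + {b1:.3f}")
```

The unweighted fit lands on the average of the two branches, which illustrates why a single model fitted to the mixed data is misleading. Once the samples of one branch are strongly down-weighted, the weighted fit essentially recovers y = 2x.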