Data mining tasks and methods: Classification: nearest-neighbor approaches

  • Author: Belur V. Dasarathy
  • Affiliation: Distinguished Scientist, Dynetics, Inc., Huntsville, Alabama
  • Venue: Handbook of Data Mining and Knowledge Discovery
  • Year: 2002

Abstract

This article discusses the role and significance of nearest-neighbor (NNR) approaches (and their conceptual equivalents in the field of artificial intelligence, such as instance-based learning, lazy learning, memory-based reasoning, case-based reasoning, and the like) in the data mining and knowledge discovery process. The presentation first traces the development of NNR approaches from their origins in the early 1950s to the present day, with appropriate historical references. In the context of data mining applications, which necessarily involve large databases, computational cost becomes a major concern, and NNR techniques are particularly vulnerable in this respect. Accordingly, this aspect of NNR techniques is discussed next in detail to provide a panoramic view of the latest developments in the area. The associated issues of attribute selection and weighting are also addressed. This is followed by an overview of the different metrics that have been proposed in the literature to meet the special needs of the data mining community, in contrast to the traditional Euclidean metric and related measures such as the Manhattan (city-block) distance generally employed in the pattern recognition field. A brief but direct discussion of the well-recognized problem of the curse of dimensionality is offered next, although this subject is covered indirectly in earlier subsections. The article concludes with a short summation of the objective and scope of the presentation, highlighting some of the outstanding issues in this arena.
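
To make the decision rule and the Euclidean and Manhattan (city-block) metrics mentioned in the abstract concrete, the following is a minimal sketch of a 1-NN classifier in Python. It is not code from the chapter; the function name, the tiny data set, and the metric handling are illustrative assumptions only.

```python
import numpy as np

def nearest_neighbor_classify(query, train_X, train_y, metric="euclidean"):
    """Assign `query` the label of its single nearest training point."""
    diffs = train_X - query
    if metric == "euclidean":
        dists = np.sqrt((diffs ** 2).sum(axis=1))   # L2 distance
    elif metric == "manhattan":
        dists = np.abs(diffs).sum(axis=1)           # L1 (city-block) distance
    else:
        raise ValueError(f"unknown metric: {metric}")
    return train_y[np.argmin(dists)]

# Toy usage with made-up two-dimensional data
X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.3]])
y = np.array(["A", "A", "B", "B"])
print(nearest_neighbor_classify(np.array([1.1, 0.9]), X, y))                # -> "A"
print(nearest_neighbor_classify(np.array([5.0, 5.0]), X, y, "manhattan"))   # -> "B"
```

Note that this brute-force formulation scans the entire training set for every query, which is exactly the computational burden on large databases that the chapter discusses.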