A Scalable Noise Reduction Technique for Large Case-Based Systems

  • Authors:
  • Nicola Segata; Enrico Blanzieri; Pádraig Cunningham

  • Affiliations:
  • DISI, University of Trento, Italy; DISI, University of Trento, Italy; Computer Science, University College Dublin, Dublin, Ireland

  • Venue:
  • ICCBR '09 Proceedings of the 8th International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development
  • Year:
  • 2009

Abstract

Because case-based reasoning (CBR) is instance-based, it is vulnerable to noisy data. Other learning techniques, such as support vector machines (SVMs) and decision trees, have been developed to be noise-tolerant, so a certain level of noise in the data can be condoned. By contrast, noisy data can have a large impact in CBR because inference is normally based on a small number of cases. So far, research on noise reduction has been based on a majority-rule strategy: cases that are out of line with their neighbors are removed. We depart from that strategy and use local SVMs to identify noisy cases. This is more powerful than a majority-rule strategy because it explicitly considers the decision boundary in the noise reduction process. In this paper we provide details on how such a local SVM strategy for noise reduction can be made to scale to very large datasets (500,000 training samples). The technique is evaluated on nine very large datasets and shows excellent performance when compared with alternative techniques.
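
The majority-rule strategy that the abstract takes as its baseline can be illustrated with a small sketch. This is not the authors' implementation: the function names (`k_nearest`, `majority_vote`, `filter_noise`), the brute-force neighbor search, and the toy data are all assumptions made for illustration. The `local_predict` parameter is a hypothetical hook; under the approach described in the abstract, one would presumably plug in a predictor that trains an SVM on the neighborhood instead of taking a vote.

```python
from collections import Counter
import math


def k_nearest(points, query_index, k):
    """Indices of the k points closest to points[query_index], excluding itself.

    Brute-force search, O(n log n) per query; fine for a toy example only.
    """
    q = points[query_index]
    others = [i for i in range(len(points)) if i != query_index]
    others.sort(key=lambda i: math.dist(points[i], q))
    return others[:k]


def majority_vote(neighbor_points, neighbor_labels, query):
    """Majority-rule local predictor: the most common label among the neighbors."""
    return Counter(neighbor_labels).most_common(1)[0][0]


def filter_noise(points, labels, k=3, local_predict=majority_vote):
    """Return the indices of cases whose label agrees with the local prediction.

    Cases whose label disagrees with the prediction made from their own
    neighborhood are treated as noise and dropped.
    """
    kept = []
    for i in range(len(points)):
        nn = k_nearest(points, i, k)
        predicted = local_predict(
            [points[j] for j in nn],
            [labels[j] for j in nn],
            points[i],
        )
        if predicted == labels[i]:
            kept.append(i)
    return kept


# Toy data (assumed): two clusters, plus one mislabeled case (index 4)
# sitting inside the "a" cluster.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.15),
          (5.0, 5.0), (5.1, 5.2), (5.2, 5.1), (5.3, 5.3)]
labels = ["a", "a", "a", "a", "b", "b", "b", "b", "b"]

kept = filter_noise(points, labels, k=3)
print(kept)  # [0, 1, 2, 3, 5, 6, 7, 8]
```

In this sketch the mislabeled case at index 4 is removed because all three of its nearest neighbors carry the other label. The brute-force neighbor search would not scale to the dataset sizes mentioned in the abstract; it is used here only to keep the example self-contained.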