Data Selection Using SASH Trees for Support Vector Machines

  • Authors:
  • Chaofan Sun; Ricardo Vilalta

  • Affiliations:
  • Department of Computer Science, University of Houston, 4800 Calhoun Rd., Houston TX, 77204-3010, Email: vilalta@cs.uh.edu; Center for Research and Advanced Studies (CINVESTAV), Av. Científica 1145, Guadalajara, 45010, México

  • Venue:
  • MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2007

Abstract

This paper presents a data preprocessing procedure for selecting support vector (SV) candidates. We select decision boundary region vectors (BRVs) as SV candidates. BRVs can be selected without knowing the decision boundary, based on each vector's nearest neighbor of opposite class (NNO). To speed up the process, two spatial approximation sample hierarchy (SASH) trees are used to estimate the BRVs. Empirical results show that our data selection procedure can reduce a full dataset to a subset whose size equals or only slightly exceeds the number of SVs. Training on the selected subset gives performance comparable to training on the full dataset. For large datasets, the total time spent selecting the subset and training on it is significantly lower than the time needed to train on the full dataset.
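
The NNO idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the exact selection rule, so the sketch assumes that a vector is kept as an SV candidate when it is the NNO of at least one vector of the other class. It also replaces the approximate SASH-tree search with an exact brute-force search, which is only practical for small datasets. The function names `nearest_opposite_neighbors` and `select_candidates` are hypothetical.

```python
import numpy as np


def nearest_opposite_neighbors(X, y):
    """Return, for each row of X, the index of its nearest neighbor
    of opposite class (NNO), using exact brute-force search.

    Assumes at least two classes are present in y.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances, shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    # Exclude same-class pairs (including each point itself).
    same_class = y[:, None] == y[None, :]
    d2[same_class] = np.inf
    return d2.argmin(axis=1)


def select_candidates(X, y):
    """Assumed selection rule: keep every vector that is the NNO
    of at least one vector of the other class."""
    nno = nearest_opposite_neighbors(X, y)
    return np.unique(nno)


if __name__ == "__main__":
    # Two well-separated 1-D classes; only the two points facing
    # each other across the gap are the NNO of any vector.
    X = np.array([[0.0], [1.0], [2.0], [3.0], [5.0], [6.0], [7.0], [8.0]])
    y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    print(select_candidates(X, y))
```

The brute-force search needs memory and time quadratic in the number of vectors, which is the cost the paper addresses by estimating neighbors with SASH trees instead.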