Maxdiff kd-trees for data condensation

  • Authors:
  • B. L. Narayan;C. A. Murthy;Sankar K. Pal

  • Affiliations:
  • Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Kolkata 700 108, West Bengal, India (all three authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2006

Abstract

Prototype selection based on conventional clustering algorithms yields a good representation but is extremely time-consuming on large data sets. kd-trees, on the other hand, are exceptionally efficient in time and space for large data sets, but fail to produce a reasonable representation in certain situations. We propose a new algorithm, with speed comparable to existing kd-tree based algorithms, that overcomes these representation problems at high condensation ratios. It uses the Maxdiff criterion to separate distant clusters in the initial stages before splitting them any further, thus improving the representation. Because the splits are axis-parallel, more nodes are required to represent a data set that has no regions where the points are well separated.
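
The abstract does not spell out the Maxdiff criterion, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes that (a) a Maxdiff split cuts a node at the widest gap between consecutive sorted coordinate values along any axis, (b) the leaf with the widest such gap is always split next, and (c) each leaf is condensed to its mean. The names `widest_gap` and `maxdiff_condense` are hypothetical.

```python
from statistics import mean


def widest_gap(points):
    """Return (gap, axis, cut) for the widest gap between consecutive sorted
    coordinate values over all axes, or None if the points cannot be split.
    Assumption: this is how the Maxdiff criterion is interpreted here."""
    if len(points) < 2:
        return None
    best = None
    for axis in range(len(points[0])):
        values = sorted(p[axis] for p in points)
        for lo, hi in zip(values, values[1:]):
            gap = hi - lo
            if gap > 0 and (best is None or gap > best[0]):
                # Cut just above `lo`: points with coordinate <= lo go left.
                best = (gap, axis, lo)
    return best


def maxdiff_condense(points, n_prototypes):
    """Split leaves with axis-parallel cuts until n_prototypes leaves exist
    (or no leaf can be split), then return one mean prototype per leaf."""
    leaves = [list(points)]
    while len(leaves) < n_prototypes:
        candidates = []
        for index, leaf in enumerate(leaves):
            split = widest_gap(leaf)
            if split is not None:
                candidates.append((split, index))
        if not candidates:
            break  # every remaining leaf holds identical points
        # Assumed ordering: split the leaf with the widest gap first, so
        # distant clusters are separated before any finer splitting.
        (gap, axis, cut), index = max(candidates, key=lambda c: c[0][0])
        leaf = leaves.pop(index)
        leaves.append([p for p in leaf if p[axis] <= cut])
        leaves.append([p for p in leaf if p[axis] > cut])
    dims = len(points[0])
    return [tuple(mean(p[a] for p in leaf) for a in range(dims))
            for leaf in leaves]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    print(maxdiff_condense(data, 2))
```

On two well-separated groups like those in the usage example, the first cut falls in the empty region between them, which is the behaviour the abstract attributes to the Maxdiff criterion. When the data have no well-separated regions, every gap is small and the axis-parallel cuts need many more leaves to represent the set, in line with the limitation noted above.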