Data analysis of (non-)metric proximities at linear costs

  • Authors:
  • Frank-Michael Schleif;Andrej Gisbrecht

  • Affiliations:
  • CITEC Centre of Excellence, Bielefeld University, Bielefeld, Germany;CITEC Centre of Excellence, Bielefeld University, Bielefeld, Germany

  • Venue:
  • SIMBAD'13 Proceedings of the Second international conference on Similarity-Based Pattern Recognition
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Domain specific (dis-)similarity or proximity measures, employed e.g. in alignment algorithms in bio-informatics, are often used to compare complex data objects and to cover domain specific data properties. Lacking an underlying vector space, data are given as pairwise (dis-)similarities. The few available methods for such data do not scale well to very large data sets. Kernel methods easily deal with metric similarity matrices, also at large scale, but costly transformations are necessary starting with non-metric (dis-) similarities. We propose an integrative combination of Nyström approximation, potential double centering and eigenvalue correction to obtain valid kernel matrices at linear costs. Accordingly effective kernel approaches, become accessible for these data. Evaluation at several larger (dis-)similarity data sets shows that the proposed method achieves much better runtime performance than the standard strategy while keeping competitive model accuracy. Our main contribution is an efficient linear technique, to convert (potentially non-metric) large scale dissimilarity matrices into approximated positive semi-definite kernel matrices.