Domain-specific (dis-)similarity or proximity measures, employed e.g. in alignment algorithms in bioinformatics, are often used to compare complex data objects and to capture domain-specific data properties. Lacking an underlying vector space, such data are given only as pairwise (dis-)similarities. The few methods available for such data do not scale well to very large data sets. Kernel methods easily deal with metric similarity matrices, also at large scale, but non-metric (dis-)similarities first require costly transformations. We propose an integrated combination of Nyström approximation, potential double centering, and eigenvalue correction to obtain valid kernel matrices at linear cost. Effective kernel approaches thereby become accessible for these data. Evaluation on several large (dis-)similarity data sets shows that the proposed method achieves much better runtime than the standard strategy while maintaining competitive model accuracy. Our main contribution is an efficient linear-time technique for converting (potentially non-metric) large-scale dissimilarity matrices into approximate positive semi-definite kernel matrices.
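As a rough illustration of the pipeline the abstract describes, the sketch below combines the three standard ingredients: double centering of a dissimilarity matrix, a Nyström-style low-rank approximation built from a landmark sub-block, and eigenvalue clipping to enforce positive semi-definiteness. This is a toy version under simplifying assumptions (it centers the full matrix before subsampling, whereas the paper's contribution is to interleave these steps so the full matrix never needs to be formed); function names and the choice of clipping as the eigenvalue correction are illustrative, not taken from the paper.

```python
import numpy as np

def double_center(D):
    # Classical MDS-style double centering: K = -1/2 * J D J,
    # with the centering matrix J = I - (1/n) * 1 1^T.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    return -0.5 * J @ D @ J

def clip_eigenvalues(K):
    # Eigenvalue correction ("clip"): set negative eigenvalues to zero
    # so the symmetric matrix becomes positive semi-definite.
    w, V = np.linalg.eigh(K)
    return (V * np.clip(w, 0.0, None)) @ V.T

def nystrom_kernel(D, m, rng):
    # Nyström sketch: approximate the centered kernel from an n x m
    # landmark block as K ~= C W^+ C^T, which costs O(n m^2) instead
    # of the O(n^3) of a full eigendecomposition.
    # NOTE: this toy version centers the full matrix first; a truly
    # linear-cost variant must avoid materializing K entirely.
    n = D.shape[0]
    idx = rng.choice(n, size=m, replace=False)
    K = double_center(D)                        # toy: full centering
    C = K[:, idx]                               # n x m block
    W = clip_eigenvalues(K[np.ix_(idx, idx)])   # corrected landmark block
    # Since the corrected W (and hence its pseudo-inverse) is PSD,
    # C W^+ C^T is PSD by construction.
    return C @ np.linalg.pinv(W) @ C.T
```

For squared Euclidean dissimilarities, double centering recovers a centered Gram matrix exactly, so the corrections are no-ops; their effect only shows up for non-metric inputs such as alignment scores.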