Parametric Approximation Algorithms for High-Dimensional Euclidean Similarity

Authors:
Ömer Egecioglu
Affiliations:
-
Venue:
PKDD '01 Proceedings of the 5th European Conference on Principles of Data Mining and Knowledge Discovery
Year:
2001

Citing 14
Cited 0

A survey of adaptive sorting algorithms

ACM Computing Surveys (CSUR)
Improving text retrieval for the routing problem using latent semantic indexing

SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Fast subsequence matching in time-series databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Time-series similarity problems and well-separated geometric sets

SCG '97 Proceedings of the thirteenth annual symposium on Computational geometry
A cost model for nearest neighbor search in high-dimensional data space

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Dimensionality reduction and similarity computation by inner product approximations

Proceedings of the ninth international conference on Information and knowledge management
R-trees: a dynamic index structure for spatial searching

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Similarity Indexing with the SS-tree

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases

VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Fast Nearest Neighbor Search in Medical Image Databases

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
The X-tree: An Index Structure for High-Dimensional Data

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Efficient User-Adaptable Similarity Search in Large Multimedia Databases

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Landmarks: A New Model for Similarity-Based Pattern Querying in Time Series Databases

ICDE '00 Proceedings of the 16th International Conference on Data Engineering

Quantified Score

Hi-index	0.00

Visualization

Abstract

We introduce a spectrum of algorithms for measuring the similarity of high-dimensional vectors in Euclidean space. The algorithms proposed consist of a convex combination of two measures: one which contains summary data about the shape of a vector, and the other about the relative magnitudes of the coordinates. The former is based on a concept called bin-score permutations and a metric to quantify similarity of permutations, the latter on another novel approximation for inner-product computations based on power symmetric functions, which generalizes the Cauchy-Schwarz inequality. We present experiments on time-series data on labor statistics unemployment figures that show the effectiveness of the algorithm as a function of the parameter that combines the two parts.