Indexing structures based on space partitioning become ineffective in high dimensions because of the well-known "curse of dimensionality"; a linear scan over approximated data is often more efficient for high-dimensional similarity search. However, approaches to date have concentrated on reducing I/O and ignored computation cost. For an expensive distance function, such as the L_p norm with fractional p, computation becomes the bottleneck. We propose a new technique that addresses expensive distance functions by "indexing the function": key values of the function are pre-computed once, then used to derive upper and lower bounds on the distance between each data vector and the query vector. The technique is extremely efficient because it avoids most distance-function computations; moreover, it incurs no extra storage, since no index is constructed or stored. Its efficiency is confirmed by cost analyses as well as by experiments on synthetic and real data.
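The abstract is terse about how pre-computed function values yield bounds. The following is a minimal sketch of the general idea, not the paper's exact scheme: coordinates are quantized onto a uniform grid (an assumed layout), |gap|^p is evaluated once per (dimension, cell) instead of once per data point, and the resulting per-coordinate lower bounds prune a linear scan. All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, m = 2000, 16, 0.5, 32        # points, dims, fractional p, cells/dim
data = rng.random((n, d))             # assume data scaled to [0, 1]
query = rng.random(d)

edges = np.linspace(0.0, 1.0, m + 1)
cell = np.clip((data * m).astype(int), 0, m - 1)   # cell index per coordinate

# "Index the function": evaluate |x - q|^p once per grid cell, not per point.
l, r = edges[:-1], edges[1:]                       # cell boundaries, shape (m,)
lo_pow = np.empty((d, m))
for j in range(d):
    inside = (query[j] >= l) & (query[j] <= r)
    gap = np.minimum(np.abs(l - query[j]), np.abs(r - query[j]))
    lo_pow[j] = np.where(inside, 0.0, gap) ** p    # min of |x - q|^p over cell

# Per-vector lower bound on sum_j |x_j - q_j|^p from the pre-computed table.
lb = lo_pow[np.arange(d), cell].sum(axis=1)

# Linear scan in increasing lower-bound order; the exact (expensive) distance
# is computed only while the bound cannot rule the candidate out.
best, best_i, exact = np.inf, -1, 0
for i in np.argsort(lb):
    if lb[i] >= best:
        break                          # all remaining bounds are at least as large
    dist = np.sum(np.abs(data[i] - query) ** p)
    exact += 1
    if dist < best:
        best, best_i = dist, i

true_i = int(np.argmin(np.sum(np.abs(data - query) ** p, axis=1)))
print(best_i == true_i)               # pruned scan agrees with brute force
```

Because |x - q|^p is monotone in the per-coordinate gap for any p > 0, the cell-wise minimum gap gives a valid lower bound, so the pruned scan is exact; the same table construction with maximum gaps would give the upper bounds the abstract mentions.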