Indexing structures based on space partitioning become ineffective in high dimensions because of the well-known "curse of dimensionality"; a linear scan over approximated data is often more efficient for high-dimensional similarity search. However, approaches to date have concentrated on reducing I/O and ignored computation cost. For an expensive distance function, such as the L_p norm with fractional p, computation becomes the bottleneck. We propose a new technique that addresses expensive distance functions by "indexing the function": key values of the function are pre-computed once, then used to derive upper and lower bounds on the distance between each data vector and the query vector. The technique is extremely efficient because it avoids most distance-function computations; moreover, it incurs no extra storage, since no index is constructed or stored. Its efficiency is confirmed by cost analyses as well as by experiments on synthetic and real data.
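The abstract is terse about how pre-computed function values yield bounds. The following is a minimal sketch of the general idea, not the paper's exact scheme: coordinates are quantized onto a uniform grid (an assumed layout), |gap|^p is evaluated once per (dimension, cell) instead of once per data point, and the resulting per-coordinate lower bounds prune a linear scan. All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, m = 2000, 16, 0.5, 32        # points, dims, fractional p, cells/dim
data = rng.random((n, d))             # assume data scaled to [0, 1]
query = rng.random(d)

edges = np.linspace(0.0, 1.0, m + 1)
cell = np.clip((data * m).astype(int), 0, m - 1)   # cell index per coordinate

# "Index the function": evaluate |x - q|^p once per grid cell, not per point.
l, r = edges[:-1], edges[1:]                       # cell boundaries, shape (m,)
lo_pow = np.empty((d, m))
for j in range(d):
    inside = (query[j] >= l) & (query[j] <= r)
    gap = np.minimum(np.abs(l - query[j]), np.abs(r - query[j]))
    lo_pow[j] = np.where(inside, 0.0, gap) ** p    # min of |x - q|^p over cell

# Per-vector lower bound on sum_j |x_j - q_j|^p from the pre-computed table.
lb = lo_pow[np.arange(d), cell].sum(axis=1)

# Linear scan in increasing lower-bound order; the exact (expensive) distance
# is computed only while the bound cannot rule the candidate out.
best, best_i, exact = np.inf, -1, 0
for i in np.argsort(lb):
    if lb[i] >= best:
        break                          # all remaining bounds are at least as large
    dist = np.sum(np.abs(data[i] - query) ** p)
    exact += 1
    if dist < best:
        best, best_i = dist, i

true_i = int(np.argmin(np.sum(np.abs(data - query) ** p, axis=1)))
print(best_i == true_i)               # pruned scan agrees with brute force
```

Because |x - q|^p is monotone in the per-coordinate gap for any p > 0, the cell-wise minimum gap gives a valid lower bound, so the pruned scan is exact; the same table construction with maximum gaps would give the upper bounds the abstract mentions.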