The pyramid-technique: towards breaking the curse of dimensionality

Authors:
Stefan Berchtold;Christian Böhm;Hans-Peter Kriegel
Affiliations:
AT&T Labs Research, Florham Park, NJ;University of Munich, Germany;University of Munich, Germany
Venue:
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Year:
1998


Abstract

In this paper, we propose the Pyramid-Technique, a new indexing method for high-dimensional data spaces. The Pyramid-Technique is highly adapted to range query processing using the maximum metric Lmax. In contrast to all other index structures, the performance of the Pyramid-Technique does not deteriorate when processing range queries on data of higher dimensionality. The Pyramid-Technique is based on a special partitioning strategy which is optimized for high-dimensional data. The basic idea is to first divide the data space into 2d pyramids that share the center point of the space as their top. In a second step, each pyramid is cut into slices parallel to its base. These slices form the data pages. Furthermore, we show that this partitioning provides a mapping from the given d-dimensional space to a 1-dimensional space. Therefore, we are able to use a B+-tree to manage the transformed data. As an analytical evaluation of our technique for hypercube range queries and uniform data distribution shows, the Pyramid-Technique clearly outperforms index structures using other partitioning strategies. To demonstrate the practical relevance of our technique, we experimentally compared the Pyramid-Technique with the X-tree, the Hilbert R-tree, and the Linear Scan. The results of our experiments, using both synthetic and real data, demonstrate that the Pyramid-Technique outperforms the X-tree and the Hilbert R-tree by a factor of up to 14 (number of page accesses) and up to 2500 (total elapsed time) for range queries.
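
The mapping from d dimensions to one dimension mentioned in the abstract can be illustrated with a short sketch. The abstract itself gives no formula, so the code below follows the definition as it is commonly described for this technique: a point in the unit hypercube is assigned to the pyramid of the dimension in which it lies farthest from the center, and its key is the pyramid number plus its distance from the center in that dimension. The function name `pyramid_value`, the sample points, and the sorted list standing in for a B+-tree are illustrative assumptions, not part of the paper.

```python
from bisect import insort


def pyramid_value(point):
    """Map a point in the unit hypercube [0, 1]^d to a single float key.

    Assumption: coordinates are already normalized to [0, 1], so the
    center of the data space is (0.5, ..., 0.5).
    """
    d = len(point)
    # Dimension in which the point deviates most from the center.
    j_max = max(range(d), key=lambda j: abs(0.5 - point[j]))
    # Height of the point inside its pyramid, a value in [0, 0.5].
    height = abs(0.5 - point[j_max])
    # Pyramids 0..d-1 lie on the low side of their dimension,
    # pyramids d..2d-1 on the high side, giving 2d pyramids in total.
    pyramid = j_max if point[j_max] < 0.5 else j_max + d
    # Integer part identifies the pyramid, fractional part the slice height.
    return pyramid + height


# A sorted list of (key, point) pairs stands in for the B+-tree here.
index = []
for p in [(0.1, 0.6), (0.5, 0.9), (0.7, 0.4)]:
    insort(index, (pyramid_value(p), p))

for key, p in index:
    print(f"{key:.2f} -> {p}")
```

Because the height never exceeds 0.5, the keys of pyramid i all fall in the interval [i, i + 0.5], so each pyramid occupies its own contiguous key range in the one-dimensional index.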