Fast parallel similarity search in multimedia databases

Authors:
Stefan Berchtold; Christian Böhm; Bernhard Braunmüller; Daniel A. Keim; Hans-Peter Kriegel
Affiliations:
University of Munich, Germany (all authors)
Venue:
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Year:
1997

References:
19
Cited by:
80


Abstract

Most similarity search techniques map the data objects into a high-dimensional feature space. Similarity search then corresponds to a nearest-neighbor search in that feature space, which is computationally very intensive. In this paper, we present a new parallel method for fast nearest-neighbor search in high-dimensional feature spaces. The core problem in designing a parallel nearest-neighbor algorithm is to find an adequate distribution of the data onto the disks. Unfortunately, the known declustering methods do not perform well for high-dimensional nearest-neighbor search. In contrast, our method is optimized for the special properties of high-dimensional spaces and therefore provides a near-optimal distribution of the data items among the disks. The basic idea of our data declustering technique is to assign the buckets corresponding to different quadrants of the data space to different disks. We show that our technique, in contrast to other declustering methods, guarantees that all buckets corresponding to neighboring quadrants are assigned to different disks. We evaluate our method using large amounts of real data (up to 40 MBytes) and compare it with the best-known data declustering method, which is based on the Hilbert curve. Our experiments show that our method provides an almost linear speed-up and a constant scale-up. Additionally, it outperforms the Hilbert approach by a factor of up to 5.
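Why the data distribution matters for a parallel nearest-neighbor query can be illustrated in a few lines: every disk searches its own share of the data, and the global answer is the closest of the local answers, so response time is governed by the disk that has the most work to do. The Python sketch below is a minimal illustration under loudly stated assumptions, not the paper's algorithm: each disk's index is replaced by a linear scan, one thread per partition stands in for parallel disk I/O (Python threads do not actually speed up this CPU-bound scan), and the names `local_nearest` and `parallel_nearest` are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def local_nearest(partition, query):
    """Linear scan of one disk's points (stand-in for a per-disk index).
    Returns (distance, point), or (inf, None) for an empty partition."""
    best = (math.inf, None)
    for point in partition:
        dist = math.dist(point, query)  # Euclidean distance, Python 3.8+
        if dist < best[0]:
            best = (dist, point)
    return best


def parallel_nearest(partitions, query):
    """Query all partitions concurrently and keep the closest local answer.
    The global nearest neighbor is always one of the local nearest neighbors."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        candidates = list(pool.map(lambda part: local_nearest(part, query), partitions))
    return min(candidates, key=lambda c: c[0], default=(math.inf, None))
```

With a poor declustering, one partition holds most of the points near typical queries, and its scan dominates the wall-clock time no matter how many disks are added.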
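The abstract states the goal (quadrant buckets on different disks, neighboring quadrants never sharing a disk) but not the assignment function itself. The Python sketch below shows one scheme with that property, under stated assumptions: the data space is the unit hypercube [0, 1]^d, each dimension is split once at 0.5 so a bucket is identified by a d-bit quadrant code, and a bucket's disk is the XOR of (i + 1) over every dimension i whose bit is set. This XOR coloring is our assumption of a suitable scheme, not a verbatim reproduction of the paper's method, and the function names are hypothetical.

```python
def quadrant_code(point, split=0.5):
    """Map a point in [0, 1]^d to its quadrant code: bit i is set when the
    coordinate in dimension i lies in the upper half (split assumed at 0.5)."""
    code = 0
    for i, x in enumerate(point):
        if x >= split:
            code |= 1 << i
    return code


def disk_of(code, d):
    """Assumed XOR coloring: XOR together (i + 1) for every set bit i."""
    disk = 0
    for i in range(d):
        if (code >> i) & 1:
            disk ^= i + 1
    return disk


def disks_needed(d):
    """Smallest power of two greater than d; every XOR of values in 1..d is
    below it, so disk numbers fall in range(disks_needed(d))."""
    n = 1
    while n <= d:
        n <<= 1
    return n


if __name__ == "__main__":
    d = 3
    code = quadrant_code([0.1, 0.9, 0.6])  # upper half in dimensions 1 and 2
    print(bin(code), disk_of(code, d), disks_needed(d))  # 0b110 1 4
```

Two buckets whose codes differ only in bit i get disk numbers that differ by an XOR with i + 1, which is never zero; codes differing in bits i and j differ by (i + 1) XOR (j + 1), which is non-zero whenever i != j. Under this reading of "neighboring" (one or two differing quadrant bits), neighbors always land on different disks, at the cost of using a power-of-two number of disks just above d.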
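For comparison, the Hilbert-curve declustering baseline mentioned in the abstract can be sketched as follows. This is a simplified 2-D version chosen for readability (the paper works in high-dimensional spaces, where a d-dimensional Hilbert mapping is considerably longer): the cells of an n x n grid, with n a power of two, are numbered along the Hilbert curve using the well-known iterative (x, y)-to-index conversion, and cell number h is assigned round-robin to disk h mod M. The round-robin rule and the function names are assumptions for illustration.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve filling an n x n grid
    (n must be a power of two)."""
    h = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        h += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the sub-curve in this quadrant has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return h


def hilbert_disk(n, x, y, num_disks):
    """Round-robin declustering: consecutive cells on the curve go to consecutive disks."""
    return hilbert_index(n, x, y) % num_disks


def same_disk_neighbor_pairs(n, num_disks):
    """Count edge-adjacent cell pairs that end up on the same disk."""
    count = 0
    for x in range(n):
        for y in range(n):
            for nx, ny in ((x + 1, y), (x, y + 1)):
                if nx < n and ny < n and hilbert_disk(n, x, y, num_disks) == hilbert_disk(n, nx, ny, num_disks):
                    count += 1
    return count
```

Round-robin assignment only guarantees that cells adjacent *along the curve* differ in disk; `same_disk_neighbor_pairs` is a simple way to measure how often spatially adjacent cells still collide.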
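The speed-up and scale-up claims are easiest to read against the usual parallel-database definitions, which we assume here because the abstract does not spell them out. With T(n, D) denoting the query time on n disks for a database of size D:

```latex
\text{speed-up}(n) = \frac{T(1, D)}{T(n, D)}, \qquad
\text{scale-up}(n) = \frac{T(1, D)}{T(n,\, n \cdot D)}
```

Linear speed-up means speed-up(n) is close to n: doubling the disks for the same data roughly halves the query time. Constant scale-up means scale-up(n) stays close to 1: growing the data and the number of disks by the same factor leaves the query time roughly unchanged.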