The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries

Authors:
Christian Böhm;Bernhard Braunmüller;Hans-Peter Kriegel
Venue:
DaWaK 2000 Proceedings of the Second International Conference on Data Warehousing and Knowledge Discovery
Year:
2000

Abstract

Numerous data mining algorithms rely heavily on similarity queries. Although many or even all of these queries are independent of each other, the algorithms process them sequentially. Recently, a novel technique for efficiently processing multiple similarity queries issued simultaneously has been introduced. It was shown that multiple similarity queries substantially speed up query-intensive data mining applications. For the important case of multiple k-nearest neighbor queries on top of a multidimensional index structure, the problem of scheduling directory pages and data pages arises. This aspect has not been addressed so far. In this paper, we derive the theoretical foundation of this scheduling problem. Additionally, we propose several scheduling algorithms based on our theoretical results. In our experimental evaluation, we show that considering the maximum priority of pages clearly outperforms other scheduling approaches.
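
To make the scheduling problem concrete, the following is a minimal Python sketch of one possible scheduler for multiple k-nearest neighbor queries over a flat set of data pages. It is an illustration under stated assumptions, not the algorithm from the paper: the priority of a page for a query is assumed to be its negated MINDIST (closer pages rank higher), a page's overall priority is assumed to be the maximum over all queries that cannot yet prune it, and the names mindist and multi_knn are hypothetical. Directory pages are omitted for brevity.

```python
import heapq
import math


def mindist(q, lo, hi):
    """Smallest Euclidean distance from point q to the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(q, lo, hi)))


def multi_knn(pages, queries, k):
    """Answer all queries together; return (results, number of pages loaded).

    pages   : list of data pages, each a non-empty list of point tuples
    queries : list of query point tuples
    Assumption: priority = -MINDIST, maximized over the queries that
    still need the page (hypothetical, chosen for illustration only).
    """
    # Bounding box of each page, standing in for an index directory entry.
    boxes = [(tuple(map(min, zip(*p))), tuple(map(max, zip(*p))))
             for p in pages]
    # best[i] holds (-distance, point) for the k closest points seen so far.
    best = [[] for _ in queries]
    pending = set(range(len(pages)))
    loaded = 0

    def prune_dist(i):
        # Current k-th nearest neighbor distance, or infinity if unknown.
        return -best[i][0][0] if len(best[i]) == k else math.inf

    while pending:
        candidates = []
        for p in pending:
            lo, hi = boxes[p]
            needed = [d for i, q in enumerate(queries)
                      if (d := mindist(q, lo, hi)) < prune_dist(i)]
            if needed:
                # Smallest MINDIST corresponds to the maximum priority.
                candidates.append((min(needed), p))
        if not candidates:
            break  # every remaining page is pruned by every query
        _, p = min(candidates)
        pending.remove(p)
        loaded += 1  # the page is read once and shared by all queries
        lo, hi = boxes[p]
        for i, q in enumerate(queries):
            if mindist(q, lo, hi) >= prune_dist(i):
                continue  # this query cannot benefit from the page
            for pt in pages[p]:
                d = math.dist(q, pt)
                if len(best[i]) < k:
                    heapq.heappush(best[i], (-d, pt))
                elif d < -best[i][0][0]:
                    heapq.heapreplace(best[i], (-d, pt))

    results = [sorted((-nd, pt) for nd, pt in h) for h in best]
    return results, loaded
```

A call such as multi_knn(pages, queries, k=2) returns, for each query, its k nearest points with their distances, together with the number of pages actually read. Because each loaded page is processed for every query that cannot prune it, the page count can be lower than when the same queries are answered one after another.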