Finding data broadness via generalized nearest neighbors

Authors: Jayendra Venkateswaran; Tamer Kahveci; Orhan Camoglu
Affiliations: CISE Department, University of Florida, Gainesville, FL; CISE Department, University of Florida, Gainesville, FL; University of California, Santa Barbara, CA
Venue: EDBT'06: Proceedings of the 10th International Conference on Advances in Database Technology
Year: 2006

Abstract

A data object is broad if it is one of the k-nearest neighbors (k-NN) of many data objects. We introduce a new database primitive, called Generalized Nearest Neighbor (GNN), to express data broadness, and we develop three strategies to answer GNN queries efficiently on large datasets of multidimensional objects. The R*-tree-based search algorithm generates candidate pages and ranks them by their distances. Our first algorithm, Fetch All (FA), fetches as many candidate pages as possible. Our second algorithm, Fetch One (FO), fetches one candidate page at a time. Our third algorithm, Fetch Dynamic (FD), decides dynamically how many pages need to be fetched. We also propose three optimizations, Column Filter, Row Filter, and Adaptive Filter, to eliminate pages from each dataset. Column Filter prunes pages that are guaranteed to be non-broad. Row Filter prunes pages whose removal does not change the broadness of any data point. Adaptive Filter dynamically prunes the search space along each dimension to eliminate unpromising objects. Our experiments show that FA is the fastest when the buffer size is large and FO is the fastest when the buffer size is small. FD is always either the fastest or very close to the faster of FA and FO, and it is significantly faster than existing methods adapted to the GNN problem.
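To make the notion of broadness concrete, here is a minimal brute-force sketch. It assumes Euclidean distance, a set of candidate objects and a separate set of reference objects, and defines the broadness of a candidate as the number of reference objects whose k-NN (among the candidates) include it. The two-set setup, the function names `knn`, `broadness` and `broadest`, and the top-`m` parameter are hypothetical illustrations, not the paper's exact GNN definition or API. The sketch compares every reference object with every candidate, which is the cost that the index-based FA, FO and FD strategies and the filters are meant to avoid on large datasets.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def knn(query: Point, candidates: Sequence[Point], k: int) -> List[int]:
    """Indices of the k candidates closest to `query` (Euclidean; ties broken by index)."""
    return heapq.nsmallest(
        k,
        range(len(candidates)),
        key=lambda i: (math.dist(query, candidates[i]), i),
    )


def broadness(candidates: Sequence[Point], references: Sequence[Point], k: int) -> Dict[int, int]:
    """For each candidate index, count the reference points that have it among their k-NN."""
    counts = {i: 0 for i in range(len(candidates))}
    for r in references:
        for i in knn(r, candidates, k):
            counts[i] += 1
    return counts


def broadest(candidates: Sequence[Point], references: Sequence[Point], k: int, m: int) -> List[int]:
    """Hypothetical top-m variant: the m candidate indices with the highest broadness."""
    counts = broadness(candidates, references, k)
    return sorted(counts, key=lambda i: (-counts[i], i))[:m]


if __name__ == "__main__":
    cands = [(0.0, 0.0), (10.0, 10.0), (5.0, 5.0)]
    refs = [(1.0, 1.0), (2.0, 2.0), (9.0, 9.0), (4.0, 4.0)]
    print(broadness(cands, refs, k=1))     # {0: 2, 1: 1, 2: 1}
    print(broadest(cands, refs, k=2, m=1))  # [2]
```

With k = 1, the candidate at the origin is the nearest neighbor of two reference points and is therefore the broadest; with k = 2, the central candidate (5, 5) appears in every reference point's 2-NN list and becomes the broadest.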