Optimal K-Nearest-Neighbor Query in Data Grid

Authors:
Yi Zhuang;Hua Hu;Xiaojun Li;Bin Xu;Haiyang Hu
Affiliations:
College of Computer & Information Engineering, Zhejiang Gongshang University, P.R. China and Zhejiang Provincial Key Laboratory of Information Network Technology, P.R. China;College of Computer & Information Engineering, Zhejiang Gongshang University, P.R. China and Hangzhou Dianzi University, P.R. China;College of Computer & Information Engineering, Zhejiang Gongshang University, P.R. China;College of Computer & Information Engineering, Zhejiang Gongshang University, P.R. China;College of Computer & Information Engineering, Zhejiang Gongshang University, P.R. China
Venue:
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
Year:
2009

References:
Fast parallel similarity search in multimedia databases. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The grid: blueprint for a new computing infrastructure
Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys (CSUR)
iDistance: An adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems (TODS)
Speeding Up Similarity Queries over Large Chinese Calligraphic Character Databases Using Data Grid. GCC '07 Proceedings of the Sixth International Conference on Grid and Cooperative Computing

Abstract

The paper proposes an optimal distributed k-nearest-neighbor (kNN) query processing algorithm for data grids, called opGkNN. The algorithm proceeds in three steps. First, when a user submits a query with a vector Vq and a number k, an iDistance[3]-based vector set reduction is conducted in parallel at the data node level. Then the candidate vectors are transferred to the executing nodes for refinement, which yields the answer set. Finally, the answer set is transferred to the query node. The experimental results show that the algorithm is efficient and effective in minimizing response time by decreasing network transfer cost and increasing the parallelism of I/O and CPU.
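
The three steps in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract gives no details, so the class DataNode, the functions refine and knn_query, the starting radius r, the radius-doubling loop, and the use of a thread pool to stand in for grid nodes are all assumptions made for illustration. The reduction step uses the general iDistance idea of keying each vector by its distance to a reference point and pruning with the triangle inequality.

```python
import bisect
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


class DataNode:
    """Hypothetical data node holding vectors keyed by distance to a reference point."""

    def __init__(self, vectors, ref):
        self.ref = ref
        # iDistance-style one-dimensional key: distance to the reference point.
        entries = sorted((math.dist(v, ref), tuple(v)) for v in vectors)
        self.keys = [key for key, _ in entries]
        self.vectors = [v for _, v in entries]

    def reduce(self, q, r):
        # Triangle inequality: d(v, q) <= r implies |d(v, ref) - d(q, ref)| <= r,
        # so only keys in [dq - r, dq + r] can hold answers within radius r.
        dq = math.dist(q, self.ref)
        lo = bisect.bisect_left(self.keys, dq - r)
        hi = bisect.bisect_right(self.keys, dq + r)
        return self.vectors[lo:hi]


def refine(q, candidates, k):
    # Executing node: compute true distances and keep the k closest candidates.
    return heapq.nsmallest(k, ((math.dist(q, v), v) for v in candidates))


def knn_query(data_nodes, q, k, r=1.0, n_exec=2):
    """Return the k nearest (distance, vector) pairs. r must be positive (assumed)."""
    k = min(k, sum(len(node.keys) for node in data_nodes))
    with ThreadPoolExecutor() as pool:
        while True:
            # Step 1: vector set reduction at every data node in parallel.
            parts = pool.map(lambda node: node.reduce(q, r), data_nodes)
            candidates = [v for part in parts for v in part]
            # Step 2: spread the candidates over the executing nodes for refinement.
            chunks = [candidates[i::n_exec] for i in range(n_exec)]
            partial = pool.map(lambda chunk: refine(q, chunk, k), chunks)
            # Step 3: merge the partial answers at the query node.
            answer = heapq.nsmallest(k, (x for p in partial for x in p))
            # The answer is final once the k-th distance lies within the radius.
            if len(answer) == k and (k == 0 or answer[-1][0] <= r):
                return answer
            r *= 2  # assumed strategy: enlarge the search radius and retry
```

A call such as knn_query(nodes, (0, 0), 3) returns a list of (distance, vector) pairs sorted by distance. Doubling the radius is only one possible way to reach k answers; it trades repeated reduction rounds for a smaller candidate set per round.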