Similarity Search in Arbitrary Subspaces Under Lp-Norm

  • Authors:
  • Xiang Lian;Lei Chen

  • Affiliations:
  • Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. xlian@cse.ust.hk;Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China. leichen@cse.ust.hk

  • Venue:
  • ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Similarity search has been widely used in many applications such as information retrieval, image data analysis, and time-series matching. Specifically, a similarity query retrieves all data objects in a data set that are similar to a given query object. Previous work on similarity search usually consider the search problem in the full space. In this paper, however, we propose a novel problem, subspace similarity search, which finds all data objects that match with a query object in the subspace instead of the original full space. In particular, the query object can specify arbitrary subspace with arbitrary number of dimensions. Since traditional approaches for similarity search cannot be applied to solve the proposed problem, we introduce an efficient and effective pruning technique, which assigns scores to data objects with respect to pivots and prunes candidates via scores. We propose an effective multipivot-based method to pre-process data objects by selecting appropriate pivots, where the entire procedure is guided by a formal cost model, such that the pruning power is maximized. Finally, scores of each data object are organized in sorted list to facilitate an efficient subspace similarity search. Extensive experiments have verified the correctness of our cost model and demonstrated the efficiency and effectiveness of our proposed approach for the subspace similarity search.