Similarity Search in Arbitrary Subspaces Under Lp-Norm

Authors:
Xiang Lian; Lei Chen
Affiliations:
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China (xlian@cse.ust.hk; leichen@cse.ust.hk)
Venue:
ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
Year:
2008

Cited by (9)

SimDB: a similarity-aware database system
Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data

Subspace similarity search: efficient k-NN queries in arbitrary subspaces
SSDBM '10: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management

An efficient algorithm for reverse furthest neighbors query with metric index
DEXA '10: Proceedings of the 21st International Conference on Database and Expert Systems Applications, Part II

Embedding-based subsequence matching in time-series databases
ACM Transactions on Database Systems (TODS)

Scalable density-based subspace clustering
Proceedings of the 20th ACM International Conference on Information and Knowledge Management

An efficient algorithm for arbitrary reverse furthest neighbor queries
APWeb '12: Proceedings of the 14th Asia-Pacific International Conference on Web Technologies and Applications

A survey on unsupervised outlier detection in high-dimensional numerical data
Statistical Analysis and Data Mining

Probabilistic top-k dominating queries in uncertain databases
Information Sciences: an International Journal

Efficient processing of probabilistic group subspace skyline queries in uncertain databases
Information Systems

Abstract

Similarity search has been widely used in many applications, such as information retrieval, image data analysis, and time-series matching. Specifically, a similarity query retrieves all data objects in a data set that are similar to a given query object. Previous work on similarity search usually considers the search problem in the full space. In this paper, however, we propose a novel problem, subspace similarity search, which finds all data objects that match a query object in a subspace instead of the original full space. In particular, the query object can specify an arbitrary subspace with an arbitrary number of dimensions. Since traditional approaches to similarity search cannot be applied to this problem, we introduce an efficient and effective pruning technique, which assigns scores to data objects with respect to pivots and prunes candidates based on these scores. We propose an effective multipivot-based method to preprocess data objects by selecting appropriate pivots, where the entire procedure is guided by a formal cost model so that the pruning power is maximized. Finally, the scores of each data object are organized in sorted lists to facilitate efficient subspace similarity search. Extensive experiments have verified the correctness of our cost model and demonstrated the efficiency and effectiveness of our proposed approach for subspace similarity search.
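To make the query semantics concrete, here is a minimal sketch of a subspace range query under the Lp-norm: the distance is computed only over the dimensions the query names, per-dimension sorted lists narrow down candidates by binary search, and the survivors are verified exactly. The function names (`subspace_lp_dist`, `build_sorted_lists`, `subspace_range_query`) and the single-coordinate filter are illustrative assumptions. This is not the authors' multipivot score-based pruning or their cost model, which the abstract describes only at a high level.

```python
import bisect
from typing import List, Optional, Sequence, Set, Tuple


def subspace_lp_dist(x: Sequence[float], y: Sequence[float],
                     dims: Sequence[int], p: float) -> float:
    """Lp distance between x and y using only the dimensions listed in dims."""
    return sum(abs(x[i] - y[i]) ** p for i in dims) ** (1.0 / p)


def build_sorted_lists(data: List[Sequence[float]]) -> List[List[Tuple[float, int]]]:
    """Per dimension, a list of (value, object id) pairs sorted by value."""
    n_dims = len(data[0])
    return [sorted((obj[i], oid) for oid, obj in enumerate(data))
            for i in range(n_dims)]


def subspace_range_query(data: List[Sequence[float]],
                         lists: List[List[Tuple[float, int]]],
                         q: Sequence[float], dims: Sequence[int],
                         r: float, p: float) -> List[int]:
    """Ids of all objects o with subspace_lp_dist(q, o, dims, p) <= r.

    Filter step (hypothetical, not the paper's pruning): a single coordinate
    difference never exceeds the Lp distance, so a match must satisfy
    |o[i] - q[i]| <= r on every queried dimension i.
    """
    if not dims:
        raise ValueError("the query subspace must contain at least one dimension")
    candidates: Optional[Set[int]] = None
    for i in dims:
        col = lists[i]
        # Binary search for the value range [q[i] - r, q[i] + r] in this dimension.
        lo = bisect.bisect_left(col, (q[i] - r, -1))
        hi = bisect.bisect_right(col, (q[i] + r, len(data)))
        ids = {oid for _, oid in col[lo:hi]}
        candidates = ids if candidates is None else candidates & ids
        if not candidates:
            return []
    # Refinement step: exact subspace distance for the surviving candidates.
    return sorted(oid for oid in candidates
                  if subspace_lp_dist(q, data[oid], dims, p) <= r)


data = [
    [1.0, 5.0, 9.0],
    [2.0, 1.0, 8.0],
    [1.5, 9.0, 2.0],
    [7.0, 1.5, 8.5],
]
lists = build_sorted_lists(data)
q = [1.0, 1.0, 8.0]
print(subspace_range_query(data, lists, q, dims=[0, 2], r=1.5, p=2))     # [0, 1]
print(subspace_range_query(data, lists, q, dims=[0, 1, 2], r=1.5, p=2))  # [1]
```

Object 0 matches in the subspace {0, 2} but not in the full space, because it differs from the query by 4 in dimension 1; this is exactly the case a full-space index cannot answer. The per-coordinate filter is weak when the subspace has many dimensions, which is where pivot-based scores chosen under a cost model, as the abstract proposes, are meant to prune more aggressively.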