SimDB: a similarity-aware database system
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Subspace similarity search: efficient k-NN queries in arbitrary subspaces
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
An efficient algorithm for reverse furthest neighbors query with metric index
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II
Embedding-based subsequence matching in time-series databases
ACM Transactions on Database Systems (TODS)
Scalable density-based subspace clustering
Proceedings of the 20th ACM international conference on Information and knowledge management
An efficient algorithm for arbitrary reverse furthest neighbor queries
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
A survey on unsupervised outlier detection in high-dimensional numerical data
Statistical Analysis and Data Mining
Probabilistic top-k dominating queries in uncertain databases
Information Sciences: an International Journal
Similarity search has been widely used in many applications such as information retrieval, image data analysis, and time-series matching. Specifically, a similarity query retrieves all data objects in a data set that are similar to a given query object. Previous work on similarity search usually considers the search problem in the full space. In this paper, however, we propose a novel problem, subspace similarity search, which finds all data objects that match a query object in a subspace instead of the original full space. In particular, the query object can specify an arbitrary subspace with an arbitrary number of dimensions. Since traditional similarity search approaches cannot be applied to this problem, we introduce an efficient and effective pruning technique that assigns scores to data objects with respect to pivots and prunes candidates based on these scores. We propose an effective multipivot-based method that pre-processes data objects by selecting appropriate pivots; the entire procedure is guided by a formal cost model so as to maximize pruning power. Finally, the scores of the data objects are organized in sorted lists to facilitate efficient subspace similarity search. Extensive experiments have verified the correctness of our cost model and demonstrated the efficiency and effectiveness of our proposed approach for subspace similarity search.
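The abstract does not define the scores or the pivot-selection procedure, so the following is only a rough Python sketch of the general filter-and-refine pattern it describes (pivot scores kept in sorted lists, pruning on those lists, then an exact check). It assumes an L_p distance (p >= 1) restricted to the queried dimensions and uses the per-dimension score |o[i] - pivot[i]| as a stand-in for the paper's scores. Pruning relies on the reverse triangle inequality: for every queried dimension i and pivot, |q[i] - o[i]| is at least ||q[i] - pivot[i]| - |o[i] - pivot[i]||, and it never exceeds the subspace distance. An object whose score falls outside [s_q - r, s_q + r] on any list therefore cannot be within radius r. The names `PivotScoreIndex`, `subspace_lp`, and `range_query` are hypothetical, and no cost model is used: the caller supplies the pivots.

```python
import bisect
from typing import List, Optional, Sequence, Set

Vector = Sequence[float]


def subspace_lp(a: Vector, b: Vector, dims: Sequence[int], p: float = 2.0) -> float:
    """L_p distance (p >= 1) between a and b restricted to the dimensions in dims."""
    return sum(abs(a[i] - b[i]) ** p for i in dims) ** (1.0 / p)


class PivotScoreIndex:
    """Hypothetical filter-and-refine index (illustrative, not the paper's design).

    For every (dimension i, pivot j) pair it keeps the scores |o[i] - pivot_j[i]|
    of all objects in ascending order, with the matching object ids alongside.
    """

    def __init__(self, data: List[Vector], pivots: List[Vector], p: float = 2.0):
        self.data = data
        self.pivots = pivots
        self.p = p
        ndim = len(pivots[0])
        self.scores: List[List[List[float]]] = []  # scores[i][j]: sorted scores
        self.ids: List[List[List[int]]] = []       # ids[i][j]: ids in the same order
        for i in range(ndim):
            dim_scores, dim_ids = [], []
            for piv in pivots:
                pairs = sorted((abs(o[i] - piv[i]), oid) for oid, o in enumerate(data))
                dim_scores.append([s for s, _ in pairs])
                dim_ids.append([oid for _, oid in pairs])
            self.scores.append(dim_scores)
            self.ids.append(dim_ids)

    def range_query(self, q: Vector, dims: Sequence[int], r: float) -> List[int]:
        """Return sorted ids of objects o with subspace_lp(q, o, dims) <= r."""
        if not dims:
            raise ValueError("the query subspace needs at least one dimension")
        candidates: Optional[Set[int]] = None
        for i in dims:
            for j, piv in enumerate(self.pivots):
                # Reverse triangle inequality on dimension i:
                # |q[i] - o[i]| >= | |q[i] - piv[i]| - |o[i] - piv[i]| |, and for
                # p >= 1 the left side never exceeds the subspace distance, so a
                # match must have a score within [sq - r, sq + r] on this list.
                sq = abs(q[i] - piv[i])
                lo = bisect.bisect_left(self.scores[i][j], sq - r)
                hi = bisect.bisect_right(self.scores[i][j], sq + r)
                found = set(self.ids[i][j][lo:hi])
                candidates = found if candidates is None else candidates & found
                if not candidates:
                    return []  # everything pruned without any distance computation
        # Refinement: the filter admits false positives, so check exact distances.
        return sorted(oid for oid in candidates
                      if subspace_lp(q, self.data[oid], dims, self.p) <= r)


if __name__ == "__main__":
    data = [(0, 0, 0), (1, 0, 5), (0, 1, 9), (3, 3, 3)]
    pivots = [(0, 0, 0), (5, 5, 5)]
    index = PivotScoreIndex(data, pivots)
    # Query only the subspace {0, 1}; the third coordinate of q is ignored.
    print(index.range_query((0, 0, 100), [0, 1], 1.0))  # -> [0, 1, 2]
```

Every list test is a necessary condition for a match, so the filter never discards a true answer. The exact distance check then removes any false positives. Because the lists are built per dimension, one index serves queries over any subset of dimensions. How well it prunes depends on where the pivots lie, which is the part the paper's cost model is meant to optimize.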