ISIS: a new approach for efficient similarity search in sparse databases

Authors:
Bin Cui;Jiakui Zhao;Gao Cong
Affiliations:
Department of Computer Science & Key Laboratory of High Confidence Software Technologies (Ministry of Education), Peking University;China Electric Power Research Institute, China;Aalborg University, Denmark
Venue:
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part II
Year:
2010


Abstract

High-dimensional sparse data is prevalent in many real-life applications. In this paper, we propose a novel index structure for accelerating similarity search in high-dimensional sparse databases, named ISIS, which stands for Indexing Sparse databases using Inverted fileS. ISIS clusters a dataset and converts the original high-dimensional space into a new space in which each dimension represents a cluster; the key values in the new space are then used by inverted-file indexes. We also propose an extension of ISIS, named ISIS+, which partitions the data space into lower-dimensional subspaces and clusters the data within each subspace. An extensive experimental study demonstrates the superiority of our approaches on high-dimensional sparse databases.
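
The abstract does not say how the clusters are formed or how the key values are computed, so the following is only a rough sketch of the general idea (one inverted list per cluster, keyed by a value in the cluster space) and should not be read as the authors' algorithm. It assumes that the centroids are already given, that the key of a point is its Euclidean distance to its nearest centroid, and that a range query prunes each list with the triangle inequality. The names `sparse_dist`, `build_index`, and `range_search` are hypothetical.

```python
import bisect
import math
from collections import defaultdict


def sparse_dist(a, b):
    """Euclidean distance between sparse vectors stored as {dimension: value}."""
    dims = set(a) | set(b)
    return math.sqrt(sum((a.get(d, 0.0) - b.get(d, 0.0)) ** 2 for d in dims))


def build_index(points, centroids):
    """Build one inverted list per cluster.

    Assumption (not from the abstract): each point goes to its nearest
    centroid, and its key is the distance to that centroid.
    """
    lists = defaultdict(list)
    for pid, p in enumerate(points):
        dists = [sparse_dist(p, c) for c in centroids]
        cid = min(range(len(centroids)), key=dists.__getitem__)
        lists[cid].append((dists[cid], pid))
    for postings in lists.values():
        postings.sort()  # sorted by key so a key range can be found by bisection
    return lists


def range_search(query, radius, points, centroids, lists):
    """Return ids of all points within `radius` of `query`."""
    results = []
    for cid, postings in lists.items():
        dq = sparse_dist(query, centroids[cid])
        # Triangle inequality: a point p in this cluster can only qualify
        # if |key(p) - dq| <= radius, so scan just that slice of the list.
        lo = bisect.bisect_left(postings, (dq - radius, -1))
        hi = bisect.bisect_right(postings, (dq + radius, len(points)))
        for _key, pid in postings[lo:hi]:
            if sparse_dist(query, points[pid]) <= radius:
                results.append(pid)
    return sorted(results)
```

In this sketch the pruning step only narrows the candidate set; every surviving candidate is still verified against the original sparse vector, so the result matches a brute-force scan.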