High-dimensional sparse data is prevalent in many real-life applications. In this paper, we propose ISIS (Indexing Sparse databases using Inverted fileS), a novel index structure for accelerating similarity search in high-dimensional sparse databases. ISIS clusters a dataset and converts the original high-dimensional space into a new space in which each dimension represents a cluster; the key values in the new space are then used by inverted-file indexes. We also propose an extension of ISIS, named ISIS+, which partitions the data space into lower-dimensional subspaces and clusters the data within each subspace. An extensive experimental study demonstrates the superiority of our approaches on high-dimensional sparse databases.
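The general idea of a cluster-keyed inverted file can be illustrated with a minimal sketch. It is not the ISIS algorithm itself, since the abstract does not specify the clustering method, the key definition, or the search procedure. The sketch assumes that cluster centroids are already given, that the key of a point is its Euclidean distance to the nearest centroid, and that a query scans only the posting lists of its closest clusters. All names (`sparse_dist`, `build_index`, `search`, `probes`) are hypothetical.

```python
import math
from collections import defaultdict


def sparse_dist(a, b):
    """Euclidean distance between two sparse vectors stored as {dim: value} dicts."""
    dims = set(a) | set(b)
    return math.sqrt(sum((a.get(d, 0.0) - b.get(d, 0.0)) ** 2 for d in dims))


def build_index(points, centroids):
    """Build one posting list per cluster: cluster id -> sorted [(key, point id)].

    Assumption: the key is the distance from the point to its nearest centroid.
    """
    index = defaultdict(list)
    for pid, p in enumerate(points):
        dists = [sparse_dist(p, c) for c in centroids]
        cid = min(range(len(centroids)), key=dists.__getitem__)
        index[cid].append((dists[cid], pid))
    for postings in index.values():
        postings.sort()
    return index


def search(query, points, centroids, index, k=1, probes=1):
    """Approximate k-NN: scan the posting lists of the `probes` closest clusters."""
    order = sorted(range(len(centroids)),
                   key=lambda c: sparse_dist(query, centroids[c]))
    candidates = [pid for cid in order[:probes] for _, pid in index.get(cid, [])]
    candidates.sort(key=lambda pid: sparse_dist(query, points[pid]))
    return candidates[:k]


# Toy data: four sparse points in a nominally high-dimensional space.
points = [{0: 1.0}, {0: 1.1, 3: 0.1}, {7: 2.0}, {7: 2.1, 9: 0.2}]
centroids = [{0: 1.0}, {7: 2.0}]  # assumed to come from a prior clustering step
index = build_index(points, centroids)
nearest = search({7: 1.9}, points, centroids, index, k=1)
```

In this sketch each cluster plays the role of one dimension of the new space, and each posting list holds the key values for that dimension. With `probes=1` the search is approximate, because a true neighbor assigned to a different cluster is never examined; raising `probes` trades speed for recall.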