Multimedia similarity search plays a critical role in many novel applications. Typically, multimedia objects are described by high-dimensional feature vectors (or points), which are organized in databases for retrieval. Although many high-dimensional indexing methods have been proposed to facilitate the search process, efficient retrieval over large, sparse, and extremely high-dimensional databases remains challenging because data size and feature dimensionality continue to grow. In this paper, we propose the first framework for Data Co-Reduction (DCR) on both data size and feature dimensionality. By utilizing recently developed co-clustering methods, DCR simultaneously reduces both the size and the dimensionality of the original data into a compact subspace, where lower bounds of the actual distances in the original space can be efficiently established to achieve fast and lossless similarity search in the filter-and-refine approach. In particular, DCR considers the duality between size and dimensionality, and achieves the optimal co-reduction, which generates the fewest candidates for actual distance computations. We conduct an extensive experimental study on large, real-life multimedia datasets with dimensionality ranging from 432 to 1936. Our results demonstrate that DCR significantly outperforms existing methods for lossless retrieval, especially in the presence of extremely high dimensionality.
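To make the filter-and-refine idea concrete, the following is a minimal sketch of lossless k-nearest-neighbor search driven by a lower-bounding distance. It does not implement DCR or any co-clustering method. As a stand-in, it assumes a much simpler reduction that collapses each block of consecutive dimensions into its scaled mean, which is known to lower-bound the Euclidean distance. All function names and the `group_size` parameter are hypothetical and chosen only for illustration.

```python
import math


def euclidean(a, b):
    """Exact Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_vector(vec, group_size):
    """Collapse each block of dimensions into sqrt(block size) * block mean.

    Assumption (not from the paper): fixed blocks of consecutive dimensions.
    By the Cauchy-Schwarz inequality, the Euclidean distance between two
    reduced vectors never exceeds the distance between the originals.
    """
    reduced = []
    for start in range(0, len(vec), group_size):
        block = vec[start:start + group_size]
        # sum / sqrt(n) equals sqrt(n) * mean
        reduced.append(sum(block) / math.sqrt(len(block)))
    return reduced


def knn_filter_refine(data, reduced_data, query, k, group_size):
    """Return the k nearest neighbors and the number of exact distances computed.

    Filter: rank all points by their cheap lower bound in the reduced space.
    Refine: compute exact distances in that order, and stop once the next
    lower bound cannot beat the current k-th best exact distance.
    """
    q_red = reduce_vector(query, group_size)
    bounds = sorted((euclidean(q_red, r), i) for i, r in enumerate(reduced_data))

    best = []      # (exact distance, index) pairs, kept sorted, at most k long
    refined = 0    # how many exact distance computations were needed
    for lower_bound, i in bounds:
        if len(best) == k and lower_bound >= best[-1][0]:
            break  # every remaining point is at least this far away
        best.append((euclidean(query, data[i]), i))
        refined += 1
        best.sort()
        del best[k:]
    return best, refined
```

Because the bound never overestimates the true distance, stopping early cannot discard a true neighbor, so the result matches a full scan. The value `refined` counts the candidates passed to exact distance computation, which is the quantity the abstract says an optimal co-reduction minimizes. A tighter lower bound yields fewer candidates.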