A Fast k Nearest Neighbor Finding Algorithm Based on the Ordered Partition
IEEE Transactions on Pattern Analysis and Machine Intelligence
The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
A model for the prediction of R-tree performance
PODS '96 Proceedings of the fifteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Efficiently supporting ad hoc queries in large datasets of time sequences
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The SR-tree: an index structure for high-dimensional nearest neighbor queries
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Multidimensional access methods
ACM Computing Surveys (CSUR)
Clustering and singular value decomposition for approximate indexing in high dimensional spaces
Proceedings of the seventh international conference on Information and knowledge management
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
A cost model for query processing in high dimensional data spaces
ACM Transactions on Database Systems (TODS)
Scalability for clustering algorithms revisited
ACM SIGKDD Explorations Newsletter
Modeling high-dimensional index structures using sampling
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
ACM Computing Surveys (CSUR)
Searching Multimedia Databases by Content
Searching Multimedia Databases by Content
Image Databases: Search and Retrieval of Digital Imagery
Image Databases: Search and Retrieval of Digital Imagery
Fast and Effective Retrieval of Medical Tumor Shapes
IEEE Transactions on Knowledge and Data Engineering
On the 'Dimensionality Curse' and the 'Self-Similarity Blessing'
IEEE Transactions on Knowledge and Data Engineering
Fast Time Sequence Indexing for Arbitrary Lp Norms
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Fast Nearest Neighbor Search in Medical Image Databases
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
SSD '95 Proceedings of the 4th International Symposium on Advances in Spatial Databases
IEEE Transactions on Knowledge and Data Engineering
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
Subspace clustering for high dimensional data: a review
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Exact k-NN queries on clustered SVD datasets
Information Processing Letters
Persistent Semi-Dynamic Ordered Partition Index
The Computer Journal
High-dimensional indexing methods utilizing clustering and dimensionality reduction
High-dimensional indexing methods utilizing clustering and dimensionality reduction
Multimedia Tools and Applications
CPRS: A cloud-based program recommendation system for digital TV platforms
Future Generation Computer Systems
Hi-index | 0.00 |
Content based image retrieval represents images as N- dimensional feature vectors. The k images most similar to a target image, i.e., those closest to its feature vector, are determined by applying a k-nearest-neighbor (k-NN) query. A sequential scan of the feature vectors for k-NN queries is costly for a large number of images when N is high. The search space can be reduced by indexing the data, but the effectiveness of multidimensional indices is poor for high dimensional data. Building indices on dimensionality reduced data is one method to improve indexing efficiency. We utilize the Singular Value Decomposition (SVD) method to attain dimensionality reduction (DR) with minimum information loss for static data. Clustered SVD (CSVD) combines clustering with SVD to attain a lower normalized mean square error (NMSE) by taking advantage of the fact that most real-world datasets exhibit local rather than global correlations. The Local Dimensionality Reduction (LDR) method differs from CSVD in that it uses an SVD-friendly clustering method, rather than the k-means clustering method. We propose a hybrid method which combines the clustering method of LDR with the DR method of CSVD, so that the vector of the number of retained dimensions of the clusters is determined by varying the NMSE. We build SR-tree indices based on the vectors in the clusters to determine the number of accessed pages for exact k-NN queries (Thomasian et al., Inf Process Lett - IPL 94(6):247---252, 2005) (see Section A.3 versus the NMSE. Varying the NMSE a minimum cost can be found, because the lower cost of accessing a smaller index is offset with the higher postprocessing cost resulting from lower retrieval accuracy. Experimenting with one synthetic and three real-world datasets leads to the conclusion that the lowest cost is attained at NMSE驴驴驴0.03 and between 1/3 and 1/2 of the number of dimensions are retained. In one case doubling the number of dimensions cuts the number of accessed pages by one half. The Appendix provides the requisite background information for reading this paper.