In this paper, we propose a new, efficient retrieval system for large-scale multi-modal data, including video tracks. With large-scale multi-modal data, the sheer data size and diversity of content degrade both the efficiency and the precision of retrieval. Recent research on image annotation and retrieval shows that image features based on the Bag-of-Visual-Words approach, built on local descriptors such as SIFT, perform surprisingly well on large-scale image datasets. These powerful descriptors, however, tend to be high-dimensional, which makes approximate nearest neighbor search in the raw feature space computationally expensive. Our video retrieval method exploits the correlation between image, sound, and location information recorded simultaneously, and learns a conceptual space describing the contents of the data to enable efficient search. Experiments show that our retrieval system performs well with low memory usage and low time complexity.
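As a rough illustration of the Bag-of-Visual-Words features the abstract refers to, the sketch below quantizes local descriptors (standing in for SIFT vectors) against a visual vocabulary and builds a normalized word histogram. This is a minimal NumPy sketch under our own assumptions: the function name, toy vocabulary, and synthetic descriptors are illustrative, not taken from the paper.

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Assign each local descriptor (e.g. a SIFT vector) to its nearest
    visual word and return an L1-normalized Bag-of-Visual-Words histogram."""
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)  # index of the nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()   # L1-normalize so images of any size are comparable

# Toy example: a 3-word vocabulary in 2-D, with 2 noisy descriptors per word.
rng = np.random.default_rng(0)
vocab = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
desc = np.vstack([vocab[i] + rng.normal(scale=0.1, size=(2, 2)) for i in range(3)])
h = bovw_histogram(desc, vocab)
```

In a real system the vocabulary would come from clustering millions of descriptors (e.g. with k-means), and the resulting histograms are the high-dimensional vectors whose raw-space nearest-neighbor search the abstract identifies as the bottleneck.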