Active caching for similarity queries based on shared-neighbor information

Authors:
Michael E. Houle;Vincent Oria;Umar Qasim
Affiliations:
National Institute of Informatics, Tokyo, Japan;New Jersey Institute of Technology, Newark, NJ, USA;New Jersey Institute of Technology, Newark, NJ, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Abstract

Novel applications such as recommender systems, uncertain databases, and multimedia databases are designed to process similarity queries that produce ranked lists of objects as their results. Similarity queries typically suffer from disk access latency and incur a substantial computational cost. In this paper, we propose an 'active caching' technique for similarity queries that is capable of synthesizing query results from cached information even when the required result list is not explicitly stored in the cache. Our solution, the Cache Estimated Significance (CES) model, is based on shared-neighbor similarity measures, which assess the strength of the relationship between two objects as a function of the number of other objects in the intersection of their neighborhoods. The proposed method is general in that it requires neither that the features be drawn from a metric space nor that the partial orders induced by the similarity measure be monotonic. Experimental results on real data sets show that the method achieves a substantial cache hit rate compared with traditional caching approaches.
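
To make the notion of shared-neighbor similarity concrete, here is a minimal Python sketch. It is not the CES model itself, whose details the abstract does not give. It only illustrates the general idea of scoring two objects by the overlap of their neighbor lists. The function name `shared_neighbor_similarity`, the normalization by the larger list size, and the contents of `cached_lists` are all assumptions made for illustration.

```python
def shared_neighbor_similarity(neighbors_a, neighbors_b):
    """Score two objects by the overlap of their neighbor lists.

    Assumption: the score is the number of shared neighbors divided by the
    size of the larger neighbor set, which gives a value in [0, 1].
    """
    set_a, set_b = set(neighbors_a), set(neighbors_b)
    if not set_a or not set_b:
        return 0.0
    shared = len(set_a & set_b)
    return shared / max(len(set_a), len(set_b))


# Hypothetical cached result lists, keyed by query object id.
cached_lists = {
    "a": ["b", "c", "d", "e"],
    "b": ["a", "c", "d", "f"],
    "g": ["h", "i", "j", "k"],
}

sim_ab = shared_neighbor_similarity(cached_lists["a"], cached_lists["b"])
sim_ag = shared_neighbor_similarity(cached_lists["a"], cached_lists["g"])
print(sim_ab, sim_ag)
```

Because the score depends only on set membership in the neighbor lists, it needs no metric distance between feature vectors, which is consistent with the generality claimed in the abstract.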