On-line multi-threaded processing of web user-clicks on multi-core processors

Authors:
Carolina Bonacic; Carlos Garcia; Mauricio Marin; Manuel Prieto; Francisco Tirado
Affiliations:
Depto. Arquitectura de Computadores y Automatica, Universidad Complutense de Madrid; Depto. Arquitectura de Computadores y Automatica, Universidad Complutense de Madrid; Yahoo! Research Latin America, Universidad de Santiago de Chile; Depto. Arquitectura de Computadores y Automatica, Universidad Complutense de Madrid; Depto. Arquitectura de Computadores y Automatica, Universidad Complutense de Madrid
Venue:
VECPAR'10 Proceedings of the 9th international conference on High performance computing for computational science
Year:
2010

Abstract

Real-time search (a setting in which Web search engines can include among their query results documents published on the Web in the very recent past) is clear evidence that many of the off-line computations performed so far on conventional search engines need to be moved to the on-line arena. This is a demanding case for parallel computing, since it is necessary to cope efficiently with thousands of concurrent read and write operations per unit time, all requiring latencies within a fraction of a second. To our knowledge, the computations that capture user preferences through their clicks on query result pages and include this feature in the document ranking process are currently performed off-line. This is done by pre-processing very large logs, containing millions of queries submitted by actual users, on a time scale of days, weeks or even months. The outcome is score data for the set of documents indexed by the search engine that users selected in the past. This paper studies the efficiency of this process in the on-line setting by evaluating a set of strategies for concurrent read/write operations executed on a multi-threaded multi-core architecture. The benefit of efficient on-line processing of user clicks is that it becomes feasible to include user preferences in document ranking in real time as well.