An Incremental Threshold Method for Continuous Text Search Queries

Authors:
Kyriakos Mouratidis; HweeHwa Pang
Venue:
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Year:
2009

Cited by (7):

The gist of everything new: personalized top-k processing over web 2.0 streams (CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management)
Distributed processing of continuous sliding-window k-NN queries for data stream filtering (World Wide Web)
Efficient monitoring of personalized hot news over Web 2.0 streams (Computer Science - Research and Development)
Distributed top-k full-text content dissemination (Distributed and Parallel Databases)
Processing continuous text queries featuring non-homogeneous scoring functions (Proceedings of the 21st ACM international conference on Information and knowledge management)
Top-k/w publish/subscribe: A publish/subscribe model for continuous top-k processing over data streams (Information Systems)
Evaluating continuous top-k queries over document streams (World Wide Web)

Abstract

A text filtering system monitors a stream of incoming documents to identify those that match the interest profiles of its users. The user interests are registered at a server as continuous text search queries. For each query, the server continuously maintains a ranked result list comprising the recent documents (drawn from a sliding window) with the highest similarity to the query. Such a system underlies many text monitoring applications that must cope with heavy document traffic, such as news and email monitoring. In this paper, we propose the first solution for processing continuous text queries efficiently. Our objective is to support a large number of user queries while sustaining high document arrival rates. Our solution indexes the streamed documents with a structure based on the principles of the inverted file, and processes document arrival and expiration events with an incremental threshold-based method. Using a stream of real documents, we experimentally verify the efficiency of our approach, which is at least an order of magnitude faster than a competitor constructed from existing techniques.
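The abstract only outlines the problem setting. The Python sketch below illustrates that setting under several assumptions that are not taken from the paper: a count-based window holding the N most recent documents, cosine similarity computed as a dot product of unit-normalised term-count vectors, and a per-query threshold equal to the k-th best score in the current result. All names (`StreamMonitor`, `arrive`, `expire`, `register_query`, `unit_vector`) are hypothetical. When a result document expires, the sketch simply recomputes that query's top-k from the inverted lists of its terms; this plain recomputation stands in for, and is not, the incremental threshold method the paper proposes.

```python
import heapq
import math
from collections import defaultdict, deque


def unit_vector(term_counts):
    """Normalise raw term counts so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(c * c for c in term_counts.values()))
    return {t: c / norm for t, c in term_counts.items()} if norm else {}


class StreamMonitor:
    """Hypothetical monitor: count-based sliding window, inverted file, per-query top-k."""

    def __init__(self, window_size, k):
        self.window_size = window_size
        self.k = k
        self.window = deque()                # document ids, oldest first
        self.inverted = defaultdict(dict)    # term -> {doc_id: weight}
        self.doc_vectors = {}                # doc_id -> {term: weight}
        self.queries = {}                    # query_id -> {term: weight}
        self.query_index = defaultdict(set)  # term -> ids of queries containing it
        self.results = {}                    # query_id -> {doc_id: score}

    def register_query(self, qid, term_counts):
        self.queries[qid] = unit_vector(term_counts)
        for term in self.queries[qid]:
            self.query_index[term].add(qid)
        self.results[qid] = self._recompute(qid)

    @staticmethod
    def _score(qvec, dvec):
        return sum(w * dvec.get(t, 0.0) for t, w in qvec.items())

    def _threshold(self, qid):
        """k-th best score in the result, or 0.0 while it holds fewer than k documents."""
        res = self.results[qid]
        return min(res.values()) if len(res) >= self.k else 0.0

    def _recompute(self, qid):
        """Rebuild the top-k list from the inverted lists of the query's terms."""
        qvec = self.queries[qid]
        candidates = set()
        for term in qvec:
            candidates.update(self.inverted.get(term, {}))
        scored = ((self._score(qvec, self.doc_vectors[d]), d) for d in candidates)
        return {d: s for s, d in heapq.nlargest(self.k, scored) if s > 0}

    def arrive(self, doc_id, term_counts):
        dvec = unit_vector(term_counts)
        self.window.append(doc_id)
        self.doc_vectors[doc_id] = dvec
        for term, w in dvec.items():
            self.inverted[term][doc_id] = w
        # Only queries sharing at least one term can give this document a non-zero score.
        affected = set()
        for term in dvec:
            affected |= self.query_index.get(term, set())
        for qid in affected:
            s = self._score(self.queries[qid], dvec)
            res = self.results[qid]
            if len(res) < self.k:
                res[doc_id] = s
            elif s > self._threshold(qid):
                # The new document beats the current k-th result: evict the weakest one.
                del res[min(res, key=res.get)]
                res[doc_id] = s
        if len(self.window) > self.window_size:
            self.expire(self.window.popleft())

    def expire(self, doc_id):
        dvec = self.doc_vectors.pop(doc_id)
        for term in dvec:
            del self.inverted[term][doc_id]
        # Results that did not contain the expired document remain valid top-k lists.
        for qid, res in list(self.results.items()):
            if doc_id in res:
                self.results[qid] = self._recompute(qid)
```

The arrival path shows why a threshold is useful: a new document only touches queries that share a term with it, and it enters a full result only if it beats that query's k-th score. Expiration is the harder case, since a departing result document leaves a gap that must be refilled from the remaining window; this is the step the paper's incremental method is designed to make cheap, whereas the sketch pays for a full recomputation.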