Evaluating continuous top-k queries over document streams

  • Authors:
  • Weixiong Rao;Lei Chen;Shudong Chen;Sasu Tarkoma

  • Affiliations:
  • Computer Science & Engineering Department, Hong Kong University of Science and Technology, Kowloon, China;Computer Science & Engineering Department, Hong Kong University of Science and Technology, Kowloon, China;Institute of Microelectronics of Chinese Academy of Sciences, Beijing, China and China R&D Center for Internet of Things, Wuxi, China;Department of Computer Science, University of Helsinki, Helsinki, Finland

  • Venue:
  • World Wide Web
  • Year:
  • 2014

Abstract

In the age of Web 2.0, Web content has become live, and users would like to automatically receive content of interest. The popular RSS subscription approach cannot offer fine-grained filtering. In this paper, we propose a personalized subscription approach over live Web content. Each document is represented by pairs of terms and weights, and each user defines a continuous top-k query. Based on an aggregation function that measures the relevance between a document and a query, the user continuously receives the top-k most relevant documents inside a sliding window. The challenge of this subscription approach is its high processing cost, especially when the number of queries is very large. Our basic idea is to share evaluation results among queries. Based on a defined covering relationship between queries, we identify the relations among the aggregation scores of such queries and develop a graph indexing structure (GIS) to maintain the queries. Next, based on the GIS, we propose a document evaluation algorithm that shares query results among queries. After that, we reuse the documents in the evaluation history and design a document indexing structure (DIS) to maintain them. Finally, we adopt a cost-model-based approach to unify the use of GIS and DIS. The experimental results show that our solution outperforms previous work that uses the classic inverted list structure.
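
To make the problem setting concrete, the following is a minimal Python sketch of a single continuous top-k query evaluated naively over a sliding window. It is not the GIS or DIS method of the paper; it only illustrates the unshared per-query evaluation whose cost motivates sharing results among queries. The abstract does not specify the aggregation function or the window type, so the sum of matching term weights and the count-based window are assumptions, and the names `score`, `ContinuousTopK`, and `push` are hypothetical.

```python
from collections import deque
import heapq


def score(doc, query_terms):
    """Assumed aggregation function: sum of the document's weights
    for the terms that appear in the query."""
    return sum(doc.get(term, 0.0) for term in query_terms)


class ContinuousTopK:
    """Naive evaluator for one query over a count-based sliding window
    (an assumption; the window could equally be time-based)."""

    def __init__(self, query_terms, k, window_size):
        self.query_terms = set(query_terms)
        self.k = k
        # A bounded deque drops the oldest document automatically.
        self.window = deque(maxlen=window_size)

    def push(self, doc_id, doc):
        """Add a document (dict of term -> weight) to the window and
        return the ids of the current top-k documents, best first."""
        self.window.append((doc_id, score(doc, self.query_terms)))
        best = heapq.nlargest(self.k, self.window, key=lambda pair: pair[1])
        # Documents with no matching term are not reported.
        return [d for d, s in best if s > 0]


query = ContinuousTopK(["web", "stream"], k=2, window_size=3)
query.push("d1", {"web": 0.5, "news": 0.5})
query.push("d2", {"stream": 0.9, "web": 0.3})
query.push("d3", {"sports": 1.0})
latest = query.push("d4", {"web": 0.7})  # d1 has left the window
```

Every arriving document is rescored against this one query and the window is rescanned, so with many registered queries the work grows with the number of queries times the window size. That per-query cost is what sharing evaluation results among related queries aims to reduce.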