Evaluating continuous top-k queries over document streams

Authors:
Weixiong Rao;Lei Chen;Shudong Chen;Sasu Tarkoma
Affiliations:
Computer Science & Engineering Department, Hong Kong University of Science and Technology, Kowloon, China;Computer Science & Engineering Department, Hong Kong University of Science and Technology, Kowloon, China;Institute of Microelectronics of Chinese Academy of Sciences, Beijing, China and China R&D Center for Internet of Things, Wuxi, China;Department of Computer Science, University of Helsinki, Helsinki, Finland
Venue:
World Wide Web
Year:
2014

Abstract

In the age of Web 2.0, Web content has become live, and users would like to receive content of interest automatically. The popular RSS subscription approach cannot offer fine-grained filtering. In this paper, we propose a personalized subscription approach over live Web content. Each document is represented by pairs of terms and weights, and each user defines a continuous top-k query. Based on an aggregation function that measures the relevance between a document and a query, the user continuously receives the top-k most relevant documents inside a sliding window. The challenge of this subscription approach is its high processing cost, especially when the number of queries is very large. Our basic idea is to share evaluation results among queries. Based on a defined covering relationship between queries, we identify the relations among the aggregation scores of such queries and develop a graph indexing structure (GIS) to maintain the queries. Next, based on the GIS, we propose a document evaluation algorithm that shares results among queries. After that, we reuse the history of previously evaluated documents and design a document indexing structure (DIS) to maintain these history documents. Finally, we adopt a cost-model-based approach to unify the use of the GIS and the DIS. The experimental results show that our solution outperforms previous work that uses the classic inverted list structure.
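
To make the problem setting concrete, the following is a minimal sketch of the unshared baseline that the abstract argues is too costly: every query is re-scored against every document in the window. It is not the paper's GIS or DIS. The sketch assumes a sum of term weights as the aggregation function, a count-based sliding window, and queries given as plain sets of terms; the names `score` and `NaiveTopKMonitor` are hypothetical and chosen only for illustration.

```python
import heapq
from collections import deque


def score(doc, query_terms):
    """Assumed aggregation function: sum the weights of matching query terms."""
    return sum(doc.get(term, 0.0) for term in query_terms)


class NaiveTopKMonitor:
    """Hypothetical baseline: scores each query separately, with no sharing."""

    def __init__(self, queries, k, window_size):
        self.queries = queries  # query id -> set of terms
        self.k = k
        # Count-based sliding window (an assumption); deque drops the oldest item.
        self.window = deque(maxlen=window_size)

    def push(self, doc_id, doc):
        """Add a document given as a dict of term -> weight."""
        self.window.append((doc_id, doc))

    def top_k(self, query_id):
        """Return up to k (score, doc_id) pairs for one query, best first."""
        terms = self.queries[query_id]
        scored = [(score(doc, terms), doc_id) for doc_id, doc in self.window]
        relevant = [pair for pair in scored if pair[0] > 0]
        return heapq.nlargest(self.k, relevant)


queries = {"q1": {"stream"}, "q2": {"stream", "query"}}
monitor = NaiveTopKMonitor(queries, k=2, window_size=3)
monitor.push("d1", {"stream": 0.5, "query": 0.25})
monitor.push("d2", {"web": 0.75})
monitor.push("d3", {"stream": 0.25})
monitor.push("d4", {"query": 1.0})  # the window holds 3 documents, so d1 is evicted

print(monitor.top_k("q1"))
print(monitor.top_k("q2"))
```

In this sketch the terms of `q1` are a subset of the terms of `q2`, so with non-negative weights and a sum aggregation no document can score higher under `q1` than under `q2`. A relation of this kind between query scores is what a result-sharing scheme could exploit, although the abstract does not state the paper's exact definition of the covering relationship.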