MTopS: scalable processing of continuous top-k multi-query workloads

  • Authors:
  • Avani Shastri;Yang Di;Elke A. Rundensteiner;Matthew O. Ward

  • Affiliations:
  • Worcester Polytechnic Institute, Worcester, MA, USA;Worcester Polytechnic Institute, Worcester, MA, USA;Ins, Worcester, MA, USA;Worcester Polytechnic Institute, Worcester, MA, USA

  • Venue:
  • Proceedings of the 20th ACM international conference on Information and knowledge management
  • Year:
  • 2011

Abstract

A continuous top-k query retrieves the k most preferred objects in a data stream according to a given preference function. Such queries are important for a broad spectrum of applications, ranging from web-based advertising to financial analysis. In many streaming applications, a large number of continuous top-k queries must be executed simultaneously against a common, popular input stream. To handle such a top-k query workload efficiently, we present a comprehensive framework called MTopS. Within MTopS, several computational components work together to analyze the commonalities across the workload, organize the workload to maximize sharing opportunities, execute the workload queries simultaneously in a shared manner, and output query results whenever any query in the workload requires them. In particular, MTopS supports two proposed algorithms, MTopBand and MTopList, both of which incrementally maintain the top-k objects over time for multiple queries. As the foundation, we first identify the minimal object set from the data stream that is both necessary and sufficient for accurately answering all top-k queries in the workload. The MTopBand algorithm then incrementally maintains this minimal object set, eliminating any need for recomputation from scratch. To further optimize MTopBand, we design a second algorithm, MTopList, which organizes the progressive top-k results of the workload queries in a compact structure. MTopList is shown to be memory-optimal and also more CPU-efficient than MTopBand. Our experimental study, using real data streams from the domains of stock trading and moving-object monitoring, demonstrates that our proposed techniques are clearly superior to state-of-the-art solutions in both efficiency and scalability.
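To make the idea of one shared "necessary and sufficient" object set concrete, here is a minimal Python sketch under simplifying assumptions that are ours, not the paper's. We assume that all queries share one preference function and one count-based window that slides by one object, and that they differ only in k. Under these assumptions, an object is kept only while fewer than k_max younger objects outscore it. This is a k-skyband-style pruning rule, and it is safe because those younger objects stay in the window at least as long as the older one does. The class `SharedTopK` and its methods are hypothetical names. This is an illustration of shared pruning, not the MTopBand or MTopList algorithm itself.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    arrival: int           # position in the stream (count-based window)
    score: float           # value of the shared preference function
    dominated_by: int = 0  # younger objects in the stream that outscore this one


class SharedTopK:
    """Hypothetical sketch: one candidate set serving several top-k queries
    that share a preference function and a window of `window` objects
    (slide = 1), but request different values of k."""

    def __init__(self, window: int, ks: list[int]):
        self.window = window
        self.k_max = max(ks)
        self.candidates: list[Candidate] = []
        self.clock = 0

    def insert(self, score: float) -> None:
        t = self.clock
        self.clock += 1
        survivors = []
        for c in self.candidates:
            # Expire objects that slid out of the window [t - window + 1, t].
            if c.arrival <= t - self.window:
                continue
            # The new object is younger than every candidate.
            if score > c.score:
                c.dominated_by += 1
            # k_max younger, higher-scored objects outlive c, so c can never
            # re-enter any query's top-k: discard it permanently.
            if c.dominated_by < self.k_max:
                survivors.append(c)
        survivors.append(Candidate(t, score))
        self.candidates = survivors

    def top_k(self, k: int) -> list[float]:
        if k > self.k_max:
            raise ValueError("k exceeds the largest k registered in the workload")
        ranked = sorted(self.candidates, key=lambda c: c.score, reverse=True)
        return [c.score for c in ranked[:k]]


# Usage: two queries (k = 1 and k = 2) over a window of 4 objects.
shared = SharedTopK(window=4, ks=[1, 2])
for x in [5, 3, 8, 1, 7, 2]:
    shared.insert(x)
print(shared.top_k(1))  # [8]
print(shared.top_k(2))  # [8, 7]
```

In this run the window holds 8, 1, 7, 2, but only three candidates remain. The object with score 1 has been pruned because two younger objects outscore it, and it can therefore never appear in either query's answer. Both queries are answered from the same pruned set. The pruning is governed by the largest k in the workload, which is the sharing intuition behind the abstract's minimal object set.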