Efficient monitoring of skyline queries over distributed data streams

  • Authors:
  • Shengli Sun; Zhenghua Huang; Hao Zhong; Dongbo Dai; Hongbin Liu; Jinjiu Li

  • Affiliations:
  • Peking University, School of Software and Microelectronics, Beijing, China; Tongji University, Department of Computer Science, Shanghai, China; Laboratory for Internet Software Technology, Institute of Software, Chinese Academy of Sciences, Beijing, China; Fudan University, School of Computer Science and Technology, Shanghai, China; State Grid Corporation of China, North China Grid of China, Beijing, China; University of Technology, Faculty of Engineering and Information Technology, Sydney, NSW, Australia

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2010

Abstract

Data management and data mining over distributed data streams have recently received considerable attention within the database community. This paper is the first work to address skyline queries over distributed data streams, where the streams derive from multiple horizontally split data sources. A skyline query returns the set of interesting objects that are not dominated by any other object in the base dataset. Previous work has concentrated on skyline computation over static data or centralized data streams. We present an efficient and effective algorithm called BOCS to handle this problem in the more challenging environment of distributed streams. BOCS consists of an efficient centralized algorithm, GridSky, and an associated communication protocol. Following the progressive-refinement strategy of BOCS, the skyline is computed incrementally in two phases. In the first phase, local skylines on remote sites are maintained by GridSky. At each time instant, only the skyline increments on remote sites are sent to the coordinator. In the second phase, the global skyline is obtained by integrating the remote increments with the latest global skyline. A theoretical analysis shows that BOCS is communication-optimal among all algorithms that use a share-nothing strategy. Extensive experiments demonstrate that our proposals are efficient, scalable, and stable.
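
To make the dominance relation and the two-phase idea concrete, the following is a minimal Python sketch. It is not GridSky or the BOCS protocol, whose details the abstract does not give: the local skyline is computed with a naive nested-loop scan, the function names (`dominates`, `skyline`, `merge_increment`) and the sample points are hypothetical, smaller values are assumed to be better in every dimension, and stream expiry (removing objects that fall out of a window) is ignored.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Naive nested-loop skyline; a stand-in for a local algorithm,
    not the GridSky algorithm from the paper."""
    result: List[Point] = []
    for p in points:
        if any(dominates(q, p) for q in result):
            continue  # p is dominated, so it cannot be in the skyline
        # p survives: drop any current members that p dominates
        result = [q for q in result if not dominates(p, q)]
        if p not in result:
            result.append(p)
    return result


def merge_increment(global_sky: List[Point], increment: List[Point]) -> List[Point]:
    """Coordinator step: fold a site's new local-skyline points into
    the latest global skyline (insertions only in this sketch)."""
    return skyline(list(global_sky) + list(increment))


# Phase 1: each remote site keeps its own local skyline (hypothetical data).
site_a = [(1, 9), (3, 3), (4, 4)]
site_b = [(2, 8), (9, 1), (5, 5)]
local_a = skyline(site_a)
local_b = skyline(site_b)

# Phase 2: the coordinator integrates the sites' increments.
global_sky: List[Point] = []
global_sky = merge_increment(global_sky, local_a)
global_sky = merge_increment(global_sky, local_b)
print(sorted(global_sky))
```

In this sketch the point (5, 5) belongs to the local skyline of `site_b` but is dropped at the coordinator because (3, 3) from `site_a` dominates it, which illustrates why a second, global phase is needed after the local one.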