Maintaining the Dominant Representatives on Data Streams

  • Authors:
  • Wenlin He;Cuiping Li;Hong Chen

  • Affiliations:
  • Key Labs of Data and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;Key Labs of Data and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China;Key Labs of Data and Knowledge Engineering, Ministry of Education, China and School of Information, Renmin University of China, Beijing, China

  • Venue:
  • DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
  • Year:
  • 2009

Abstract

It is well known that a traditional skyline query tends to return too many data points, each carrying little information, especially when the queried dataset is high-dimensional or anti-correlated. In data stream applications, where large amounts of data are generated continuously, this problem becomes much more serious, since the full skyline result can be neither obtained efficiently nor analyzed easily. To cope with this difficulty, we propose a new concept called the Combinatorial Dominant relationship to abstract dominant representatives of stream data. Based on this concept, we propose three novel skyline queries, namely the basic convex skyline query (BCSQ), the dynamic convex skyline query (DCSQ), and the reverse convex skyline query (RCSQ), which combine the concept of convexity from geometry with the traditional skyline for the first time. These queries adaptively abstract the contour of the skyline points without requiring the size of the result set to be specified in advance, and they increase the information content of the query result. To process these queries efficiently and maintain their results, we design and analyze algorithms that exploit an in-memory index structure called DCEL, which represents and stores the arrangement of the data in the sliding window. Through a dual transformation, we convert problems on points in the primal plane into problems on lines in the dual plane, which helps us avoid expensive full skyline computation and speeds up candidate set selection. Finally, through extensive experiments with both real and synthetic datasets, we validate the representative capability of CSQs as well as the performance of our proposed algorithms.
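To make the terms in the abstract concrete, the following is a minimal two-dimensional sketch and not the authors' algorithm. It assumes that smaller values are preferred in both dimensions, and it reads "convex skyline" as the skyline points lying on the convex chain that faces the origin; the paper's exact definition of the Combinatorial Dominant relationship may differ. The function names (`dominates`, `skyline`, `convex_skyline`, `dual_line`) are hypothetical, the duality convention (point (a, b) maps to line y = a*x - b) is one common textbook choice, and the sketch recomputes everything from scratch rather than maintaining a DCEL over a sliding window.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def dominates(p: Point, q: Point) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.

    Assumption: smaller values are better in both dimensions.
    """
    return (all(a <= b for a, b in zip(p, q))
            and any(a < b for a, b in zip(p, q)))


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: the points not dominated by any other point."""
    return sorted(p for p in set(points)
                  if not any(dominates(q, p) for q in points))


def cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_skyline(points: List[Point]) -> List[Point]:
    """Skyline points on the lower-left convex chain (assumed reading).

    Uses the lower-hull half of the monotone chain method on the skyline,
    which is already sorted by x.
    """
    hull: List[Point] = []
    for p in skyline(points):
        # Drop the last point while it does not make a strict left turn.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull


def dual_line(p: Point) -> Tuple[float, float]:
    """Map point (a, b) to the dual line y = a*x - b as (slope, intercept)."""
    a, b = p
    return (a, -b)


if __name__ == "__main__":
    window = [(1, 9), (2, 5), (3, 4.5), (4, 2), (8, 1), (5, 5), (9, 9)]
    print("skyline:       ", skyline(window))
    print("convex skyline:", convex_skyline(window))
    print("dual lines:    ", [dual_line(p) for p in convex_skyline(window)])
```

In this toy window, the point (3, 4.5) is on the skyline but lies above the segment joining its neighbours, so the convex chain omits it. This illustrates how a convex-style query can return fewer points than the full skyline without a preset result size.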