Emerging Cubes: Borders, size estimations and lossless reductions

  • Authors:
  • Sébastien Nedjar;Alain Casali;Rosine Cicchetti;Lotfi Lakhal

  • Affiliations:
  • Laboratoire d'Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université - CNRS Case 901, 163 Avenue de Luminy, 13288 Marseille Cedex 9, France (all four authors)

  • Venue:
  • Information Systems
  • Year:
  • 2009

Abstract

Discovering trend reversals between two data cubes provides users with novel and interesting knowledge when the real-world context fluctuates: What is new? Which trends appear or emerge? Which tendencies are immersing or disappearing? With the concept of the Emerging Cube, we capture such trend reversals by enforcing an emergence constraint. We recall the classical borders for the Emerging Cube and introduce a new one, which optimizes both storage space and computation time and provides a simple characterization of the size of Emerging Cubes, as well as classification and cube navigation tools. We formally state the connection between the classical and proposed borders by using cube transversals. Knowing the size of Emerging Cubes without computing them is of great interest, in particular for best adjusting the underlying emergence constraint. We address this issue by studying an upper bound and characterizing the exact size of Emerging Cubes. We propose two strategies for quickly estimating their size: one based on analytical estimation, without database access, and one based on probabilistic counting, using the proposed borders as the input of the near-optimal algorithm HyperLogLog. Because the estimation algorithm is efficient, several iterations can be performed to best calibrate the emergence constraint. Moreover, we propose reduced and lossless representations of the Emerging Cube by using the concept of cube closure. Finally, we perform experiments on different data distributions in order to measure, on the one hand, the size of the introduced condensed and concise representations and, on the other hand, the performance (accuracy and computation time) of the proposed estimation method.
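
To illustrate the kind of probabilistic counting the abstract refers to, here is a minimal sketch of a generic HyperLogLog distinct-count estimator. It is not the authors' implementation: the class name `HyperLogLog`, the precision parameter `p`, the SHA-1 based hash, and the small-range (linear counting) correction are all assumptions made for this example, and the abstract does not specify how tuples are derived from the borders, so the sketch simply counts distinct tuples from an arbitrary stream.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative distinct-count estimator (hypothetical helper, not from the paper)."""

    def __init__(self, p=10):
        # p index bits give m = 2**p registers; standard error is about 1.04 / sqrt(m).
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1 over repr(item).
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        index = h >> width                  # first p bits select a register
        rest = h & ((1 << width) - 1)       # remaining bits feed the rank
        # Rank = position of the leftmost 1-bit in the remaining bits (1-based).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def estimate(self):
        # Harmonic-mean based raw estimate.
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Usage: feed cube tuples (here, made-up dimension values) and read the estimate.
sketch = HyperLogLog(p=10)
for city in ("Marseille", "Paris", "Lyon"):
    for year in (2008, 2009):
        sketch.add((city, "ALL", year))
approx_distinct = sketch.estimate()
```

Because the sketch keeps only `2**p` small registers, re-running it after changing the thresholds of the emergence constraint is cheap, which is in the spirit of the calibration loop the abstract describes; the actual accuracy and cost figures are those reported in the paper's experiments, not anything this example measures.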