Emerging Cubes: Borders, size estimations and lossless reductions

Authors:
Sébastien Nedjar;Alain Casali;Rosine Cicchetti;Lotfi Lakhal
Affiliations:
Laboratoire d'Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université - CNRS Case 901, 163 Avenue de Luminy, 13288 Marseille Cedex 9, France;Laboratoire d'Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université - CNRS Case 901, 163 Avenue de Luminy, 13288 Marseille Cedex 9, France;Laboratoire d'Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université - CNRS Case 901, 163 Avenue de Luminy, 13288 Marseille Cedex 9, France;Laboratoire d'Informatique Fondamentale de Marseille (LIF), Aix-Marseille Université - CNRS Case 901, 163 Avenue de Luminy, 13288 Marseille Cedex 9, France
Venue:
Information Systems
Year:
2009

Abstract

Discovering trend reversals between two data cubes provides users with novel and interesting knowledge when the real-world context fluctuates: What is new? Which trends appear or emerge? Which tendencies are immersing or disappearing? With the concept of the Emerging Cube, we capture such trend reversals by enforcing an emergence constraint. We revisit the classical borders for the Emerging Cube and introduce a new one that optimizes both storage space and computation time, provides a simple characterization of the size of Emerging Cubes, and offers classification and cube navigation tools. We formally state the connection between the classical and proposed borders by using cube transversals. Knowing the size of Emerging Cubes without computing them is of great interest, in particular for tuning the underlying emergence constraint. We address this issue by studying an upper bound and characterizing the exact size of Emerging Cubes. We propose two strategies for quickly estimating their size: one based on analytical estimation, without database access, and one based on probabilistic counting, using the proposed borders as the input of the near-optimal algorithm HyperLogLog. Because the estimation algorithm is efficient, several iterations can be performed to calibrate the emergence constraint. Moreover, we propose reduced and lossless representations of the Emerging Cube by using the concept of cube closure. Finally, we perform experiments on different data distributions in order to measure, on the one hand, the size of the introduced condensed and concise representations and, on the other hand, the performance (accuracy and computation time) of the proposed estimation method.
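
To make the probabilistic-counting idea concrete, here is a minimal Python sketch of a HyperLogLog-style distinct counter. It is not the authors' implementation, and it does not reproduce how tuples are derived from the borders, since the abstract does not describe that step. The function name `hyperloglog_estimate`, the precision parameter `p`, the use of SHA-1 as the hash, and the `"ALL"` placeholder in the usage lines are all assumptions made for illustration.

```python
import hashlib
import math


def hyperloglog_estimate(items, p=12):
    """Estimate the number of distinct items using 2**p registers.

    Illustrative sketch only: hash choice and parameters are assumptions.
    """
    m = 1 << p
    registers = [0] * m
    for item in items:
        # Derive a 64-bit hash from the item's textual representation.
        digest = hashlib.sha1(repr(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - p)                # first p bits select a register
        rest = h & ((1 << (64 - p)) - 1)   # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - p) - rest.bit_length() + 1
        if rank > registers[idx]:
            registers[idx] = rank
    alpha = 0.7213 / (1 + 1.079 / m)       # bias constant, valid for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        # Small-range correction: fall back to linear counting.
        return m * math.log(m / zeros)
    return raw


# Hypothetical usage: cube tuples encoded as Python tuples, with "ALL"
# standing in for an aggregated dimension. Duplicates are counted once.
tuples = [("Paris", "ALL"), ("Paris", "2008"), ("Paris", "ALL")]
approx_size = hyperloglog_estimate(tuples)
```

With `p = 12` the sketch keeps 4096 small registers, so memory use is independent of the number of tuples seen; the expected relative error of HyperLogLog is roughly 1.04 / sqrt(m), about 1.6% in this configuration.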