Two-phase data warehouse optimized for data mining

  • Authors:
  • Balázs Rácz;Csaba István Sidló;András Lukács;András A. Benczúr

  • Affiliations:
  • Informatics Laboratory, Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary (all four authors)

  • Venue:
  • BIRTE'06 Proceedings of the 1st international conference on Business intelligence for the real-time enterprises
  • Year:
  • 2006

Abstract

We propose a new, heterogeneous data warehouse architecture in which a first-phase traditional relational OLAP warehouse coexists with a second-phase store that holds the data in a compressed form optimized for data mining. Aggregations and metadata for the entire time frame are stored in the first-phase relational database. The main advantage of the second phase is its reduced I/O requirement, which enables very high-throughput processing by sequential, read-only data stream algorithms. It becomes feasible to run speed-optimized queries and data mining operations on the entire time frame of the most granular data. The second phase also enables long-term data storage and analysis in a very efficient compressed format, at low storage cost even for historical data. The proposed architecture fits existing data warehouse solutions. We show the effectiveness of the two-phase data warehouse through a case study of a large web portal.
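
The split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the storage format or schema, so `sqlite3` stands in for the first-phase relational warehouse, a gzip-compressed tab-separated log stands in for the second-phase compressed store, and the names `build_two_phase`, `scan_distinct_users_per_page`, and `daily_hits` are hypothetical.

```python
import gzip
import io
import sqlite3
from collections import Counter


def build_two_phase(events):
    """Load (day, user, page) click events into both phases.

    Assumption: phase 1 keeps only per-day aggregates in a relational
    table; phase 2 keeps every granular record in a compressed log.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE daily_hits (day TEXT, page TEXT, hits INTEGER)")

    hits = Counter()
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as out:
        for day, user, page in events:
            hits[(day, page)] += 1  # phase 1: aggregate
            out.write(f"{day}\t{user}\t{page}\n".encode("utf-8"))  # phase 2: granular

    db.executemany(
        "INSERT INTO daily_hits VALUES (?, ?, ?)",
        [(day, page, n) for (day, page), n in sorted(hits.items())],
    )
    db.commit()
    return db, buf.getvalue()


def scan_distinct_users_per_page(blob):
    """One sequential, read-only pass over the compressed phase-2 data."""
    users = {}
    with gzip.GzipFile(fileobj=io.BytesIO(blob), mode="rb") as src:
        for raw in src:
            _day, user, page = raw.decode("utf-8").rstrip("\n").split("\t")
            users.setdefault(page, set()).add(user)
    return {page: len(seen) for page, seen in users.items()}
```

In this sketch, an OLAP-style question such as "hits per page per day" is answered from `daily_hits` without touching the granular data, while a mining-style question such as "distinct users per page" is answered by streaming once through the compressed log.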