Dynamic Data Warehousing (abstract)

Authors:
Umeshwar Dayal;Qiming Chen;Meichun Hsu
Venue:
DaWaK '99 Proceedings of the First International Conference on Data Warehousing and Knowledge Discovery
Year:
1999

Abstract

Data warehouses and on-line analytical processing (OLAP) tools have become essential elements of decision support systems. Traditionally, data warehouses are refreshed periodically (for example, nightly) by extracting, transforming, cleaning and consolidating data from several operational data sources. The data in the warehouse is then used to periodically generate reports, or to rebuild multidimensional (data cube) views of the data for on-line querying and analysis. Increasingly, however, we are seeing business intelligence applications in telecommunications, electronic commerce, and other industries that are characterized by very high data volumes and data flow rates, and that require continuous analysis and mining of the data. For such applications, rather different data warehousing and on-line analysis architectures are required. In this paper, we first motivate the need for a new architecture by summarizing the requirements of these applications. Then, we describe a few approaches that are being developed, including virtual data warehouses or enterprise portals that support access through views or links directly to the operational data sources. We discuss the relative merits of these approaches. We then focus on a dynamic data warehousing and OLAP architecture that we have developed and prototyped at HP Labs. In this architecture, data flows continuously into a data warehouse and is staged into one or more OLAP tools that are used as computation engines to continuously and incrementally build summary data cubes, which might then be stored back in the data warehouse. Analysis and data mining functions are performed continuously and incrementally over these summary cubes. Retirement policies define when to discard data from the warehouse (i.e., move data from the warehouse into off-line archival storage). Data at different levels of aggregation may have different life spans depending on how they are to be used for downstream analysis and data mining.
The key features of the architecture are the following: incremental data reduction using OLAP engines to generate summaries and enable data mining; staging large volumes and flow rates of data with different life spans at different levels of aggregation; and scheduling operations on data depending on the type of processing to be performed and the age of the data.
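
The following is a minimal illustrative sketch of the ideas in the abstract, not the HP Labs prototype: records arriving continuously are folded incrementally into summary cells at several aggregation levels, and a per-level retirement policy moves aged cells into an archive. All names (`Level`, `DynamicWarehouse`, `ingest`, `retire`), the use of time buckets as the only aggregation dimension, and the use of a running sum as the summary are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Level:
    """One aggregation level (hypothetical): a time-bucket width and a life span."""
    name: str
    bucket_seconds: int  # width of a time bucket at this level
    life_span: int       # seconds after bucket start before the cell is retired
    cells: dict = field(default_factory=lambda: defaultdict(float))


class DynamicWarehouse:
    """Toy stand-in for continuous, incremental summary maintenance."""

    def __init__(self, levels):
        self.levels = levels
        self.archive = []  # stands in for off-line archival storage

    def ingest(self, timestamp, key, value):
        # Incremental update: each arriving record adjusts one cell per level,
        # so summaries are never rebuilt from scratch.
        for level in self.levels:
            bucket = timestamp - timestamp % level.bucket_seconds
            level.cells[(bucket, key)] += value

    def retire(self, now):
        # Retirement policy: a cell whose bucket started at least `life_span`
        # seconds ago is moved from the warehouse into the archive.
        for level in self.levels:
            expired = [c for c in level.cells if c[0] + level.life_span <= now]
            for cell in expired:
                self.archive.append((level.name, cell, level.cells.pop(cell)))


if __name__ == "__main__":
    warehouse = DynamicWarehouse([
        Level("minute", bucket_seconds=60, life_span=120),
        Level("hour", bucket_seconds=3600, life_span=86400),
    ])
    for ts, key, value in [(30, "a", 1.0), (45, "a", 2.0), (70, "a", 4.0)]:
        warehouse.ingest(ts, key, value)
    warehouse.retire(now=130)
    print(dict(warehouse.levels[0].cells))
    print(warehouse.archive)
```

In this sketch the fine-grained (minute) level is given a shorter life span than the coarse (hour) level, mirroring the statement that data at different levels of aggregation may have different life spans; a real system would choose these values according to the downstream analysis.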