The WHIPS prototype for data warehouse creation and maintenance

Authors:
Wilburt J. Labio;Yue Zhuge;Janet L. Wiener;Himanshu Gupta;Héctor García-Molina;Jennifer Widom
Affiliations:
Department of Computer Science, Stanford University, Stanford, CA;Department of Computer Science, Stanford University, Stanford, CA;Department of Computer Science, Stanford University, Stanford, CA;Department of Computer Science, Stanford University, Stanford, CA;Department of Computer Science, Stanford University, Stanford, CA;Department of Computer Science, Stanford University, Stanford, CA
Venue:
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Year:
1997

Abstract

A data warehouse is a repository of integrated information from distributed, autonomous, and possibly heterogeneous sources. In effect, the warehouse stores one or more materialized views of the source data. The data is then readily available to user applications for querying and analysis. Figure 1 shows the basic architecture of a warehouse: data is collected from each source, integrated with data from other sources, and stored at the warehouse. Users then access the data directly from the warehouse.
As suggested by Figure 1, there are two major components in a warehouse system: the integration component, responsible for collecting and maintaining the materialized views, and the query and analysis component, responsible for fulfilling the information needs of specific end users. Note that the two components are not independent. For example, which views the integration component materializes depends on the expected needs of end users.
Most current commercial warehousing systems (e.g., Redbrick, Sybase, Arbor) focus on the query and analysis component, providing specialized index structures at the warehouse and extensive querying facilities for the end user. In the WHIPS (WareHousing Information Project at Stanford) project, on the other hand, we focus on the integration component. In particular, we have developed an architecture and implemented a prototype for identifying data changes at heterogeneous sources, transforming and summarizing them in accordance with warehouse specifications, and incrementally integrating them into the warehouse. We propose to demonstrate our prototype at SIGMOD, illustrating the main features of our architecture.
Our architecture is modular, and we designed it specifically to fulfill several important and interrelated goals: data sources and warehouse views can be added and removed dynamically; the system can be scaled by adding more internal modules; changes at the sources are detected automatically; the warehouse may be updated continuously as the sources change, without requiring “down time”; and the warehouse is always kept consistent with the source data by the integration algorithms. More details on these goals and how we achieve them are provided in [WGL+96].
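
To make the idea of incrementally integrating source changes into a materialized view more concrete, the following is a minimal sketch and not the WHIPS implementation. It assumes a single source and a simple aggregate view (total sales per region). The `Change` record format, the `SalesByRegionView` class, and the `recompute` helper are all hypothetical names introduced for illustration. The sketch ignores the multi-source consistency issues that the integration algorithms mentioned above are designed to handle.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One detected source change (hypothetical record format)."""
    kind: str    # "insert" or "delete"
    region: str
    amount: int


class SalesByRegionView:
    """Materialized view: total sales amount per region (illustrative only)."""

    def __init__(self):
        self.total = defaultdict(int)   # region -> summed amount
        self.count = defaultdict(int)   # region -> number of contributing rows

    def apply(self, change):
        """Fold one source change into the view without rescanning the source."""
        sign = {"insert": 1, "delete": -1}[change.kind]
        self.total[change.region] += sign * change.amount
        self.count[change.region] += sign
        # Drop the group once no source row contributes to it any more.
        if self.count[change.region] == 0:
            del self.total[change.region]
            del self.count[change.region]

    def rows(self):
        return dict(self.total)


def recompute(source_rows):
    """Full recomputation from (region, amount) rows, used only as a check."""
    totals = defaultdict(int)
    for region, amount in source_rows:
        totals[region] += amount
    return dict(totals)


view = SalesByRegionView()
for change in [
    Change("insert", "west", 10),
    Change("insert", "east", 5),
    Change("insert", "west", 7),
    Change("delete", "east", 5),
]:
    view.apply(change)

# The incrementally maintained view matches a full recomputation
# over the rows that remain at the source.
print(view.rows() == recompute([("west", 10), ("west", 7)]))
```

The per-group row count is kept alongside the sum so that a deletion can remove a group entirely when its last row disappears. This is one common way to make an aggregate view self-maintainable under deletions, chosen here for simplicity.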