A data warehouse is a repository of integrated information from distributed, autonomous, and possibly heterogeneous sources. In effect, the warehouse stores one or more materialized views of the source data. The data is then readily available to user applications for querying and analysis. Figure 1 shows the basic architecture of a warehouse: data is collected from each source, integrated with data from other sources, and stored at the warehouse. Users then access the data directly from the warehouse.

As suggested by Figure 1, there are two major components in a warehouse system: the integration component, responsible for collecting and maintaining the materialized views, and the query and analysis component, responsible for fulfilling the information needs of specific end users. Note that the two components are not independent. For example, which views the integration component materializes depends on the expected needs of end users.

Most current commercial warehousing systems (e.g., Redbrick, Sybase, Arbor) focus on the query and analysis component, providing specialized index structures at the warehouse and extensive querying facilities for the end user. In the WHIPS (WareHousing Information Project at Stanford) project, on the other hand, we focus on the integration component. In particular, we have developed an architecture and implemented a prototype for identifying data changes at heterogeneous sources, transforming and summarizing them in accordance with warehouse specifications, and incrementally integrating them into the warehouse. We propose to demonstrate our prototype at SIGMOD, illustrating the main features of our architecture.
Our architecture is modular, and we designed it specifically to fulfill several important and interrelated goals: data sources and warehouse views can be added and removed dynamically; the system scales by adding more internal modules; changes at the sources are detected automatically; the warehouse can be updated continuously as the sources change, without requiring "down time"; and the integration algorithms keep the warehouse consistent with the source data at all times. More details on these goals and how we achieve them are provided in [WGL+96].
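
To make the idea of incrementally integrating source changes into a materialized view concrete, the following is a minimal sketch, not the WHIPS implementation. It assumes a single source that reports row-level inserts and deletes, and a view that stores the total sales amount per region. The names `Change`, `SalesByRegionView`, and `recompute` are hypothetical and chosen only for illustration. The sketch does not address consistency across multiple sources.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One detected source change (hypothetical record format)."""
    op: str      # "insert" or "delete"
    region: str
    amount: int


class SalesByRegionView:
    """Materialized view: total sales amount per region."""

    def __init__(self):
        self.totals = defaultdict(int)  # region -> summed amount
        self.counts = defaultdict(int)  # region -> number of contributing rows

    def apply(self, change):
        """Fold one source change into the view without rescanning the source."""
        sign = {"insert": 1, "delete": -1}[change.op]
        self.totals[change.region] += sign * change.amount
        self.counts[change.region] += sign
        # Drop a group once no source rows contribute to it any more.
        if self.counts[change.region] == 0:
            del self.totals[change.region]
            del self.counts[change.region]


def recompute(rows):
    """Full recomputation from (region, amount) rows, for comparison."""
    totals = defaultdict(int)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


# Usage: apply a stream of detected changes one at a time.
view = SalesByRegionView()
for change in [
    Change("insert", "west", 10),
    Change("insert", "east", 5),
    Change("insert", "west", 7),
    Change("delete", "east", 5),
]:
    view.apply(change)

print(dict(view.totals))  # {'west': 17}
```

Keeping a per-group row count alongside the sum lets the view remove a group when its last contributing row is deleted. After every change, the view matches what a full recomputation over the current source rows would produce, and the warehouse stays available for queries throughout.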