A query language and optimization techniques for unstructured data
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
A methodological framework for data warehouse design
Proceedings of the 1st ACM international workshop on Data warehousing and OLAP
Conceptual modeling for ETL processes
Proceedings of the 5th ACM international workshop on Data Warehousing and OLAP
Lineage Tracing for General Data Warehouse Transformations
Proceedings of the 27th International Conference on Very Large Data Bases
Declarative Data Cleaning: Language, Model, and Algorithms
Proceedings of the 27th International Conference on Very Large Data Bases
Potter's Wheel: An Interactive Data Cleaning System
Proceedings of the 27th International Conference on Very Large Data Bases
Views in a Large Scale XML Repository
Proceedings of the 27th International Conference on Very Large Data Bases
An Extensible Framework for Data Cleaning
ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Data Transformation for Warehousing Web Data
WECWIS '01 Proceedings of the Third International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS '01)
Eliminating fuzzy duplicates in data warehouses
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
RAMEPs: a goal-ontology approach to analyse the requirements for data warehouse systems
WSEAS Transactions on Information Science and Applications
Hi-index | 0.00 |
In Data Warehousing, Extraction-Transformation-Loading (ETL) are the key tasks that are responsible for the extraction of data from several sources, their cleansing, customization and insertion into data warehouse [10]. More specifically ETL tools are category of specialized tools with the task of dealing with data warehouse cleaning and loading problems. These task are very critical in every data warehouse environment, It is observed that ETL and data cleaning tools are estimated to cost at least one third of effort and expenses in the budget of the data warehouse [1,11], another evidence shows that ETL process costs 55% of the total cost of the data warehouse [1,12]. In this paper, we focus on the problem of the definition of ETL processes using xml in order to make this framework more generic and capable to deal with heterogeneous source systems. We described the framework that extract data from various heterogeneous source systems and carry it in xml files, later on data cleaning is performed using few predefined xml templates, predefined functions and ultimately data is loaded into data warehouse as per warehouse schema.