Content integration for e-business
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Lineage Tracing for General Data Warehouse Transformations
Proceedings of the 27th International Conference on Very Large Data Bases
On the Logical Modeling of ETL Processes
CAiSE '02 Proceedings of the 14th International Conference on Advanced Information Systems Engineering
Lineage tracing for general data warehouse transformations
The VLDB Journal — The International Journal on Very Large Data Bases
A generic and customizable framework for the design of ETL scenarios
Information Systems - Special issue: The 15th international conference on advanced information systems engineering (CAiSE 2003)
Example-driven design of efficient record matching queries
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
Cleaning organizational data of discrepancies in structure and content is important for data warehousing and Enterprise Data Integration (EDI). Current commercial solutions for data cleaning involve many iterations of time-consuming "data quality" analysis to find errors, and long-running transformations to fix them. Users need to endure long waits and often write complex transformation programs. We present an interactive framework for data cleaning that tightly integrates transformation and discrepancy detection. Users gradually build transformations by adding or undoing transforms, in an intuitive, graphical manner through a spreadsheet-like interface; the effect of a transform is shown at once on records visible on screen. In the background, the system incrementally searches for discrepancies on the latest transformed version of data, flagging them as they are found. This allows users to gradually construct a transformation as discrepancies are found, and clean the data without writing complex programs or enduring long delays. Balancing the goals of power, ease of specification, and interactive application, we choose a set of transforms that can be used for transformations within data records as well as for higher-order transformations. We also present initial work on optimizing a sequence of transforms.
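The interaction loop described in the abstract (add or undo a transform, see its effect on the visible records, then check the transformed data for discrepancies) can be illustrated with a minimal sketch. This is not the paper's implementation: the names `TransformPipeline`, `split_column`, and `find_discrepancies` are hypothetical, the discrepancy check is a plain regular-expression match, and everything runs synchronously rather than incrementally in the background.

```python
import re
from typing import Callable, List

Row = List[str]
Transform = Callable[[Row], Row]


class TransformPipeline:
    """Ordered list of row-level transforms that can be extended or undone."""

    def __init__(self) -> None:
        self.transforms: List[Transform] = []

    def add(self, transform: Transform) -> None:
        self.transforms.append(transform)

    def undo(self) -> None:
        # Drop the most recently added transform, if there is one.
        if self.transforms:
            self.transforms.pop()

    def apply(self, rows: List[Row]) -> List[Row]:
        # Apply every transform, in order, to each row (e.g. the rows on screen).
        result = []
        for row in rows:
            for transform in self.transforms:
                row = transform(row)
            result.append(row)
        return result


def split_column(index: int, sep: str) -> Transform:
    """Hypothetical transform: split one column in two at the first `sep`."""
    def transform(row: Row) -> Row:
        left, _, right = row[index].partition(sep)
        return row[:index] + [left.strip(), right.strip()] + row[index + 1:]
    return transform


def find_discrepancies(rows: List[Row], column: int, pattern: str) -> List[int]:
    """Return indices of rows whose `column` value does not fully match `pattern`."""
    regex = re.compile(pattern)
    return [i for i, row in enumerate(rows) if not regex.fullmatch(row[column])]


# Made-up sample records: a name column and a year column.
rows = [["Smith, John", "1999"], ["Jane Doe", "2001"], ["Lee, Ann", "20x1"]]

pipeline = TransformPipeline()
pipeline.add(split_column(0, ","))
cleaned = pipeline.apply(rows)

# Flag rows whose year (now column 2) is not exactly four digits.
bad_years = find_discrepancies(cleaned, 2, r"\d{4}")
```

Calling `pipeline.undo()` removes the split, so the next `apply` returns the records unchanged. Keeping the transforms as a list that is re-applied to the original rows is one simple way to make undo cheap; it is a design choice of this sketch, not something the abstract specifies.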