Declarative XML data cleaning with XClean

Authors:
Melanie Weis;Ioana Manolescu
Affiliations:
HPI für Softwaresystemtechnik GmbH, Potsdam;INRIA Futurs, Orsay Cedex, France
Venue:
CAiSE'07 Proceedings of the 19th international conference on Advanced information systems engineering
Year:
2007

Abstract

Data cleaning is the process of correcting anomalies in a data source, which may, for instance, be due to typographical errors or duplicate representations of an entity. It is a crucial task in customer relationship management, data mining, and data integration. With the growing amount of XML data, approaches to effectively and efficiently clean XML are needed, an issue not addressed by existing data cleaning systems, which mostly specialize in relational data. We present XClean, a data cleaning framework specifically geared towards cleaning XML data. XClean's approach is based on a set of cleaning operators, whose semantics is well-defined in terms of XML algebraic operators. Users may specify cleaning programs by combining operators by means of a declarative XClean/PL program, which is then compiled into XQuery. We describe XClean's operators, language, and compilation approach, and validate its effectiveness through a series of case studies.