Data cleaning is the process of correcting anomalies in a data source that may, for instance, be due to typographical errors or duplicate representations of an entity. It is a crucial task in customer relationship management, data mining, and data integration. With the growing amount of XML data, approaches to clean XML effectively and efficiently are needed, an issue not addressed by existing data cleaning systems, which mostly specialize in relational data. We present XClean, a data cleaning framework specifically geared towards cleaning XML data. XClean's approach is based on a set of cleaning operators whose semantics is well defined in terms of XML algebraic operators. Users specify cleaning programs by combining operators in a declarative XClean/PL program, which is then compiled into XQuery. We describe XClean's operators, language, and compilation approach, and validate its effectiveness through a series of case studies.
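To make the idea of combining cleaning operators concrete, the following is a minimal sketch in Python. It is not XClean/PL and does not reproduce XClean's operators or its compilation to XQuery. The operator names (`scrub`, `find_duplicates`), the sample customer data, and the similarity threshold are all assumptions chosen for illustration. It covers the two anomaly types named above: typographical noise and duplicate representations of an entity.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher

# Hypothetical sample data: the first two customers are likely duplicates.
XML = """
<customers>
  <customer><name>John  Smith</name><city>Berlin</city></customer>
  <customer><name>Jon Smith</name><city>Berlin</city></customer>
  <customer><name>Alice Meyer</name><city>Potsdam</city></customer>
</customers>
"""


def scrub(root):
    """Hypothetical operator: collapse whitespace and lowercase text nodes."""
    for elem in root.iter():
        if elem.text and elem.text.strip():
            elem.text = " ".join(elem.text.split()).lower()
    return root


def description(customer):
    """Concatenate the text of all child elements into one comparison string."""
    return " ".join(child.text or "" for child in customer)


def find_duplicates(root, threshold=0.9):
    """Hypothetical operator: pairwise comparison of <customer> elements.

    Returns index pairs whose descriptions have a similarity ratio of at
    least `threshold` (an assumed value, not taken from the paper).
    """
    customers = root.findall("customer")
    pairs = []
    for i in range(len(customers)):
        for j in range(i + 1, len(customers)):
            ratio = SequenceMatcher(
                None, description(customers[i]), description(customers[j])
            ).ratio()
            if ratio >= threshold:
                pairs.append((i, j))
    return pairs


# Combine the operators into a small cleaning program: scrub, then detect.
root = scrub(ET.fromstring(XML.strip()))
duplicate_pairs = find_duplicates(root)
print(duplicate_pairs)
```

The quadratic pairwise comparison is used here only for brevity; a practical system would prune candidate pairs before comparing them.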