Heterogeneous and dirty data is abundant. It is stored under different, often opaque schemata; it represents identical real-world objects multiple times, causing duplicates; and it has missing and conflicting values. The Humboldt Merger (HumMer) is a tool that allows ad hoc, declarative fusion of such data using a simple extension to SQL. Guided by a query against multiple tables, HumMer proceeds in three fully automated steps. First, instance-based schema matching bridges the schematic heterogeneity of the tables by aligning corresponding attributes. Next, duplicate detection techniques find multiple representations of identical real-world objects. Finally, data fusion and conflict resolution merge duplicates into a single, consistent, and clean representation.
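To make the three steps concrete, the following is a minimal Python sketch of such a pipeline on two tiny in-memory tables. It is an illustration under stated assumptions, not HumMer's actual algorithms or its SQL extension: the function names (`match_schemas`, `detect_duplicates`, `fuse`, `merge_tables`), the value-overlap matching heuristic, the string-similarity threshold of 0.8, the `name` key attribute, and the "most frequent non-null value" resolution rule are all hypothetical choices made for the example.

```python
from difflib import SequenceMatcher


def match_schemas(rows_a, rows_b):
    """Step 1, instance-based matching (simplified, hypothetical heuristic):
    map each attribute of B to the attribute of A sharing the most values."""
    def values(rows, col):
        return {str(r[col]).lower() for r in rows if r[col] is not None}

    mapping = {}
    for col_b in rows_b[0]:
        overlaps = {col_a: len(values(rows_a, col_a) & values(rows_b, col_b))
                    for col_a in rows_a[0]}
        best = max(overlaps, key=overlaps.get)
        if overlaps[best] > 0:  # attributes with no shared values stay unmatched
            mapping[col_b] = best
    return mapping


def rename(rows, mapping):
    """Rewrite B's rows using A's attribute names."""
    return [{mapping.get(k, k): v for k, v in r.items()} for r in rows]


def detect_duplicates(rows, key="name", threshold=0.8):
    """Step 2, greedy clustering: a row joins the first cluster whose first
    member has a similar key value; otherwise it starts a new cluster."""
    clusters = []
    for row in rows:
        for cluster in clusters:
            a, b = str(cluster[0][key]).lower(), str(row[key]).lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                cluster.append(row)
                break
        else:
            clusters.append([row])
    return clusters


def fuse(cluster):
    """Step 3, conflict resolution: per attribute, keep the most frequent
    non-null value (ties go to the value seen first); None if all are missing."""
    fused = {}
    for col in cluster[0]:
        counts = {}
        for row in cluster:
            if row.get(col) is not None:
                counts[row[col]] = counts.get(row[col], 0) + 1
        fused[col] = max(counts, key=counts.get) if counts else None
    return fused


def merge_tables(rows_a, rows_b):
    """Run the three steps in order on two lists of dict-shaped rows."""
    aligned = rows_a + rename(rows_b, match_schemas(rows_a, rows_b))
    return [fuse(cluster) for cluster in detect_duplicates(aligned)]


if __name__ == "__main__":
    customers = [{"name": "Alice Smith", "city": "Berlin"},
                 {"name": "Bob Jones", "city": None}]
    clients = [{"full_name": "Alice Smyth", "town": "Berlin"},
               {"full_name": "Bob Jones", "town": "Potsdam"}]
    for record in merge_tables(customers, clients):
        print(record)
```

In this toy run, `full_name` and `town` are aligned with `name` and `city` because they share values, the two spellings of Alice's name are grouped as one object, and Bob's missing city is filled in from the second table. A real system would need far more robust matching and similarity measures than these stand-ins.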