With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources, however, exhibit heterogeneous data formats and terminologies and may contain noisy data. In this paper, we present RUBIX, a novel framework that enables business users to perform data integration semi-automatically on potentially noisy tabular data. The framework extends Google Refine with novel schema matching algorithms that leverage Freebase rich types. First experiments show that using Linked Data to map cell values to instances and column headers to types significantly improves the quality of the matching results and should therefore lead to more informed decisions.
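The idea of matching columns through the types of their cell values can be illustrated with a minimal sketch. This is not the RUBIX algorithm, whose details the abstract does not give. It assumes a simple majority vote over cell types and replaces the Freebase lookup with a hypothetical in-memory dictionary (`TOY_TYPE_INDEX`). The function names `column_type` and `match_columns` and the type labels are likewise illustrative assumptions.

```python
from collections import Counter

# Hypothetical stand-in for a Linked Data lookup (for example, a call to a
# Freebase reconciliation service). Entries and type labels are illustrative.
TOY_TYPE_INDEX = {
    "paris": ["location/city"],
    "berlin": ["location/city"],
    "france": ["location/country"],
    "germany": ["location/country"],
}


def column_type(cells, type_index=TOY_TYPE_INDEX):
    """Return the most frequent type among the cells, or None if none resolve."""
    votes = Counter()
    for cell in cells:
        # Normalise the cell value before the lookup to tolerate light noise.
        for type_name in type_index.get(cell.strip().lower(), []):
            votes[type_name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def match_columns(source, target, type_index=TOY_TYPE_INDEX):
    """Pair source and target columns whose inferred types are equal."""
    target_types = {
        name: column_type(cells, type_index) for name, cells in target.items()
    }
    matches = []
    for source_name, source_cells in source.items():
        source_type = column_type(source_cells, type_index)
        if source_type is None:
            continue  # no cell resolved to a known instance
        for target_name, target_type in target_types.items():
            if target_type == source_type:
                matches.append((source_name, target_name, source_type))
    return matches


# Two tables whose headers share no vocabulary but whose cells share types.
source = {"Town": ["Paris", "Berlin"], "Nation": ["France", "Germany"]}
target = {"country_name": ["Germany", "France"], "city": ["Berlin", "Paris "]}
matches = match_columns(source, target)
```

In this toy setting the headers "Town" and "city" have no string overlap, yet the columns are paired because their cells resolve to the same type. A real system would also need to handle ambiguous instances, ties between types, and cells that resolve to nothing.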