RUBIX: a framework for improving data integration with linked data

Authors:
Ahmad Assaf;Eldad Louw;Aline Senart;Corentin Follenfant;Raphaël Troncy;David Trastour
Affiliations:
SAP Research, Mougins Cedex, France;SAP Research, Mougins Cedex, France;SAP Research, Mougins Cedex, France;SAP Research, Mougins Cedex, France;EURECOM, Sophia Antipolis, France;SAP Research, Mougins Cedex, France
Venue:
Proceedings of the First International Workshop on Open Data
Year:
2012

Abstract

With today's public data sets containing billions of data items, more and more companies are looking to integrate external data with their traditional enterprise data to improve business intelligence analysis. These distributed data sources, however, exhibit heterogeneous data formats and terminologies and may contain noisy data. In this paper, we present RUBIX, a novel framework that enables business users to semi-automatically perform data integration on potentially noisy tabular data. The framework extends Google Refine with novel schema matching algorithms that leverage Freebase rich types. Initial experiments show that using Linked Data to map cell values to instances and column headers to types significantly improves the quality of the matching results and should therefore lead to more informed decisions.
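
The idea of mapping cell values to instances and column headers to types can be illustrated with a minimal sketch. This is not the RUBIX algorithm, which the abstract does not detail; it only assumes that each cell value can be looked up in some instance-to-type index and that two columns are candidates for matching when their most frequent types agree. The index `TOY_TYPE_INDEX`, the type names, and the functions `column_type` and `match_columns` are all hypothetical and introduced here for illustration.

```python
from collections import Counter

# Hypothetical stand-in for a Linked Data lookup (for example a
# Freebase-style instance-to-type index). Entries are illustrative only.
TOY_TYPE_INDEX = {
    "paris": {"location/city"},
    "berlin": {"location/city"},
    "france": {"location/country"},
    "germany": {"location/country"},
}


def column_type(cells, type_index=TOY_TYPE_INDEX):
    """Return the most frequent type among the cells, or None if no cell matches."""
    votes = Counter()
    for cell in cells:
        # Normalise the cell before lookup so noisy casing/whitespace still matches.
        for type_name in type_index.get(cell.strip().lower(), ()):
            votes[type_name] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def match_columns(table_a, table_b, type_index=TOY_TYPE_INDEX):
    """Pair column headers from two tables whose dominant types agree.

    Each table is a dict mapping a column header to its list of cell values.
    """
    types_b = {header: column_type(cells, type_index)
               for header, cells in table_b.items()}
    matches = []
    for header_a, cells_a in table_a.items():
        type_a = column_type(cells_a, type_index)
        if type_a is None:
            continue  # no evidence for this column, so do not propose a match
        for header_b, type_b in types_b.items():
            if type_a == type_b:
                matches.append((header_a, header_b, type_a))
    return matches
```

In this sketch the headers themselves are never compared, so columns named differently in the two tables can still be paired when their cell values resolve to the same type. A real system would also need to handle ambiguous values, ties between types, and confidence scoring, none of which are covered here.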