SMARTINT: using mined attribute dependencies to integrate fragmented web databases

Authors:
Ravi Gummadi;Anupam Khulbe;Aravind Kalavagattu;Sanil Salvi;Subbarao Kambhampati
Affiliations:
Department of Computer Science & Engineering, Arizona State University, Tempe, USA 85287;Department of Computer Science & Engineering, Arizona State University, Tempe, USA 85287;Department of Computer Science & Engineering, Arizona State University, Tempe, USA 85287;Department of Computer Science & Engineering, Arizona State University, Tempe, USA 85287;Department of Computer Science & Engineering, Arizona State University, Tempe, USA 85287
Venue:
Journal of Intelligent Information Systems
Year:
2012

Abstract

Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented over multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation from the given tables (sources). The standard way to reconstruct the entities is to join the tables. Unfortunately, because of the autonomous and decentralized way in which the sources are populated, they often do not have Primary Key-Foreign Key relations. While tables may share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SmartInt, is aimed at addressing the problem of data integration in such scenarios. Given a query, our system uses Approximate Functional Dependencies (AFDs) mined from the sources to piece together a tree of relevant tables to answer it. The result tuples produced by our system strike a favorable balance between precision and recall.
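
The two ideas in the abstract, spurious tuples from naive joins and approximate functional dependencies, can be illustrated with a small sketch. This is not SmartInt's algorithm. The tables, the attribute names, and the helpers `naive_join` and `afd_confidence` are all hypothetical. The confidence measure is an assumption too: it is one common definition (the largest fraction of rows that can be kept so the dependency holds exactly), which the abstract itself does not specify.

```python
from collections import Counter, defaultdict


def naive_join(left, right, attr):
    """Join two lists of dict rows on a shared, non-key attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


def afd_confidence(rows, lhs, rhs):
    """Fraction of rows that can be kept so that lhs -> rhs holds exactly.

    Assumed definition: for each lhs value, keep only the rows carrying its
    most frequent rhs value, then divide the kept count by the table size.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(rows)


# Hypothetical toy sources: neither table has a key, both share "model".
cars = [
    {"model": "Civic", "make": "Honda", "price": 12000},
    {"model": "Civic", "make": "Honda", "price": 15000},
    {"model": "Accord", "make": "Honda", "price": 18000},
    {"model": "Camry", "make": "Toyota", "price": 17000},
]
reviews = [
    {"model": "Civic", "rating": 4},
    {"model": "Civic", "rating": 2},
    {"model": "Accord", "rating": 5},
]

# Every Civic listing pairs with every Civic review, so the join cannot tell
# which review belongs to which car: these pairings are the spurious entities.
joined = naive_join(cars, reviews, "model")

# "model" determines "make" exactly here, but only approximately determines "price".
make_conf = afd_confidence(cars, "model", "make")
price_conf = afd_confidence(cars, "model", "price")
```

In this toy data the dependency from `model` to `make` holds for every row, while the one from `model` to `price` holds only approximately. This shows how a dependency's confidence can indicate which attributes a shared, non-key attribute predicts reliably enough to be worth joining on.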