SmartInt: using mined attribute dependencies to integrate fragmented web databases
Proceedings of the 20th international conference companion on World wide web
Many web databases can be seen as providing partial and overlapping information about entities in the world. To answer queries effectively, we need to integrate the information about individual entities that is fragmented across multiple sources. At first blush, this is just the inverse of the traditional database normalization problem: rather than going from a universal relation to normalized tables, we want to reconstruct the universal relation from the tables (sources). The standard way of reconstructing the entities involves joining the tables. Unfortunately, because the sources are populated autonomously and in a decentralized way, they often lack primary key-foreign key relationships. While tables may share attributes, naive joins over these shared attributes can reconstruct many spurious entities, seriously compromising precision. Our system, SmartInt, addresses the problem of data integration in such scenarios. Given a query, it uses Approximate Functional Dependencies (AFDs) to piece together a tree of relevant tables to answer it. The result tuples produced by our system strike a favorable balance between precision and recall.
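
The two ideas in the abstract, spurious tuples from naive joins and approximate functional dependencies, can be illustrated with a minimal sketch. This is not SmartInt's algorithm. The tables `listings` and `specs`, the helper names, and the confidence measure (the fraction of rows that remain once the fewest violating rows are removed, a common way to score an AFD) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def afd_confidence(rows, lhs, rhs):
    """Score the approximate dependency lhs -> rhs on a list of dict rows.

    Assumed measure (not taken from the paper): the fraction of rows kept
    when, for each lhs value, only the most frequent rhs value is retained.
    """
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    kept = sum(max(counts.values()) for counts in groups.values())
    return kept / len(rows)


def naive_join(left, right, attr):
    """Equi-join two lists of dict rows on a single shared attribute."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]


# Hypothetical sources that share "model" but have no key relationship.
listings = [
    {"vin": "A1", "model": "Civic", "price": 9000},
    {"vin": "B2", "model": "Civic", "price": 12000},
    {"vin": "C3", "model": "Accord", "price": 15000},
]
specs = [
    {"model": "Civic", "make": "Honda", "engine": "1.8L"},
    {"model": "Civic", "make": "Honda", "engine": "2.0L"},
    {"model": "Accord", "make": "Honda", "engine": "2.4L"},
]

joined = naive_join(listings, specs, "model")
# Three listed cars become five joined tuples, because each Civic listing
# pairs with both Civic spec rows.
print(len(joined))

# "model" determines "make" on every row, but "engine" on only 2 of 3 rows.
print(afd_confidence(specs, "model", "make"))
print(afd_confidence(specs, "model", "engine"))
```

In this toy data, `make` can be attached to a listing through `model` without ambiguity, whereas `engine` cannot, which is the kind of distinction a dependency-aware integration strategy could use to decide which attributes are safe to bring in from another source.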