SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Consistent query answers in inconsistent databases
PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The Clio project: managing heterogeneity
ACM SIGMOD Record
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Garlic: a new flavor of federated query processing for DB2
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Distributed Transitive Closure Computations: The Disconnection Set Approach
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases
A survey of approaches to automatic schema matching
The VLDB Journal — The International Journal on Very Large Data Bases
Interactive deduplication using active learning
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Statistical schema matching across web query interfaces
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
TAILOR: A Record Linkage Tool Box
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
The Piazza Peer Data Management System
IEEE Transactions on Knowledge and Data Engineering
Efficient set joins on similarity predicates
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Industrial-strength schema matching
ACM SIGMOD Record
ConQuer: efficient management of inconsistent databases
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Data sharing in the Hyperion peer database system
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Approximate joins: concepts and techniques
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Efficient Batch Top-k Search for Dictionary-based Entity Recognition
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
Record linkage: similarity measures and algorithms
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Data integration: the teenage years
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Hi-index | 0.01 |
Many aspects of the data integration problem have been considered in the literature: how to match schemas across different data sources, how to decide when different records refer to the same entity, how to efficiently perform the required entity resolution in a batch fashion, and so on. However, what has largely been ignored is a way to efficiently deploy these existing methods in a realistic, distributed enterprise integration environment. The straightforward use of existing methods often requires that all data be shipped to a coordinator for cleaning, which is often unacceptable. We develop a set of randomized algorithms that allow efficient application of existing entity resolution methods to the answering of aggregate queries over data that have been distributed across multiple sites. Using our methods, it is possible to efficiently generate aggregate query results that account for duplicate and inconsistent values scattered across a federated system.