Dealing with discrepancies in data remains a major challenge in data integration systems. The problem arises both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Although SQL operations such as grouping and join seem to be a viable approach, they fail if the attribute values of potential duplicates or related tuples are not equal but only similar according to certain criteria. As a solution to this problem, we present similarity-based variants of the grouping and join operators. The extended grouping operator produces groups of similar tuples; the extended join combines tuples that satisfy a given similarity condition. We describe the semantics of these operators, discuss efficient implementations for edit distance similarity, and present evaluation results. Finally, we give examples of how the operators can be used in given application scenarios.
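To make the idea of similarity-based operators concrete, the following is a minimal sketch in Python, not the implementation described in the paper. It assumes a plain Levenshtein edit distance, a naive nested-loop join, and grouping by the transitive closure of the similarity relation. The function names (`edit_distance`, `similarity_join`, `similarity_group`), the `max_dist` threshold, and the sample rows are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity_join(left, right, key, max_dist):
    """Naive nested-loop join: pair rows whose key values are within max_dist edits."""
    return [(l, r) for l in left for r in right
            if edit_distance(l[key], r[key]) <= max_dist]


def similarity_group(rows, key, max_dist):
    """Group rows by the transitive closure of the similarity relation.

    Using the transitive closure is an assumption of this sketch; other
    grouping semantics are possible.
    """
    parent = list(range(len(rows)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if edit_distance(rows[i][key], rows[j][key]) <= max_dist:
            parent[find(j)] = find(i)

    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(find(i), []).append(row)
    return list(groups.values())


# Hypothetical sample data: names that differ by small typos.
customers = [{"name": "Jon Smith"}, {"name": "John Smith"}, {"name": "Mary Jones"}]
orders = [{"name": "John Smyth", "item": "book"}, {"name": "Mary Jones", "item": "pen"}]

joined = similarity_join(customers, orders, "name", max_dist=1)
groups = similarity_group(customers, "name", max_dist=1)
```

With a threshold of one edit, the join pairs "John Smith" with "John Smyth" and "Mary Jones" with itself, while the grouping places "Jon Smith" and "John Smith" in the same group. An equality-based SQL join or GROUP BY would miss both of these matches. The quadratic pairwise comparison here is only for clarity; an efficient implementation would need an index or filtering step to avoid comparing every pair.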