Dealing with discrepancies in data remains a major challenge in data integration systems. The problem occurs both when eliminating duplicates from semantically overlapping sources and when combining complementary data from different sources. Although SQL operations such as grouping and join seem to be a viable approach, they fail if the attribute values of potential duplicates or related tuples are not equal but only similar according to certain criteria. As a solution to this problem, we present similarity-based variants of the grouping and join operators. The extended grouping operator produces groups of similar tuples; the extended join combines tuples that satisfy a given similarity condition. We describe the semantics of these operators, discuss efficient implementations for edit distance similarity, and present evaluation results. Finally, we give application examples from a data reconciliation project for looted art.
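The two operators can be illustrated with a small sketch. This is not the implementation described in the paper: it is a naive quadratic version that assumes tuples are Python dicts, that similarity means an edit distance at or below a threshold, and that groups are formed as connected components of the pairwise similarity relation (one possible reading of "groups of similar tuples"). The function names, the `max_dist` parameter, and the sample artist names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def similarity_join(left, right, key, max_dist):
    """Naive nested-loop join: pair tuples whose key values are similar."""
    return [(l, r) for l in left for r in right
            if edit_distance(l[key], r[key]) <= max_dist]


def similarity_group(rows, key, max_dist):
    """Group tuples by connected components of the similarity relation.

    Assumption: similarity is closed transitively, so two tuples can share
    a group through a chain of similar tuples even if they are not directly
    within max_dist of each other.
    """
    parent = list(range(len(rows)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rows)), 2):
        if edit_distance(rows[i][key], rows[j][key]) <= max_dist:
            parent[find(i)] = find(j)

    groups = {}
    for i, row in enumerate(rows):
        groups.setdefault(find(i), []).append(row)
    return list(groups.values())


# Hypothetical sample data: two spellings of one artist plus another artist.
paintings = [{"artist": "Rembrandt"}, {"artist": "Rembrant"}, {"artist": "Vermeer"}]
groups = similarity_group(paintings, "artist", max_dist=1)
pairs = similarity_join(paintings[:1], paintings[1:], "artist", max_dist=1)
```

With a threshold of 1, the two spellings of "Rembrandt" land in one group and form the only joined pair. The all-pairs comparison is what an efficient implementation would avoid, for example by using an index to prune candidate pairs.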