Many modern systems rely on rich heterogeneous data that has been integrated from a variety of different applications and sources. To perform their tasks successfully, these systems need to know which data refer to the same real-world entities, such as locations, people, or movies. My work addresses this requirement through a new approach to entity-aware query processing over heterogeneous data. Data provided for integration is processed to generate the possible entities and the linkages between them. This information is never merged with the original data; instead, it is used during query processing to provide entity-aware results that reflect the real-world entities existing in the current data. Special emphasis is given to the effective management of the uncertainty and correlations that either exist in the original data or are generated by data matching techniques.
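To make the idea of keeping linkages separate from the data more concrete, the following is a minimal sketch, not the approach developed in this work. It assumes a hypothetical set of source records, a hypothetical list of pairwise linkages with match probabilities, and a simple probability threshold as a stand-in for proper uncertainty management. All names (`RECORDS`, `LINKAGES`, `entities`, `entity_aware_query`) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical source records; they are stored as-is and never merged.
RECORDS = {
    "r1": {"name": "J. Smith", "city": "Boston"},
    "r2": {"name": "John Smith", "city": "Boston"},
    "r3": {"name": "Jon Smyth", "city": "Austin"},
}

# Hypothetical linkages from some matcher: (record, record, probability).
LINKAGES = [("r1", "r2", 0.9), ("r2", "r3", 0.4)]


def entities(record_ids, linkages, threshold):
    """Group records into entities using linkages at or above the threshold."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, p in linkages:
        if p >= threshold:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for r in record_ids:
        groups[find(r)].add(r)
    return list(groups.values())


def entity_aware_query(predicate, threshold=0.5):
    """Return entities (sorted record ids) with a record matching predicate."""
    result = []
    for group in entities(RECORDS, LINKAGES, threshold):
        if any(predicate(RECORDS[r]) for r in group):
            result.append(sorted(group))
    return sorted(result)
```

Because the entities are computed at query time, changing the threshold or the linkages changes the answer without touching the stored records. A fixed threshold discards the probabilities after one comparison; handling uncertainty and correlations as the abstract describes would require a probabilistic treatment that this sketch does not attempt.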