Entity linkage is central to almost every data integration and data cleaning scenario. Traditional techniques compute a similarity among data structures, use it to perform merges, and then answer queries on the merged data. We describe a novel framework for entity linkage with uncertainty. Instead of using the linkage information to merge structures a priori, possible linkages are stored alongside the data together with their belief values. A new probabilistic query answering technique takes these probabilistic linkages into consideration. The framework introduces a series of novelties: (i) it performs merges at run time, based not only on the existing linkages but also on the given query; (ii) it allows results that may contain structures not explicitly represented in the data, but generated by reasoning over the linkages; and (iii) it enables an evaluation of the query conditions that spans linked structures, a functionality not currently supported by traditional probabilistic databases. We formally define the semantics, describe an efficient implementation, and report on the findings of our experimental evaluation.
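As a rough illustration of the idea of keeping possible linkages next to the data and merging only at query time, the following minimal sketch enumerates possible worlds over a handful of linkages. It is not the framework's actual algorithm or implementation: the toy records, the belief values, the function names (`merged_entities`, `query_probability`), the assumption that linkages are independent, and the brute-force enumeration are all hypothetical choices made for this example.

```python
from itertools import product

# Hypothetical toy data: each record is a set of (attribute, value) pairs.
records = {
    "r1": {("name", "J. Smith"), ("city", "Trento")},
    "r2": {("name", "John Smith"), ("job", "engineer")},
    "r3": {("name", "Jane Doe"), ("city", "Trento")},
}

# Possible linkages stored alongside the data, each with a belief value.
# Assumption for this sketch: linkages are mutually independent.
linkages = [("r1", "r2", 0.8), ("r1", "r3", 0.1)]


def merged_entities(records, active_links):
    """Merge the records connected by the active links (simple union-find)."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in active_links:
        parent[find(a)] = find(b)

    groups = {}
    for r, attrs in records.items():
        # Collect attributes into a fresh set so the input records stay intact.
        groups.setdefault(find(r), set()).update(attrs)
    return list(groups.values())


def query_probability(records, linkages, conditions):
    """Probability that some merged entity satisfies every condition.

    Enumerates all 2^n possible worlds (each linkage on or off), so it is
    only suitable for tiny examples.
    """
    total = 0.0
    for choice in product([True, False], repeat=len(linkages)):
        world_prob = 1.0
        active = []
        for (a, b, belief), on in zip(linkages, choice):
            world_prob *= belief if on else 1.0 - belief
            if on:
                active.append((a, b))
        entities = merged_entities(records, active)
        if any(conditions <= entity for entity in entities):
            total += world_prob
    return total


# No single stored record has both attributes; only a merge of r1 and r2 does.
query = {("city", "Trento"), ("job", "engineer")}
print(round(query_probability(records, linkages, query), 6))
```

In this sketch the query conditions span two linked records, so the answer exists only in worlds where `r1` and `r2` are merged; the returned probability is the total weight of those worlds. A practical system would avoid enumerating every world.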