In data integration efforts, and portal development in particular, much development time is devoted to entity resolution. Advanced similarity measurement techniques are often used to remove semantic duplicates or resolve other semantic conflicts. It proves impossible, however, to eliminate all semantic problems automatically. An often-used rule of thumb states that about 90% of the development effort is devoted to semi-automatically resolving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that strives for a 'good enough' initial integration, which stores any remaining semantic uncertainty and conflicts in a probabilistic database. The remaining cases are then resolved with user feedback at query time. The main contribution of this paper is an experimental investigation of the effects and sensitivity of rule definition, threshold tuning, and user feedback on integration quality. We claim that our approach indeed reduces development effort, rather than merely shifting it, by showing that setting rough safe thresholds and defining only a few rules suffices to produce a 'good enough' initial integration that can be meaningfully used, and that user feedback is effective in gradually improving the integration quality.
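To make the interplay of thresholds, rules, and feedback concrete, the following is a minimal sketch, not the system evaluated in the paper. It assumes records are plain dictionaries with hypothetical `title` and `year` fields, uses Python's `difflib` ratio as a stand-in similarity measure, and, as a further simplifying assumption, reuses the raw similarity score as the stored probability. The names `decide`, `apply_feedback`, and `different_year`, and the threshold values, are illustrative only.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable


@dataclass(frozen=True)
class MatchDecision:
    left: dict
    right: dict
    p_same: float  # stored probability that both records denote one entity


def similarity(a: dict, b: dict) -> float:
    # Stand-in similarity measure (assumption): string ratio on the title field.
    return SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()


def different_year(a: dict, b: dict) -> bool:
    # Hypothetical knowledge rule: records with different years are distinct.
    return a["year"] != b["year"]


def decide(
    a: dict,
    b: dict,
    rules: Iterable[Callable[[dict, dict], bool]] = (),
    lower: float = 0.3,
    upper: float = 0.9,
) -> MatchDecision:
    # A rule that fires settles the pair as "distinct" with certainty.
    if any(rule(a, b) for rule in rules):
        return MatchDecision(a, b, 0.0)
    s = similarity(a, b)
    if s <= lower:      # safely below the rough lower threshold: distinct
        return MatchDecision(a, b, 0.0)
    if s >= upper:      # safely above the rough upper threshold: same entity
        return MatchDecision(a, b, 1.0)
    # Hard case: keep the uncertainty instead of resolving it up front.
    # Assumption: the raw similarity doubles as the probability.
    return MatchDecision(a, b, s)


def apply_feedback(decision: MatchDecision, same: bool) -> MatchDecision:
    # User feedback at query time collapses the stored uncertainty.
    return MatchDecision(decision.left, decision.right, 1.0 if same else 0.0)
```

In this sketch, only pairs whose similarity falls between the two thresholds and that no rule rules out remain uncertain, so loosely chosen thresholds and a handful of rules determine how much is left for feedback to resolve later.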