This tutorial provides a comprehensive and cohesive overview of key research results in record linkage: methodologies and algorithms for identifying approximate duplicate records, and the tools available for this purpose. It covers techniques introduced in several communities, including databases, information retrieval, statistics, and machine learning. It aims to identify the similarities and differences among these techniques, as well as their merits and limitations.