Fast parallel and serial approximate string matching
Journal of Algorithms
A new approach to text searching
Communications of the ACM
Fast text searching: allowing errors
Communications of the ACM
Approximate string-matching with q-grams and maximal matches
Theoretical Computer Science - Selected papers of the Combinatorial Pattern Matching School
Network flows: theory, algorithms, and applications
Network flows: theory, algorithms, and applications
Data manipulation in heterogeneous databases
ACM SIGMOD Record
Techniques for automatically correcting words in text
ACM Computing Surveys (CSUR)
Improving Generalization with Active Learning
Machine Learning - Special issue on structured connectionist systems
Research problems in data warehousing
CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
Learning to Understand Information on the Internet: An Example-Based Approach
Journal of Intelligent Information Systems - Special issue: next generation information technologies and systems
Bayesian classification (AutoClass): theory and results
Advances in knowledge discovery and data mining
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Inverted files versus signature files for text indexing
ACM Transactions on Database Systems (TODS)
IEEE Transactions on Pattern Analysis and Machine Intelligence
Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Syntactic clustering of the Web
Selected papers from the sixth international conference on World Wide Web
Making large-scale support vector machine learning practical
Advances in kernel methods
Finding replicated Web collections
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Efficient clustering of high-dimensional data sets with application to reference matching
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Hardening soft information sources
Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
The double metaphone search algorithm
C/C++ Users Journal
Automating the approximate record-matching process
Information Sciences—Informatics and Computer Science: An International Journal
Data integration using similarity joins and a word-based information representation language
ACM Transactions on Information Systems (TOIS)
Record linkage: making maximum use of the discriminating power of identifying information
Communications of the ACM
A guided tour to approximate string matching
ACM Computing Surveys (CSUR)
Automatic segmentation of text into structured records
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
Static index pruning for information retrieval systems
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 11th international conference on World Wide Web
Learning object identification rules for information integration
Information Systems - Data extraction, cleaning and reconciliation
Mining database structure; or, how to build a data quality browser
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Information Sciences: an International Journal
Real-world Data is Dirty: Data Cleansing and The Merge/Purge Problem
Data Mining and Knowledge Discovery
The Inter-Database Instance Identification Problem in Integrating Autonomous Systems
Proceedings of the Fifth International Conference on Data Engineering
Entity Identification in Database Integration
Proceedings of the Ninth International Conference on Data Engineering
Hierarchically Classifying Documents Using Very Few Words
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Maximum Entropy Markov Models for Information Extraction and Segmentation
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Declarative Data Cleaning: Language, Model, and Algorithms
Proceedings of the 27th International Conference on Very Large Data Bases
Potter's Wheel: An Interactive Data Cleaning System
Proceedings of the 27th International Conference on Very Large Data Bases
Approximate String Joins in a Database (Almost) for Free
Proceedings of the 27th International Conference on Very Large Data Bases
On Using q-Gram Locations in Approximate String Matching
ESA '95 Proceedings of the Third Annual European Symposium on Algorithms
Interactive deduplication using active learning
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning domain-independent string transformation weights for high accuracy object identification
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Learning to match and cluster large high-dimensional data sets for data integration
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Text joins in an RDBMS for web data integration
WWW '03 Proceedings of the 12th international conference on World Wide Web
A Bayesian decision model for cost optimal record matching
The VLDB Journal — The International Journal on Very Large Data Bases
Entity Matching in Heterogeneous Databases: A Distance Based Decision Model
HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences - Volume 7
Efficient processing of joins on set-valued attributes
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Robust and efficient fuzzy match for online data cleaning
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
TAILOR: A Record Linkage Tool Box
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Efficient set joins on similarity predicates
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Machine Learning
A generalized cost optimal decision model for record matching
Proceedings of the 2004 international workshop on Information quality in information systems
Mining reference tables for automatic text segmentation
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
ICML '04 Proceedings of the twenty-first international conference on Machine learning
A hierarchical graphical model for record linkage
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Robust Identification of Fuzzy Duplicates
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Information Extraction: Distilling Structured Data from Unstructured Text
Queue - Social Computing
Adaptive Name Matching in Information Integration
IEEE Intelligent Systems
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming and Delivering Data
Eliminating fuzzy duplicates in data warehouses
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Merging the results of approximate match operations
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Query relaxation using malleable schemas
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Privacy preserving schema and data matching
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Record matching in digital library metadata
Communications of the ACM - Alternate reality gaming
Brokering infrastructure for minimum cost data procurement based on quality-quantity models
Decision Support Systems
Efficient similarity joins for near duplicate detection
Proceedings of the 17th international conference on World Wide Web
Dependencies revisited for improving data quality
Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Boosting text segmentation via progressive classification
Knowledge and Information Systems
Named entity normalization in user generated content
Proceedings of the second workshop on Analytics for noisy unstructured text data
Innovation in the cluster validating techniques
Fuzzy Optimization and Decision Making
Automatic record linkage using seeded nearest neighbour and support vector machine classification
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Matching XML documents in highly dynamic applications
Proceedings of the eighth ACM symposium on Document engineering
Integration of Semantically Annotated Data by the KnoFuss Architecture
EKAW '08 Proceedings of the 16th international conference on Knowledge Engineering: Practice and Patterns
Adopting ontologies for multisource identity resolution
OBI '08 Proceedings of the first international workshop on Ontology-supported business intelligence
Learning to create data-integrating queries
Proceedings of the VLDB Endowment
Industry-scale duplicate detection
Proceedings of the VLDB Endowment
Scaling up duplicate detection in graph data
Proceedings of the 17th ACM conference on Information and knowledge management
On co-authorship for author disambiguation
Information Processing and Management: an International Journal
Automatic extraction of social networks by topics of interest
International Journal of Computer Applications in Technology
Consolidation of References to Persons in Bibliographic Databases
ICADL '08 Proceedings of the 11th International Conference on Asian Digital Libraries: Universal and Ubiquitous Access to Information
Refining Instance Coreferencing Results Using Belief Propagation
ASWC '08 Proceedings of the 3rd Asian Semantic Web Conference on The Semantic Web
The impact of parameter setup on a genetic programming approach to record deduplication
SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases
Active Energy-Aware Management of Business-Process Based Applications
ServiceWave '08 Proceedings of the 1st European Conference on Towards a Service-Based Internet
Time-completeness trade-offs in record linkage using adaptive query processing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Generalized Mongue-Elkan Method for Approximate Text String Comparison
CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Transliteration Based Text Input Methods for Telugu
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
Accurate Synthetic Generation of Realistic Personal Information
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Entity resolution with iterative blocking
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
A grammar-based entity representation framework for data cleaning
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Incremental maintenance of length normalized indexes for approximate string matching
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Geocode Matching and Privacy Preservation
Privacy, Security, and Trust in KDD
Toward automatic artifact matching for tool evaluation
Proceedings of the 47th Annual Southeast Regional Conference
Selecting and Improving System Call Models for Anomaly Detection
DIMVA '09 Proceedings of the 6th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment
Conditional Dependencies: A Principled Approach to Improving Data Quality
BNCOD 26 Proceedings of the 26th British National Conference on Databases: Dataspace: The Final Frontier
Optimal Stopping: A Record-Linkage Approach
Journal of Data and Information Quality (JDIQ)
Incorporating Domain-Specific Information Quality Constraints into Database Queries
Journal of Data and Information Quality (JDIQ)
A strategy for allowing meaningful and comparable scores in approximate matching
Information Systems
The Normalized Compression Distance as a Distance Measure in Entity Identification
ICDM '09 Proceedings of the 9th Industrial Conference on Advances in Data Mining. Applications and Theoretical Aspects
Generic Entity Resolution in Relational Databases
ADBIS '09 Proceedings of the 13th East European Conference on Advances in Databases and Information Systems
Finding ontological correspondences for a domain-independent natural language dialog agent
IAAI'08 Proceedings of the 20th national conference on Innovative applications of artificial intelligence - Volume 3
Creating probabilistic databases from duplicated data
The VLDB Journal — The International Journal on Very Large Data Bases
A framework for semantic link discovery over relational data
Proceedings of the 18th ACM conference on Information and knowledge management
Discovering matching dependencies
Proceedings of the 18th ACM conference on Information and knowledge management
I seek you: searching and matching individuals in social networks
Proceedings of the eleventh international workshop on Web information and data management
ACM SIGKDD Explorations Newsletter
Generic entity resolution with negative rules
The VLDB Journal — The International Journal on Very Large Data Bases
Frameworks for entity matching: A comparison
Data & Knowledge Engineering
Comparative evaluation of entity resolution approaches with FEVER
Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment
Data fusion: resolving data conflicts for integration
Proceedings of the VLDB Endowment
Reasoning about record matching rules
Proceedings of the VLDB Endowment
Learning string transformations from examples
Proceedings of the VLDB Endowment
Modeling and querying possible repairs in duplicate detection
Proceedings of the VLDB Endowment
A novel approach for entity linkage
IRI'09 Proceedings of the 10th IEEE international conference on Information Reuse & Integration
Discovering and Maintaining Links on the Web of Data
ISWC '09 Proceedings of the 8th International Semantic Web Conference
"Same, Same but Different" A Survey on Duplicate Detection Methods for Situation Awareness
OTM '09 Proceedings of the Confederated International Conferences, CoopIS, DOA, IS, and ODBASE 2009 on On the Move to Meaningful Internet Systems: Part II
A Model for Semantic Equivalence Discovery for Harmonizing Master Data
OTM '09 Proceedings of the Confederated International Workshops and Posters on On the Move to Meaningful Internet Systems: ADI, CAMS, EI2N, ISDE, IWSSA, MONET, OnToContent, ODIS, ORM, OTM Academy, SWWS, SEMELS, Beyond SAWSDL, and COMBEK 2009
Copy-Move Forgery Detection in Digital Image
PCM '09 Proceedings of the 10th Pacific Rim Conference on Multimedia: Advances in Multimedia Information Processing
Merging and Ranking Answers in the Semantic Web: The Wisdom of Crowds
ASWC '09 Proceedings of the 4th Asian Conference on The Semantic Web
ASWC '09 Proceedings of the 4th Asian Conference on The Semantic Web
Entity-aware query processing for heterogeneous data with uncertainty and correlations
Proceedings of the 2009 EDBT/ICDT Workshops
Private record matching using differential privacy
Proceedings of the 13th International Conference on Extending Database Technology
Subsumption and complementation as data fusion operators
Proceedings of the 13th International Conference on Extending Database Technology
Text-to-query: dynamically building structured analytics to illustrate textual content
Proceedings of the 2010 EDBT/ICDT Workshops
Interweaving OAI-PMH data sources with the linked data cloud
International Journal of Metadata, Semantics and Ontologies
Graph-based concept identification and disambiguation for enterprise search
Proceedings of the 19th international conference on World wide web
DSNotify: handling broken links in the web of data
Proceedings of the 19th international conference on World wide web
Enabling entity-based aggregators for web 2.0 data
Proceedings of the 19th international conference on World wide web
ICADL'07 Proceedings of the 10th international conference on Asian digital libraries: looking back 10 years and forging new frontiers
ONN the use of neural networks for data privacy
SOFSEM'08 Proceedings of the 34th conference on Current trends in theory and practice of computer science
Sampling dirty data for matching attributes
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
On active learning of record matching packages
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
On indexing error-tolerant set containment
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
MapDupReducer: detecting near duplicates over massive datasets
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Multiple relationship based deduplication
Proceedings of the Fourth SIGMOD PhD Workshop on Innovative Database Research
Properties of possibilistic string comparison
IEEE Transactions on Fuzzy Systems
Detecting duplicate biological entities using Shortest Path Edit Distance
International Journal of Data Mining and Bioinformatics
A comparative analysis of similarity measurement techniques through SimReq framework
Proceedings of the 7th International Conference on Frontiers of Information Technology
Proceedings of the 6th International Conference on Semantic Systems
Shortest path edit distance for detecting duplicate biological entities
Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
CONE: metrics for automatic evaluation of named entity co-reference resolution
NEWS '10 Proceedings of the 2010 Named Entities Workshop
Generating document summaries from user annotations
ESAIR '10 Proceedings of the third workshop on Exploiting semantic annotations in information retrieval
An efficient duplicate record detection using q-grams array inverted index
DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
Dependency discovery in data quality
CAiSE'10 Proceedings of the 22nd international conference on Advanced information systems engineering
Feature-based entity matching: the FBEM model, implementation, evaluation
CAiSE'10 Proceedings of the 22nd international conference on Advanced information systems engineering
From web data to entities and back
CAiSE'10 Proceedings of the 22nd international conference on Advanced information systems engineering
Duplicate identification in deep web data integration
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Efficient duplicate record detection based on similarity estimation
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Focusing computational visual attention in multi-modal human-robot interaction
International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
Record linkage with uniqueness constraints and erroneous values
Proceedings of the VLDB Endowment
On-the-fly entity-aware query processing in the presence of linkage
Proceedings of the VLDB Endowment
Proceedings of the VLDB Endowment
Exploiting content redundancy for web information extraction
Proceedings of the VLDB Endowment
Explore or exploit?: effective strategies for disambiguating large databases
Proceedings of the VLDB Endowment
Entity resolution with evolving rules
Proceedings of the VLDB Endowment
Global detection of complex copying relationships between sources
Proceedings of the VLDB Endowment
Robust Record Linkage Blocking Using Suffix Arrays and Bloom Filters
ACM Transactions on Knowledge Discovery from Data (TKDD)
Efficient entity resolution for large heterogeneous information spaces
Proceedings of the fourth ACM international conference on Web search and data mining
Data cleaning and query answering with matching dependencies and matching functions
Proceedings of the 14th International Conference on Database Theory
Detecting near-duplicate relations in user generated forum content
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Composition of scientific teams and publication productivity at a national science lab
Journal of the American Society for Information Science and Technology
A self-training approach for resolving object coreference on the semantic web
Proceedings of the 20th international conference on World wide web
Proceedings of the 4th International Workshop on Logic in Databases
The missing links: discovering hidden same-as links among a billion of triples
Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services
Identity matching using personal and social identity features
Information Systems Frontiers
Foundations and Trends in Databases
A privacy preserving efficient protocol for semantic similarity join using long string attributes
Proceedings of the 4th International Workshop on Privacy and Anonymity in the Information Society
Wrangler: interactive visual specification of data transformation scripts
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
A fast approach for parallel deduplication on multicore processors
Proceedings of the 2011 ACM Symposium on Applied Computing
Creating knowledge out of interlinked data: making the web a data washing machine
Proceedings of the International Conference on Web Intelligence, Mining and Semantics
Determining the currency of data
Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Interaction between record matching and data repairing
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
LinkDB: a probabilistic linkage database system
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Succinct summaries of narrative events using social networks
Proceedings of the 22nd ACM conference on Hypertext and hypermedia
Event correlation for process discovery from web service interaction logs
The VLDB Journal — The International Journal on Very Large Data Bases
Eliminating the redundancy in blocking-based entity resolution methods
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Detecting and exploiting stability in evolving heterogeneous information spaces
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
To compare or not to compare: making entity resolution more efficient
Proceedings of the International Workshop on Semantic Web Information Management
Multipedia: enriching DBpedia with multimedia information
Proceedings of the sixth international conference on Knowledge capture
Efficient similarity joins for near-duplicate detection
ACM Transactions on Database Systems (TODS)
Differential dependencies: Reasoning and discovery
ACM Transactions on Database Systems (TODS)
A supervised machine learning approach for duplicate detection over gazetteer records
GeoS'11 Proceedings of the 4th international conference on GeoSpatial semantics
Controlling false match rates in record linkage using extreme value theory
Journal of Biomedical Informatics
Matching unstructured product offers to structured product specifications
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Entity matching: how similar is similar
Proceedings of the VLDB Endowment
A set of experiments to consider data quality criteria in classification techniques for data mining
ICCSA'11 Proceedings of the 2011 international conference on Computational science and its applications - Volume Part II
How unique and traceable are usernames?
PETS'11 Proceedings of the 11th international conference on Privacy enhancing technologies
A constraint satisfaction cryptanalysis of bloom filters in private record linkage
PETS'11 Proceedings of the 11th international conference on Privacy enhancing technologies
PG-join: proximity graph based string similarity joins
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Privacy preserving group linkage
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management
Introduction to linked data and its lifecycle on the web
RW'11 Proceedings of the 7th international conference on Reasoning web: semantic technologies for the web of data
Ingredients for accurate, fast, and robust XML similarity joins
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part II
Dynamic constraints for record matching
The VLDB Journal — The International Journal on Very Large Data Bases
Learning top-k transformation rules
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part I
Efficient duplicate detection on cloud using a new signature scheme
WAIM'11 Proceedings of the 12th international conference on Web-age information management
Applied Intelligence
Incorporating domain knowledge and user expertise in probabilistic Tuple merging
SUM'11 Proceedings of the 5th international conference on Scalable uncertainty management
Conflict-aware historical data fusion
SUM'11 Proceedings of the 5th international conference on Scalable uncertainty management
Conversational agents in a virtual world
KI'11 Proceedings of the 34th Annual German conference on Advances in artificial intelligence
A publication process model to enable privacy-aware data sharing
IBM Journal of Research and Development
Linking semantic desktop data to the web of data
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part II
DC proposal: towards linked data assessment and linking temporal facts
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part II
Coreference aware web object retrieval
Proceedings of the 20th ACM international conference on Information and knowledge management
Frequency-aware similarity measures: why Arnold Schwarzenegger is always a duplicate
Proceedings of the 20th ACM international conference on Information and knowledge management
Efficient similarity search: arbitrary similarity measures, arbitrary composition
Proceedings of the 20th ACM international conference on Information and knowledge management
Proceedings of the 20th ACM international conference on Information and knowledge management
Instance-based 'one-to-some' assignment of similarity measures to attributes
OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems - Volume Part I
Exploiting attribute redundancy for web entity data extraction
ICADL'11 Proceedings of the 13th international conference on Asia-pacific digital libraries: for cultural heritage, knowledge dissemination, and future creation
KD2R: a key discovery method for semantic reference reconciliation
OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems
Searching and browsing Linked Data with SWSE: The Semantic Web Search Engine
Web Semantics: Science, Services and Agents on the World Wide Web
Computer-based genealogy reconstruction in founder populations
Journal of Biomedical Informatics
PARIS: probabilistic alignment of relations, instances, and schema
Proceedings of the VLDB Endowment
Improving data quality by source analysis
Journal of Data and Information Quality (JDIQ)
Quality-aware similarity assessment for entity matching in Web data
Information Systems
Beyond 100 million entities: large-scale blocking-based resolution for heterogeneous data
Proceedings of the fifth ACM international conference on Web search and data mining
Multi-pass sorted neighborhood blocking with MapReduce
Computer Science - Research and Development
DASFAA'10 Proceedings of the 15th international conference on Database Systems for Advanced Applications - Volume Part I
Web Semantics: Science, Services and Agents on the World Wide Web
Leveraging terminological structure for object reconciliation
ESWC'10 Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part II
Theoretical foundations for enabling a web of knowledge
FoIKS'10 Proceedings of the 6th international conference on Foundations of Information and Knowledge Systems
Similarity function recommender service using incremental user knowledge acquisition
ICSOC'11 Proceedings of the 9th international conference on Service-Oriented Computing
Cross-lingual knowledge linking across wiki knowledge bases
Proceedings of the 21st international conference on World Wide Web
Towards certain fixes with editing rules and master data
The VLDB Journal — The International Journal on Very Large Data Bases
Efficient Privacy Preserving Protocols for Similarity Join
Transactions on Data Privacy
Learning semantic string transformations from examples
Proceedings of the VLDB Endowment
Linking records in dynamic world
PhD '12 Proceedings of the SIGMOD/PODS 2012 PhD Symposium
Pay-as-you-go data integration for linked data: opportunities, challenges and architectures
SWIM '12 Proceedings of the 4th International Workshop on Semantic Web Information Management
Fake injection strategies for private phonetic matching
DPM'11 Proceedings of the 6th international conference, and 4th international conference on Data Privacy Management and Autonomous Spontaneous Security
Flexible and efficient distributed resolution of large entities
FoIKS'12 Proceedings of the 7th international conference on Foundations of Information and Knowledge Systems
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Reference table based k-anonymous private blocking
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Tailoring entity resolution for matching product offers
Proceedings of the 15th International Conference on Extending Database Technology
On generating large-scale ground truth datasets for the deduplication of bibliographic records
Proceedings of the 2nd International Conference on Web Intelligence, Mining and Semantics
Profiler: integrated statistical analysis and visualization for data quality assessment
Proceedings of the International Working Conference on Advanced Visual Interfaces
Integrating open government data with stratosphere for more transparency
Web Semantics: Science, Services and Agents on the World Wide Web
Efficient and Practical Approach for Private Record Linkage
Journal of Data and Information Quality (JDIQ)
Open business intelligence: on the importance of data quality awareness in user-friendly data mining
Proceedings of the 2012 Joint EDBT/ICDT Workshops
Leveraging matching dependencies for guided user feedback in linked data applications
Proceedings of the Ninth International Workshop on Information Integration on the Web
Information Visualization - Special issue on State of the Field and New Research Directions
Aggregating web offers to determine product prices
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Active sampling for entity matching
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
Multiple instance learning for group record linkage
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Foundations and Trends in Information Retrieval
Unsupervised learning of link discovery configuration
ESWC'12 Proceedings of the 9th international conference on The Semantic Web: research and applications
EAGLE: efficient active learning of link specifications using genetic programming
ESWC'12 Proceedings of the 9th international conference on The Semantic Web: research and applications
CrowdER: crowdsourcing entity resolution
Proceedings of the VLDB Endowment
Learning expressive linkage rules using genetic programming
Proceedings of the VLDB Endowment
Sift: an end-user tool for gathering web content on the go
Proceedings of the 2012 ACM symposium on Document engineering
OtO matching system: a multi-strategy approach to instance matching
CAiSE'12 Proceedings of the 24th international conference on Advanced Information Systems Engineering
Exploiting evidence from unstructured data to enhance master data management
Proceedings of the VLDB Endowment
Journal of Biomedical Informatics
Proceedings of the 3rd Annual ACM Web Science Conference
The impact of spelling errors on patent search
EACL '12 Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics
The intelius nickname collection: quantitative analyses from billions of public records
NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Determining the Currency of Data
ACM Transactions on Database Systems (TODS)
Matching product titles using web-based enrichment
Proceedings of the 21st ACM international conference on Information and knowledge management
De-duplication of aggregation authority files
International Journal of Metadata, Semantics and Ontologies
Computer Methods and Programs in Biomedicine
An automatic blocking mechanism for large-scale de-duplication tasks
Proceedings of the 21st ACM international conference on Information and knowledge management
An effective rule miner for instance matching in a web of data
Proceedings of the 21st ACM international conference on Information and knowledge management
Frequent grams based embedding for privacy preserving record linkage
Proceedings of the 21st ACM international conference on Information and knowledge management
Map to humans and reduce error: crowdsourcing for deduplication applied to digital libraries
Proceedings of the 21st ACM international conference on Information and knowledge management
LINDA: distributed web-of-data-scale entity matching
Proceedings of the 21st ACM international conference on Information and knowledge management
Scaling multiple-source entity resolution using statistically efficient transfer learning
Proceedings of the 21st ACM international conference on Information and knowledge management
Fast and accurate incremental entity resolution relative to an entity knowledge base
Proceedings of the 21st ACM international conference on Information and knowledge management
Duplicate detection in pay-per-click streams using temporal stateful Bloom filters
International Journal of Data Analysis Techniques and Strategies
Study on data preprocessing for daylight climate data
ICICA'12 Proceedings of the Third international conference on Information Computing and Applications
Tractable cases of clean query answering under entity resolution via matching dependencies
SUM'12 Proceedings of the 6th international conference on Scalable Uncertainty Management
Evaluating indeterministic duplicate detection results
SUM'12 Proceedings of the 6th international conference on Scalable Uncertainty Management
Discovering links among social networks
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II
Detecting duplicate records in scientific workflow results
IPAW'12 Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes
Matching points of interest from different social networking sites
KI'12 Proceedings of the 35th Annual German conference on Advances in Artificial Intelligence
Automatic SLA Matching and Provider Selection in Grid and Cloud Computing Markets
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing
Leveraging the storage layer to support XML similarity joins in XDBMSs
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
An evolutionary approach to complex schema matching
Information Systems
Heuristic supervised approach for record linkage
MDAI'12 Proceedings of the 9th international conference on Modeling Decisions for Artificial Intelligence
Link discovery with guaranteed reduction ratio in affine spaces with minkowski measures
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
A machine learning approach for instance matching based on similarity metrics
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
Integrating feature analysis and background knowledge to recommend similarity functions
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
Spatio-textual similarity joins
Proceedings of the VLDB Endowment
Finding email correspondents in online social networks
World Wide Web
The data analytics group at the qatar computing research institute
ACM SIGMOD Record
What's in a name?: an unsupervised approach to link users across communities
Proceedings of the sixth ACM international conference on Web search and data mining
Domain-Independent Entity Coreference for Linking Ontology Instances
Journal of Data and Information Quality (JDIQ) - Special Issue on Entity Resolution
Adaptive Connection Strength Models for Relationship-Based Entity Resolution
Journal of Data and Information Quality (JDIQ) - Special Issue on Entity Resolution
Indeterministic Handling of Uncertain Decisions in Deduplication
Journal of Data and Information Quality (JDIQ) - Special Issue on Entity Resolution
Data Linking for the Semantic Web
International Journal on Semantic Web & Information Systems
Memory efficient minimum substring partitioning
Proceedings of the VLDB Endowment
Actively soliciting feedback for query answers in keyword search-based data integration
Proceedings of the VLDB Endowment
Towards scalable real-time entity resolution using a similarity-aware inverted index approach
AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
Efficient privacy-aware record integration
Proceedings of the 16th International Conference on Extending Database Technology
HIL: a high-level scripting language for entity integration
Proceedings of the 16th International Conference on Extending Database Technology
Comparable dependencies over heterogeneous data
The VLDB Journal — The International Journal on Very Large Data Bases
Don't be SCAREd: use SCalable Automatic REpairing with maximal likelihood and bounded changes
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Knowledge harvesting in the big-data era
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
LinkIT: privacy preserving record linkage and integration via transformations
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Determining the relative accuracy of attributes
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
NADEEF: a commodity data cleaning system
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Graph-based reference table construction to facilitate entity matching
Journal of Systems and Software
MFIBlocks: An effective blocking algorithm for entity resolution
Information Systems
A taxonomy of privacy-preserving record linkage techniques
Information Systems
Efficient XML duplicate detection using an adaptive two-level optimization
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Discovering interesting information with advances in web technology
ACM SIGKDD Explorations Newsletter
An efficient two-party protocol for approximate matching in private record linkage
AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
A supervised learning and group linking method for historical census household linkage
AusDM '11 Proceedings of the Ninth Australasian Data Mining Conference - Volume 121
Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Tuning large scale deduplication with reduced effort
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Active Sampling for Entity Matching with Guarantees
ACM Transactions on Knowledge Discovery from Data (TKDD) - Special Issue on ACM SIGKDD 2012
Automation of data normalization for implementing master data management systems
Programming and Computing Software
An automatic blocking strategy for XML duplicate detection
ACM SIGAPP Applied Computing Review
Efficient parsing-based search over structured data
Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
Storing and analysing voice of the market data in the corporate data warehouse
Information Systems Frontiers
De-duplication of aggregation authority files
International Journal of Metadata, Semantics and Ontologies
A hybrid model words-driven approach for web product duplicate detection
CAiSE'13 Proceedings of the 25th international conference on Advanced Information Systems Engineering
Weighted multi-attribute matching of user-generated points of interest
Proceedings of the 21st ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
Similarity evaluation in XML schema and XLink
Proceedings of the 19th Brazilian symposium on Multimedia and the web
Evaluation of instance matching tools: The experience of OAEI
Web Semantics: Science, Services and Agents on the World Wide Web
Introduction to linked data and its lifecycle on the web
RW'13 Proceedings of the 9th international conference on Reasoning Web: semantic technologies for intelligent data access
Editorial: Efficient discovery of similarity constraints for matching dependencies
Data & Knowledge Engineering
WOO: a scalable and multi-tenant platform for continuous knowledge base synthesis
Proceedings of the VLDB Endowment
Mining frequent patterns with differential privacy
Proceedings of the VLDB Endowment
Question selection for crowd entity resolution
Proceedings of the VLDB Endowment
Efficient querying of inconsistent databases with binary integer programming
Proceedings of the VLDB Endowment
Query-driven approach to entity resolution
Proceedings of the VLDB Endowment
Entity resolution for distributed probabilistic data
Distributed and Parallel Databases
Linkage of compound objects for supporting maintenance of large-scale web sites
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
An automatic key discovery approach for data linking
Web Semantics: Science, Services and Agents on the World Wide Web
Toward detection of aliases without string similarity
Information Sciences: an International Journal
Deduplicating a places database
Proceedings of the 23rd international conference on World wide web
Repairing broken RDF links in the web of data
International Journal of Web Engineering and Technology
Incremental entity resolution on rules and data
The VLDB Journal — The International Journal on Very Large Data Bases
Sampling from repairs of conditional functional dependency violations
The VLDB Journal — The International Journal on Very Large Data Bases
Joint entity resolution on multiple datasets
The VLDB Journal — The International Journal on Very Large Data Bases
Variable linkage for multimedia metadata schema matching
Multimedia Tools and Applications
Efficient indexing techniques for record matching and deduplication
International Journal of Computational Vision and Robotics
Journal of Information Science
Publishing bibliographic data on the Semantic Web using BibBase
Semantic Web - Linked Data for science and education
Deduplication of metadata harvested from Open Archives Initiative repositories
Information Services and Use - Mining the Digital Information Networks
Real-world entities often have two or more representations in databases. Duplicate records often lack a common key, contain errors, or both, which makes duplicate matching a difficult task. These errors arise from transcription mistakes, incomplete information, a lack of standard formats, or any combination of these factors. In this paper, we present a thorough analysis of the literature on duplicate record detection. We cover similarity metrics that are commonly used to detect similar field entries, and we present an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database. We also cover multiple techniques for improving the efficiency and scalability of approximate duplicate detection algorithms. We conclude with coverage of existing tools and a brief discussion of the major open problems in the area.