A Bayesian decision model for cost optimal record matching
The VLDB Journal — The International Journal on Very Large Data Bases
Iterative record linkage for cleaning and integration
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Privacy-preserving data integration and sharing
Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Methods for evaluating and creating data quality
Information Systems - Special issue: Data quality in cooperative information systems
Schema Matching Using Duplicates
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Provider issues in quality-constrained data provisioning
Proceedings of the 2nd international workshop on Information quality in information systems
Quality views: capturing and exploiting the user perspective on data quality
VLDB '06 Proceedings of the 32nd international conference on Very large data bases
Duplicate Record Detection: A Survey
IEEE Transactions on Knowledge and Data Engineering
Data quality awareness: a case study for cost optimal association rule mining
Knowledge and Information Systems - Special Issue on Mining Low-Quality Data
Towards automated record linkage
AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61
Relation extraction and the influence of automatic named-entity recognition
ACM Transactions on Speech and Language Processing (TSLP)
Randomized algorithms for data reconciliation in wide area aggregate query processing
VLDB '07 Proceedings of the 33rd international conference on Very large data bases
A two-step classification approach to unsupervised record linkage
AusDM '07 Proceedings of the sixth Australasian conference on Data mining and analytics - Volume 70
Automatic record linkage using seeded nearest neighbour and support vector machine classification
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Time-completeness trade-offs in record linkage using adaptive query processing
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Learning blocking schemes for record linkage
AAAI'06 Proceedings of the 21st national conference on Artificial intelligence - Volume 1
Creating relational data from unstructured and ungrammatical data sources
Journal of Artificial Intelligence Research
Formal anonymity models for efficient privacy-preserving joins
Data & Knowledge Engineering
Robust record linkage blocking using suffix arrays
Proceedings of the 18th ACM conference on Information and knowledge management
Frameworks for entity matching: A comparison
Data & Knowledge Engineering
Fuzzy sets in the fight against digital obesity
Fuzzy Sets and Systems
Private record matching using differential privacy
Proceedings of the 13th International Conference on Extending Database Technology
Automatic training example selection for scalable unsupervised record linkage
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Evaluation of entity resolution approaches on real-world match problems
Proceedings of the VLDB Endowment
Robust Record Linkage Blocking Using Suffix Arrays and Bloom Filters
ACM Transactions on Knowledge Discovery from Data (TKDD)
Entity Resolution and Information Quality
Entity Resolution and Information Quality
Foundations and Trends in Databases
Automatically generating data linkages using a domain-independent candidate selection approach
ISWC'11 Proceedings of the 10th international conference on The semantic web - Volume Part I
KES'06 Proceedings of the 10th international conference on Knowledge-Based Intelligent Information and Engineering Systems - Volume Part I
Managing information quality in e-science using semantic web technology
ESWC'06 Proceedings of the 3rd European conference on The Semantic Web: research and applications
Probabilistic data generation for deduplication and data linkage
IDEAL'05 Proceedings of the 6th international conference on Intelligent Data Engineering and Automated Learning
Decision models for record linkage
Data Mining
First-Order patterns for information integration
ICWE'05 Proceedings of the 5th international conference on Web Engineering
Probabilistic iterative duplicate detection
OTM'05 Proceedings of the 2005 OTM Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, COA, and ODBASE - Volume Part II
Unsupervised duplicate detection using sample non-duplicates
Journal on Data Semantics VII
Improving access to multimedia using multi-source hierarchical meta-data
AMR'05 Proceedings of the Third international conference on Adaptive Multimedia Retrieval: user, context, and feedback
Cross-lingual knowledge linking across wiki knowledge bases
Proceedings of the 21st international conference on World Wide Web
Database enrichment environment to identify duplicate tuples
FDIA'11 Proceedings of the Fourth BCS-IRSG conference on Future Directions in Information Access
Fake injection strategies for private phonetic matching
DPM'11 Proceedings of the 6th international conference, and 4th international conference on Data Privacy Management and Autonomous Spontaneus Security
Reference table based k-anonymous private blocking
Proceedings of the 27th Annual ACM Symposium on Applied Computing
Leveraging unlabeled data to scale blocking for record linkage
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Volume Three
Learning expressive linkage rules using genetic programming
Proceedings of the VLDB Endowment
Detecting duplicate records in scientific workflow results
IPAW'12 Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes
Link discovery with guaranteed reduction ratio in affine spaces with minkowski measures
ISWC'12 Proceedings of the 11th international conference on The Semantic Web - Volume Part I
Integrating feature analysis and background knowledge to recommend similarity functions
WISE'12 Proceedings of the 13th international conference on Web Information Systems Engineering
The data analytics group at the qatar computing research institute
ACM SIGMOD Record
Towards scalable real-time entity resolution using a similarity-aware inverted index approach
AusDM '08 Proceedings of the 7th Australasian Data Mining Conference - Volume 87
MFIBlocks: An effective blocking algorithm for entity resolution
Information Systems
A taxonomy of privacy-preserving record linkage techniques
Information Systems
Active learning of expressive linkage rules using genetic programming
Web Semantics: Science, Services and Agents on the World Wide Web
Hi-index | 0.00 |
Data cleaning is a vital process that ensures the quality of data stored in real-world databases. Data cleaning prob-lems are frequently encountered in many research areas, such as knowledge discovery in databases, data ware-housing, system integration and e-services. The process of identifying the record pairs that represent the same entity (duplicate records), commonly known as record linkage, is one of the essential elements of data cleaning. In this paper, we address the record linkage problem by adopt-ing a machine learning approach. Three models are pro-posed and are analyzed empirically. Since no existing model, including those proposed in this paper, has been proved to be superior, we have developed an interactive Record Linkage Toolbox named TAILOR. Users of TAI-LOR can build their own record linkage models by tuning system parameters and by plugging in in-house developed and public domain tools. The proposed toolbox serves as a framework for the record linkage process, and is de-signed in an extensible way to interface with existing and future record linkage models. We have conducted an ex-tensive experimental study to evaluate our proposed mod-els using not only synthetic but also real data. Results show that the proposed machine learning record linkage models outperform the existing ones both in accuracy and in performance.