Febrl: a freely available record linkage system with a graphical user interface
HDKM '08 Proceedings of the second Australasian workshop on Health data and knowledge management - Volume 80
Matching records that refer to the same entity across databases is becoming an increasingly important part of many data mining projects, because data from multiple sources often needs to be matched to enrich it or improve its quality. Significant advances in record linkage techniques have been made in recent years. However, many new techniques are either implemented only in research proof-of-concept systems or hidden within expensive 'black box' commercial software. This makes it difficult for both researchers and practitioners to experiment with new record linkage techniques and to compare them with existing ones. The Febrl (Freely Extensible Biomedical Record Linkage) system aims to fill this gap. It contains many recently developed techniques for data cleaning, deduplication and record linkage, and makes them accessible through a graphical user interface (GUI). Febrl thus allows even inexperienced users to learn and experiment with both traditional and new record linkage techniques. Because Febrl is written in Python and its source code is available, it is fairly easy to integrate new record linkage techniques into it. Febrl can therefore be seen as a tool that allows researchers to compare various existing record linkage techniques with their own, enabling the record linkage research community to conduct its work more efficiently. Additionally, Febrl is suitable as a training tool for new record linkage users, and it can also be used for practical linkage projects with data sets that contain up to several hundred thousand records.
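To make the three steps named above concrete (data cleaning, deduplication, record linkage by comparing candidate pairs), the following is a minimal sketch in Python using only the standard library. It does not use or reproduce Febrl's API: the function names `clean`, `similarity` and `deduplicate`, the `name` and `postcode` fields, the choice of postcode as a blocking key, and the 0.85 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def clean(value):
    """Lowercase and collapse whitespace (a stand-in for real data cleaning)."""
    return " ".join(value.lower().split())


def similarity(a, b):
    """Approximate string similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def deduplicate(records, threshold=0.85):
    """Return sorted pairs of record ids judged to be duplicates.

    Records are grouped by a blocking key (here the postcode, an
    illustrative choice) so that only records in the same block are
    compared, which avoids comparing every record with every other one.
    """
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[rec["postcode"]].append(rec_id)

    matches = []
    for ids in blocks.values():
        for id_a, id_b in combinations(sorted(ids), 2):
            score = similarity(clean(records[id_a]["name"]),
                               clean(records[id_b]["name"]))
            if score >= threshold:
                matches.append((id_a, id_b))
    return sorted(matches)


# Hypothetical toy data, invented for this sketch.
records = {
    1: {"name": "John  Smith", "postcode": "2600"},
    2: {"name": "john smith", "postcode": "2600"},
    3: {"name": "Alice Nguyen", "postcode": "2600"},
    4: {"name": "John Smith", "postcode": "3000"},
}

print(deduplicate(records))
```

Records 1 and 2 are paired because their cleaned names are identical. Record 4 carries the same name but falls into a different block and is never compared, which illustrates the trade-off blocking makes between speed and missed matches.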