A large portion of the data collected by organisations today is about people, and often contains personally identifying information such as names and addresses. Privacy and confidentiality are of great concern when such data is shared between organisations or made publicly available. Research in (privacy-preserving) data mining and data linkage suffers from a lack of publicly available real-world data sets that contain personal information, which makes experimental evaluations difficult to conduct. To overcome this problem, we have developed a data generator that allows flexible creation of synthetic data containing personal information with realistic characteristics, such as frequency distributions, attribute dependencies, and error probabilities. Our generator improves significantly on earlier approaches, and allows the generation of data for individuals, families and households.
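To make the three characteristics named above concrete, the following is a minimal sketch and not the authors' generator. It assumes tiny made-up frequency tables, a single attribute dependency (given name depends on gender), and a per-field error probability that triggers one random character edit. All names (`GIVEN_NAME_FREQ`, `SURNAME_FREQ`, `generate_record`, `corrupt`, `make_duplicate`) and all counts are hypothetical, chosen only for illustration; family and household generation is not modelled.

```python
import random

# Hypothetical frequency tables (illustrative counts, not taken from the paper).
GIVEN_NAME_FREQ = {
    "female": {"emma": 5, "olivia": 3, "mia": 2},
    "male": {"jack": 5, "liam": 3, "noah": 2},
}
SURNAME_FREQ = {"smith": 6, "jones": 3, "nguyen": 1}
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def weighted_choice(freq, rng):
    """Draw one key with probability proportional to its count."""
    values = list(freq)
    weights = [freq[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


def generate_record(rng):
    """Create one clean record; the given name depends on the gender."""
    gender = rng.choice(["female", "male"])
    return {
        "gender": gender,
        "given_name": weighted_choice(GIVEN_NAME_FREQ[gender], rng),
        "surname": weighted_choice(SURNAME_FREQ, rng),
    }


def corrupt(value, error_prob, rng):
    """With probability error_prob, apply one random character edit."""
    if not value or rng.random() >= error_prob:
        return value
    pos = rng.randrange(len(value))
    op = rng.choice(["delete", "substitute", "insert"])
    if op == "delete":
        return value[:pos] + value[pos + 1:]
    if op == "substitute":
        # Pick a letter that differs from the one being replaced.
        letter = rng.choice([c for c in ALPHABET if c != value[pos]])
        return value[:pos] + letter + value[pos + 1:]
    return value[:pos] + rng.choice(ALPHABET) + value[pos:]


def make_duplicate(record, error_prob, rng):
    """Copy a record and corrupt its name fields independently."""
    dup = dict(record)
    for field in ("given_name", "surname"):
        dup[field] = corrupt(dup[field], error_prob, rng)
    return dup


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    original = generate_record(rng)
    duplicate = make_duplicate(original, error_prob=0.3, rng=rng)
    print(original)
    print(duplicate)
```

Passing an explicit `random.Random` instance keeps generated data sets reproducible, which matters when synthetic data is used to compare linkage or deduplication methods across runs.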