Incorporating string transformations in record matching
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Today's record matching infrastructure offers no flexible way to account for synonyms such as "Robert" and "Bob", which refer to the same name, or for more general forms of string transformation such as abbreviations. We propose a programmatic framework for record matching that takes such user-defined string transformations as input. To the best of our knowledge, this is the first proposal for such a framework. While expressive, this transformational framework poses significant computational challenges, which we address. We empirically evaluate our techniques on real data.
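To make the idea of transformation-aware matching concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes records are whitespace-tokenized, that transformations are token-level rewrite rules supplied by the user, and that similarity is the maximum Jaccard score over all rewritten variants of the two records. The names `RULES`, `variants`, `jaccard`, and `transformed_similarity` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical user-defined transformation rules: token -> equivalent tokens.
RULES = {
    "bob": {"robert"},
    "robert": {"bob"},
    "st": {"street"},
    "street": {"st"},
}


def variants(record, rules):
    """Return every token set obtained by rewriting each token at most once."""
    tokens = record.lower().split()
    # For each token, the choices are the token itself plus its rewrites.
    choices = [{t} | rules.get(t, set()) for t in tokens]
    return {frozenset(combo) for combo in product(*choices)}


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def transformed_similarity(r, s, rules):
    """Best Jaccard score over all pairs of rewritten variants of r and s."""
    return max(
        jaccard(a, b)
        for a in variants(r, rules)
        for b in variants(s, rules)
    )


if __name__ == "__main__":
    print(transformed_similarity("Bob Smith", "Robert Smith", RULES))
```

Under these assumptions, "Bob Smith" and "Robert Smith" match perfectly once the rule is applied, whereas plain Jaccard on the raw tokens gives only 1/3. The brute-force enumeration in `variants` grows exponentially with the number of tokens that have applicable rules, which hints at why a transformation-based framework is computationally demanding.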