Transformation-based Framework for Record Matching

  • Authors: Arvind Arasu; Surajit Chaudhuri; Raghav Kaushik

  • Affiliations:
  • Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. arvinda@microsoft.com
  • Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. surajitc@microsoft.com
  • Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA. skaushi@microsoft.com

  • Venue: ICDE '08: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering
  • Year: 2008

Abstract

Today's record matching infrastructure offers no flexible way to account for synonyms such as "Robert" and "Bob", which refer to the same name, or for more general string transformations such as abbreviations. We propose a programmatic framework for record matching that takes such user-defined string transformations as input. To the best of our knowledge, this is the first proposal for such a framework. The transformational framework, while expressive, poses significant computational challenges, which we address. We empirically evaluate our techniques on real data.
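
To make the idea of matching under user-defined transformations concrete, here is a minimal Python sketch. It is not the authors' algorithm: the rule table `RULES`, the helper names, the one-token-to-one-token rule shape, and the choice of Jaccard similarity over token sets are all assumptions made for illustration. The sketch rewrites each record in every way the rules allow and scores a pair by the best similarity among the rewritten forms.

```python
from itertools import product

# Hypothetical user-defined transformations (an assumption for this sketch):
# each key token may be rewritten to any token in its value set.
RULES = {
    "bob": {"robert"},
    "st": {"street"},
}

def variants(text, rules):
    """Return every token tuple obtained by optionally rewriting each token."""
    tokens = text.lower().split()
    # For each token, the choices are the token itself plus its rewrites.
    choices = [{tok} | rules.get(tok, set()) for tok in tokens]
    return {tuple(combo) for combo in product(*choices)}

def jaccard(a, b):
    """Jaccard similarity between two token sequences, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def transformed_similarity(left, right, rules):
    """Best Jaccard score over all rewritten forms of both records."""
    return max(
        jaccard(x, y)
        for x in variants(left, rules)
        for y in variants(right, rules)
    )

if __name__ == "__main__":
    print(transformed_similarity("Bob Smith", "Robert Smith", RULES))
    print(transformed_similarity("12 Main St", "12 Main Street", RULES))
```

Note that the number of rewritten forms grows multiplicatively with the number of tokens that have applicable rules, so this brute-force enumeration only suits short records and small rule sets. That blow-up is one plausible reading of the computational challenges the abstract mentions.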