Multi-column substring matching for database schema translation

  • Authors:
  • Robert H. Warren; Frank Wm. Tompa

  • Affiliations:
  • David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada; David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada

  • Venue:
  • VLDB '06: Proceedings of the 32nd International Conference on Very Large Data Bases
  • Year:
  • 2006

Abstract

We describe a method for discovering complex schema translations involving substrings from multiple database columns. The method does not require a training set of instances linked across databases, and it is capable of dealing with both fixed- and variable-length field columns. We propose an iterative algorithm that deduces the correct sequence of concatenations of column substrings in order to translate from one database to another. We introduce the algorithm along with examples on common database data values and examine its performance on real-world and synthetic datasets.
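
To make the idea of matching column substrings without linked instances more concrete, the following is a minimal sketch of one plausible first step: scoring each source column by how often its values occur inside target values. It is not the authors' algorithm; the function name `column_hit_rates`, the sample rows, and the scoring rule are all assumptions chosen for illustration.

```python
def column_hit_rates(source_rows, target_values):
    """Score each source column against an unlinked target column.

    Hypothetical helper: for each source column, return the fraction of
    target values that contain at least one complete value from that
    column as a substring. No row-to-row linkage is assumed.
    """
    rates = {}
    for col in source_rows[0].keys():
        # Distinct non-empty values seen in this source column.
        values = {row[col] for row in source_rows if row[col]}
        hits = sum(1 for t in target_values if any(v in t for v in values))
        rates[col] = hits / len(target_values)
    return rates


# Illustrative data only: the target looks like first initial + last name.
source_rows = [
    {"first": "alice", "last": "smith", "city": "ottawa"},
    {"first": "bruno", "last": "keller", "city": "halifax"},
]
target_values = ["asmith", "bkeller"]

rates = column_hit_rates(source_rows, target_values)
print(rates)  # {'first': 0.0, 'last': 1.0, 'city': 0.0}
```

In this toy case the `last` column would be picked first. An iterative procedure in the spirit of the abstract could then examine the part of each target value left unexplained (here the leading character) and search for a partial substring of another column, such as the first letter of `first`. That later step is not shown here.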