Data mapper: an operator for expressing one-to-many data transformations

  • Authors:
  • Paulo Carreira; Helena Galhardas; João Pereira; Antónia Lopes

  • Affiliations:
  • Faculty of Sciences of the University of Lisbon, Lisboa, Portugal; INESC-ID and Instituto Superior Técnico, Porto Salvo, Portugal; INESC-ID and Instituto Superior Técnico, Porto Salvo, Portugal; Faculty of Sciences of the University of Lisbon, Lisboa, Portugal

  • Venue:
  • DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2005

Abstract

Transforming data is a fundamental operation in application scenarios involving data integration, legacy data migration, data cleaning, and extract-transform-load processes. Data transformations are often implemented as relational queries that aim at leveraging the optimization capabilities of most RDBMSs. However, relational query languages like SQL are not expressive enough to specify an important class of data transformations that produce several output tuples for a single input tuple. This class of data transformations is required for solving the data heterogeneities that occur when source data represents an aggregation of target data. In this paper, we propose and formally define the data mapper operator as an extension of the relational algebra to address one-to-many data transformations. We supply an algebraic rewriting technique that enables the optimization of data transformation expressions that combine filters expressed as standard relational operators with mappers. Furthermore, we identify the two main factors that influence the expected optimization gains.
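
As a rough illustration of the idea described in the abstract, the following Python sketch models a mapper as a function that emits several output rows per input row, and shows one plausible form of the rewriting: a filter on an attribute that the mapper copies unchanged can be applied before the mapper instead of after it. This is not the paper's formal definition. The names `mapper`, `select`, and `split_loan`, and the loan/installment schema, are hypothetical and chosen only for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


def mapper(rows: Iterable[Row], fn: Callable[[Row], List[Row]]) -> Iterator[Row]:
    """Apply fn to every input row; fn may return zero, one, or many rows."""
    for row in rows:
        yield from fn(row)


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Standard relational selection (filter)."""
    return (row for row in rows if pred(row))


def split_loan(row: Row) -> List[Row]:
    """Hypothetical one-to-many function: the source row aggregates a loan,
    the target has one row per installment. 'loan' is copied unchanged."""
    n = row["installments"]
    amount = row["total"] / n
    return [{"loan": row["loan"], "seq": i, "amount": amount} for i in range(1, n + 1)]


source = [
    {"loan": "A", "total": 300.0, "installments": 3},
    {"loan": "B", "total": 100.0, "installments": 2},
]


def is_loan_a(row: Row) -> bool:
    return row["loan"] == "A"


# Filter applied after the mapper: every source row is expanded first.
late = list(select(mapper(source, split_loan), is_loan_a))

# Filter pushed below the mapper: valid here only because the predicate
# reads 'loan', which split_loan copies from input to output unchanged.
early = list(mapper(select(source, is_loan_a), split_loan))

assert late == early
```

In this toy setting the pushed-down plan expands one source row instead of two. How much such a rewrite saves in practice would presumably depend on properties of the filter and of the mapper function, which the sketch does not attempt to model.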