Data cleaning and transformation using the AJAX framework

  • Authors:
  • Helena Galhardas

  • Affiliations:
  • INESC-ID and Instituto Superior Técnico, Porto Salvo, Portugal

  • Venue:
  • GTTSE'05 Proceedings of the 2005 international conference on Generative and Transformational Techniques in Software Engineering
  • Year:
  • 2005

Abstract

Data quality problems arise in different application contexts and require appropriate handling so that information becomes reliable. Examples of data anomalies are missing values, duplicates, misspellings, data inconsistencies, and wrong data formats. Current technologies handle data quality problems through: (i) software programs written in a programming language (e.g., C or Java) or an RDBMS programming language; (ii) the integrity constraint mechanisms offered by relational database management systems; or (iii) a commercial data quality tool. None of these approaches is appropriate for non-conventional data applications that deal with large amounts of information. In fact, existing technology cannot support the design of a data flow graph that effectively and efficiently produces clean data. AJAX is a data cleaning and transformation tool that overcomes these limitations. In this paper, we present an overview of the entire set of functionalities supported by the AJAX system. First, we explain the logical and physical levels of the AJAX framework, and the advantages they bring in terms of specification and optimization of data cleaning programs. Second, we describe the set of logical data cleaning and transformation operators and exemplify them using the proposed declarative language. Third, we illustrate the purpose of the debugging facility and how it is supported by the exception mechanism offered by the logical operators. Finally, we present the architecture of the AJAX system and briefly discuss the experimental validation of the prototype.