An efficient approach to identify n-wMVD for eliminating data redundancy

  • Authors:
  • Sangeeta Viswanadham;Vatsavayi Valli Kumari

  • Affiliations:
  • Andhra University, Visakhapatnam, India;Andhra University, Visakhapatnam, India

  • Venue:
  • Proceedings of the CUBE International Information Technology Conference
  • Year:
  • 2012

Abstract

Data cleaning is the process of determining whether two or more records that are defined differently in a database represent the same real-world object. It is a vital function in data warehouse preprocessing. Duplication and redundancy are encountered frequently when large amounts of data collected from different sources are put in the warehouse. Eliminating redundancy in the data warehouse helps avoid wrong decisions caused by conflicting data. Data cleaning also addresses the problem of wasted storage space. One way of eliminating redundancy is to retrieve similar records using tokens formed on prominent attributes. Another approach is to use Conditional Functional Dependencies (CFDs) to capture the consistency of data by combining semantically related data. Existing work on data cleaning does not deal with the case of multi-valued attributes. This paper deals with nesting-based weak multi-valued dependencies (n-wMVD), which can handle multi-valued attributes and redundancy removal. Our contributions are threefold: (i) an approach to convert the given database to wMVD; (ii) implementation of one-nesting on wMVD to produce n-wMVD; (iii) improvement of n-wMVD by implementation of two-nesting to eliminate redundancy. The applicability of our approach was tested; the results are encouraging and are presented in the paper.
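
The abstract does not define one-nesting or two-nesting in detail. As a rough illustration only, the sketch below assumes they correspond to applying the standard nest operator of nested relations once and then a second time: rows that agree on every attribute except one are merged, and the values of that attribute are collected into a set. The function name `nest`, the sample attributes (`name`, `phone`, `skill`), and the data are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def nest(rows, attr):
    """Merge rows that agree on every attribute except `attr`.

    The values of `attr` in each group are collected into a frozenset.
    This is the generic nest operator, used here as an assumed stand-in
    for the paper's nesting step.
    """
    groups = defaultdict(set)
    order = []  # remember first-seen order of groups
    for row in rows:
        # Attribute names are unique, so sorting never compares values.
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        if key not in groups:
            order.append(key)
        groups[key].add(row[attr])
    return [dict(key, **{attr: frozenset(groups[key])}) for key in order]


# Hypothetical flat relation with two independent multi-valued attributes.
rows = [
    {"name": "Ann", "phone": "111", "skill": "SQL"},
    {"name": "Ann", "phone": "111", "skill": "Java"},
    {"name": "Ann", "phone": "222", "skill": "SQL"},
    {"name": "Ann", "phone": "222", "skill": "Java"},
    {"name": "Bob", "phone": "333", "skill": "SQL"},
]

one_nested = nest(rows, "skill")        # first nesting: 5 rows become 3
two_nested = nest(one_nested, "phone")  # second nesting: 3 rows become 2

for row in two_nested:
    print(row)
```

In this toy relation the first nesting still repeats Ann's skill set once per phone number; nesting again on `phone` removes that remaining repetition, which is the kind of redundancy the second nesting step is described as targeting.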