A novel integrated classifier for handling data warehouse anomalies

  • Authors:
  • Peter Darcy;Bela Stantic;Abdul Sattar

  • Affiliations:
  • Institute for Integrated and Intelligent Information Systems, Griffith University;Institute for Integrated and Intelligent Information Systems, Griffith University;Institute for Integrated and Intelligent Information Systems, Griffith University

  • Venue:
  • ADBIS'11 Proceedings of the 15th international conference on Advances in databases and information systems
  • Year:
  • 2011

Abstract

Within databases employed in various commercial sectors, anomalies continue to persist and hinder the overall integrity of data. Typically, Duplicate, Wrong and Missed observations of spatial-temporal data prevent the user from accurately utilising recorded information. In the literature, different methods for cleaning data have been described, which fall into the category of either deterministic or probabilistic approaches. However, we believe that to ensure maximum integrity, a data cleaning methodology must have properties of both of these categories to effectively eliminate the anomalies. To realise this, we have proposed a method which relies on both deterministic and probabilistic classifiers, integrated using fusion techniques. We have empirically evaluated the proposed concept against state-of-the-art techniques and found that our approach improves the integrity of the resulting data set.
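The abstract does not give the classifiers or the fusion rule, so the following is only a minimal sketch of the general idea of fusing a deterministic and a probabilistic classifier, here applied to flagging Duplicate observations. Every name in it (`Observation`, `rule_score`, `probabilistic_score`, `fuse`, `is_duplicate`), the time window, the decay scale, the equal weights and the 0.5 threshold are assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Observation:
    """Hypothetical spatial-temporal reading: which tag, where, and when."""
    tag_id: str
    location: str
    timestamp: float  # seconds


def rule_score(obs: Observation, previous: Optional[Observation],
               window: float = 1.0) -> float:
    """Deterministic classifier (assumed rule): 1.0 if the same tag is seen
    at the same location again within `window` seconds, otherwise 0.0."""
    if previous is None:
        return 0.0
    same_reading = (obs.tag_id == previous.tag_id
                    and obs.location == previous.location)
    close_in_time = abs(obs.timestamp - previous.timestamp) <= window
    return 1.0 if same_reading and close_in_time else 0.0


def probabilistic_score(obs: Observation, previous: Optional[Observation],
                        scale: float = 2.0) -> float:
    """Probabilistic classifier (toy stand-in): a score in [0, 1] that decays
    as the time gap between readings of the same tag grows."""
    if previous is None or obs.tag_id != previous.tag_id:
        return 0.0
    gap = abs(obs.timestamp - previous.timestamp)
    return 1.0 / (1.0 + gap / scale)


def fuse(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Fusion step (assumed): weighted average of the classifier scores."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def is_duplicate(obs: Observation, previous: Optional[Observation],
                 threshold: float = 0.5) -> bool:
    """Flag `obs` as a Duplicate when the fused score reaches `threshold`."""
    scores = [rule_score(obs, previous), probabilistic_score(obs, previous)]
    return fuse(scores, weights=[0.5, 0.5]) >= threshold


if __name__ == "__main__":
    first = Observation("T1", "dock", 0.0)
    repeat = Observation("T1", "dock", 0.5)
    later = Observation("T1", "dock", 10.0)
    print(is_duplicate(repeat, first))
    print(is_duplicate(later, first))
```

In this sketch the deterministic rule gives a hard yes/no answer while the probabilistic score degrades gradually, so the fused value can still lean towards "duplicate" when a reading narrowly misses the rule's window. A weighted average is only one possible fusion technique; the paper may use a different one.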