Missing Values: Proposition of a Typology and Characterization with an Association Rule-Based Model

  • Authors:
  • Leila Ben Othman;François Rioult;Sadok Ben Yahia;Bruno Crémilleux

  • Affiliations:
  • Department of Computer Science, Faculty of Sciences of Tunis, Tunisia and GREYC - CNRS UMR 6072, University of Caen Basse-Normandie, France;GREYC - CNRS UMR 6072, University of Caen Basse-Normandie, France;Department of Computer Science, Faculty of Sciences of Tunis, Tunisia;GREYC - CNRS UMR 6072, University of Caen Basse-Normandie, France

  • Venue:
  • DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
  • Year:
  • 2009

Abstract

Handling missing values in real-world datasets is a major challenge that has attracted the interest of many scientific communities. Many works propose completion methods or implement new data mining techniques that tolerate missing values. These tasks turn out to be very hard. In this paper, we propose a new typology characterizing missing values according to relationships within the data. These relationships are automatically discovered by data mining techniques using generic bases of association rules. We define four types of missing values from these relationships. The characterization is made for each missing value individually. It thus differs from well-known statistical methods, which apply the same treatment to all missing values of the same attribute. We claim that such a local characterization enables perceptive techniques that deal with missing values according to their origins (e.g., an attribute that is meaningless w.r.t. other attributes, missing values that depend on other data, missing values that occur by accident). Experiments on a real-world medical dataset highlight the interest of such a characterization.