Mining incomplete survey data through classification

  • Authors:
  • Hai Wang; Shouhong Wang

  • Affiliations:
  • Saint Mary’s University, Sobey School of Business, B3H 2W3, Halifax, NS, Canada; University of Massachusetts Dartmouth, Charlton College of Business, 02747-2300, Dartmouth, MA, USA

  • Venue:
  • Knowledge and Information Systems
  • Year:
  • 2010

Abstract

Data mining with incomplete survey data is an immature subject area. When mining a database with incomplete data, the patterns of missing data and the potential implications of those missing data constitute valuable knowledge. This paper presents the conceptual foundations of data mining with incomplete data through classification that is relevant to a specific decision-making problem. The proposed technique generally supposes that incomplete data and complete data may come from different sub-populations. Its major objective is to detect interesting patterns of data-missing behavior that are relevant to a specific decision-making problem, rather than to estimate individual missing values. In this technique, a set of complete data is used to acquire a near-optimal classifier. This classifier provides the prediction reference information for analyzing the incomplete data, and the data-missing behavior concealed in the missing data is then revealed. Using a real-world survey data set, the paper demonstrates the usefulness of the technique.
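
The workflow described in the abstract (train a classifier on complete records, then use its predictions as a reference for studying incomplete records) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which classifier the authors use, so a simple nearest-centroid classifier is swapped in here, and the toy data, the labels "low" and "high", and the function names fit_centroids, predict, and missing_pattern_profile are all hypothetical.

```python
from collections import Counter, defaultdict

# Assumption: a nearest-centroid classifier stands in for the paper's
# "near-optimal classifier", which the abstract does not specify.

def fit_centroids(rows, labels):
    """Learn one mean vector per class from complete records only."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in grouped.items()
    }

def predict(centroids, row):
    """Pick the nearest centroid using only the observed features (None = missing)."""
    observed = [i for i, value in enumerate(row) if value is not None]

    def squared_distance(centroid):
        return sum((row[i] - centroid[i]) ** 2 for i in observed)

    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def missing_pattern_profile(centroids, incomplete_rows):
    """Count predicted classes for each pattern of missing feature indices."""
    profile = defaultdict(Counter)
    for row in incomplete_rows:
        pattern = tuple(i for i, value in enumerate(row) if value is None)
        profile[pattern][predict(centroids, row)] += 1
    return profile

# Hypothetical toy survey: three numeric answers per respondent.
complete_rows = [[1, 1, 1], [1, 2, 1], [5, 5, 5], [5, 4, 5]]
complete_labels = ["low", "low", "high", "high"]
incomplete_rows = [[5, None, 4], [4, None, 5], [1, 1, None]]

centroids = fit_centroids(complete_rows, complete_labels)
profile = missing_pattern_profile(centroids, incomplete_rows)

for pattern, counts in profile.items():
    print("missing features", pattern, "->", dict(counts))
```

The resulting profile relates each pattern of missing answers to the class the reference classifier assigns, which is one plausible way to look for data-missing behavior without imputing individual values.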