Analysis of breast feeding data using data mining methods

  • Authors:
  • Hongxing He;Huidong Jin;Jie Chen;Damien McAullay;Jiuyong Li;Tony Fallon

  • Affiliations:
  • CSIRO Mathematical and Information Sciences, Canberra ACT, Australia;CSIRO Mathematical and Information Sciences, Canberra ACT, Australia and National ICT Australia (NICTA), Canberra Lab, Canberra, Australia;CSIRO Mathematical and Information Sciences, Canberra ACT, Australia;CSIRO Mathematical and Information Sciences, Canberra ACT, Australia;University of Southern Queensland, Toowoomba QLD, Australia;University of Southern Queensland, Toowoomba QLD, Australia

  • Venue:
  • AusDM '06 Proceedings of the fifth Australasian conference on Data mining and analytics - Volume 61
  • Year:
  • 2006

Abstract

The purpose of this study is to demonstrate the benefit of using common data mining techniques on survey data where statistical analysis is routinely applied. Statistical surveys are commonly used to collect quantitative information about items in a population, and statistical analysis is usually carried out on the survey data to test hypotheses. We report in this paper an application of data mining methodologies to data from a breast feeding survey that was conducted and analysed by statisticians. The purpose of the research is to study the factors that influence the decision whether or not to breast feed a newborn baby. Various data mining methods are applied to the data. Feature (variable) selection is conducted to select the most discriminative and least redundant features using an information theory based method and a statistical approach. Decision tree and regression approaches are tested on classification tasks using the selected features. A risk pattern mining method is also applied to identify groups with a high risk of not breast feeding. The success of data mining in this study suggests that data mining approaches will be applicable to other similar survey data. The data mining methods, which enable a search for hypotheses, may be used as a survey data analysis tool complementary to traditional statistical analysis.
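
To make the idea of a "group with high risk of not breast feeding" concrete, the following is a minimal sketch that scores one candidate group by relative risk. It is an illustration under stated assumptions, not the authors' method: the abstract does not say which risk measure the paper uses, and the function name `relative_risk`, the attribute names (`smoker`, `first_child`, `breastfed`) and the records themselves are hypothetical toy values, not taken from the survey.

```python
def relative_risk(records, pattern, outcome_key="breastfed", risk_value=False):
    """Risk of `risk_value` among records matching `pattern`,
    divided by the same risk among the records that do not match."""

    def matches(rec):
        # A record matches when every attribute in the pattern has the given value.
        return all(rec.get(key) == value for key, value in pattern.items())

    inside = [r for r in records if matches(r)]
    outside = [r for r in records if not matches(r)]
    if not inside or not outside:
        raise ValueError("pattern must split the records into two non-empty groups")

    def risk(group):
        # Fraction of the group whose outcome equals the risk value.
        return sum(r[outcome_key] == risk_value for r in group) / len(group)

    outside_risk = risk(outside)
    if outside_risk == 0:
        return float("inf")
    return risk(inside) / outside_risk


# Toy records invented purely for illustration (hypothetical attributes).
records = [
    {"smoker": True, "first_child": True, "breastfed": False},
    {"smoker": True, "first_child": False, "breastfed": False},
    {"smoker": True, "first_child": True, "breastfed": True},
    {"smoker": False, "first_child": True, "breastfed": True},
    {"smoker": False, "first_child": False, "breastfed": True},
    {"smoker": False, "first_child": True, "breastfed": False},
    {"smoker": False, "first_child": False, "breastfed": True},
    {"smoker": False, "first_child": False, "breastfed": True},
]

rr = relative_risk(records, {"smoker": True})
print(f"relative risk of not breast feeding: {rr:.2f}")
```

A pattern mining method would search over many such attribute combinations and keep those whose risk is markedly higher than that of the rest of the population; this sketch only evaluates a single pattern supplied by hand.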