A study on the importance of and time spent on different modeling steps

  • Authors: M. Arthur Munson
  • Affiliations: Sandia National Laboratories, Livermore, CA
  • Venue: ACM SIGKDD Explorations Newsletter
  • Year: 2012

Abstract

Applying data mining and machine learning algorithms requires many steps to prepare data and to make use of modeling results. This study investigates two questions: (1) how time-consuming are the pre- and post-processing steps, and (2) how much research energy is spent on these steps? To answer these questions, I surveyed practitioners about their experiences in applying modeling techniques and categorized data mining and machine learning research papers from 2009 according to the modeling step(s) they addressed. Survey results show that model building consumes only 14% of the time spent on a typical project; the remaining 86% is spent on pre- and post-processing steps. Both the survey responses and the categorization of research papers show that data mining and machine learning researchers spend the majority of their energy on algorithms for constructing models and significantly less energy on other steps. Taken together, these findings suggest that there are research opportunities to simplify the steps that precede and follow model building.