Sequence Mining for Business Analytics: Building Project Taxonomies for Resource Demand Forecasting

  • Authors:
  • Ritendra Datta;Jianying Hu;Bonnie Ray

  • Affiliations:
  • Department of Computer Science and Engineering, The Pennsylvania State University, University Park, USA, e-mail: datta@cse.psu.edu;Mathematical Sciences Department, IBM T.J. Watson Research Center, Yorktown Heights, USA, e-mail: jyhu@us.ibm.com;Mathematical Sciences Department, IBM T.J. Watson Research Center, Yorktown Heights, USA, e-mail: jyhu@us.ibm.com and IBM China Research Lab, Beijing, China PRC, e-mail: bonnier@cn.ibm.com

  • Venue:
  • Proceedings of the 2008 conference on Applications of Data Mining in E-Business and Finance
  • Year:
  • 2008

Abstract

We develop techniques for mining labor records from a large number of historical IT consulting projects in order to discover clusters of projects exhibiting similar resource usage over the project life-cycle. The clustering results, together with domain expertise, are used to build a meaningful project taxonomy that can be linked to project resource requirements. Such a linkage is essential for project-based workforce demand forecasting, a key input for more advanced workforce management decision support. We formulate the problem as a sequence clustering problem in which each sequence represents a project and each observation in the sequence represents the weekly distribution of project labor hours across job role categories. To solve the problem, we use a model-based clustering algorithm based on explicit-state-duration left-right hidden semi-Markov models (HsMMs) capable of handling high-dimensional, sparse, and noisy Dirichlet-distributed observations and sequences of widely varying lengths. We then present an approach for using the underlying cluster models to estimate future staffing needs. The approach is applied to a set of 250 IT consulting projects, and the results are discussed.
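
The sequence formulation described in the abstract can be illustrated with a minimal sketch. It assumes labor records arrive as (project, week, job role, hours) rows. That record layout, the function name `build_sequences`, the role names, and the sample data are all hypothetical and are not taken from the paper. The sketch covers only the data preparation step (one sequence per project, one proportion vector per week), not the HsMM clustering itself.

```python
from collections import defaultdict


def build_sequences(records, roles):
    """Turn (project_id, week, role, hours) rows into per-project sequences.

    Each sequence is a list, ordered by week, of vectors giving the share of
    that week's labor hours recorded under each job role category.
    """
    role_index = {role: i for i, role in enumerate(roles)}

    # hours[project_id][week] -> list of total hours per role
    hours = defaultdict(lambda: defaultdict(lambda: [0.0] * len(roles)))
    for project_id, week, role, hrs in records:
        hours[project_id][week][role_index[role]] += hrs

    sequences = {}
    for project_id, weeks in hours.items():
        seq = []
        for week in sorted(weeks):
            total = sum(weeks[week])
            if total > 0:  # skip weeks with no recorded hours
                seq.append([h / total for h in weeks[week]])
        sequences[project_id] = seq
    return sequences


# Made-up sample data, for illustration only.
roles = ["architect", "developer", "tester"]
records = [
    ("P1", 1, "architect", 30.0),
    ("P1", 1, "developer", 10.0),
    ("P1", 2, "developer", 40.0),
    ("P1", 2, "tester", 40.0),
    ("P2", 1, "developer", 20.0),
]
sequences = build_sequences(records, roles)
```

Each weekly vector sums to one, which is what makes a Dirichlet observation model a natural fit. Projects of different durations yield sequences of different lengths, and roles with no hours in a given week produce the sparse entries the abstract mentions.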