A segmentation-based approach for temporal analysis of software version repositories

  • Authors:
  • Harvey Siy;Parvathi Chundi;Daniel J. Rosenkrantz;Mahadevan Subramaniam

  • Affiliations:
  • Computer Science Department, University of Nebraska at Omaha, Omaha, NE 68182, U.S.A.;Computer Science Department, University of Nebraska at Omaha, Omaha, NE 68182, U.S.A.;Computer Science Department, University at Albany, SUNY, Albany, NY 12222, U.S.A.;Computer Science Department, University of Nebraska at Omaha, Omaha, NE 68182, U.S.A.

  • Venue:
  • Journal of Software Maintenance and Evolution: Research and Practice
  • Year:
  • 2008

Abstract

Time series segmentation is a promising approach for discovering temporal patterns in time-stamped numeric data. This paper proposes a novel approach that applies time series segmentation to discern temporal information from software version repositories. Data from such repositories, both numeric and non-numeric, are represented as item-set time series. A dynamic programming algorithm for optimal segmentation is presented. The algorithm automatically produces a compacted item-set time series that can be analyzed to identify temporal patterns. The effectiveness of the approach is illustrated by analyzing version control repositories of several open-source projects to identify time-varying patterns of developer activity. The experimental results show that the segmentation algorithm produces segments that capture meaningful information, and that their information content is superior to that obtained by arbitrarily segmenting software history into regular time intervals. Copyright © 2008 John Wiley & Sons, Ltd.

A preliminary version [1] of this paper appears in the proceedings of the 2007 International Conference on Software Maintenance (ICSM '07), Paris, France.
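
To make the idea of optimal segmentation of an item-set time series concrete, the following is a minimal sketch and not the authors' algorithm. It assumes that each time point holds a set of items (here, hypothetical developer names active in a period), that a segment is summarized by the union of its item sets, and that the cost of a segment is the number of summary items missing from each time point. The abstract does not specify the cost measure or the interface, so the names `segment_cost` and `optimal_segmentation` are illustrative assumptions.

```python
def segment_cost(series, i, j):
    """Cost of summarizing series[i:j] by the union of its item sets.

    Assumed measure: for each time point, count the items of the
    summary that the point itself does not contain.
    """
    union = set().union(*series[i:j])
    return sum(len(union - items) for items in series[i:j])


def optimal_segmentation(series, k):
    """Split the series into k contiguous segments of minimal total cost.

    Returns (total_cost, segments), where segments is a list of
    half-open index ranges (start, end).
    """
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    inf = float("inf")
    # best[p][j]: minimal cost of covering series[:j] with p segments
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[p][j]: start index of the last segment in that optimum
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                if best[p - 1][i] == inf:
                    continue
                cost = best[p - 1][i] + segment_cost(series, i, j)
                if cost < best[p][j]:
                    best[p][j] = cost
                    back[p][j] = i
    # Walk the back-pointers from the end to recover the segments.
    segments = []
    j = n
    for p in range(k, 0, -1):
        i = back[p][j]
        segments.append((i, j))
        j = i
    segments.reverse()
    return best[k][n], segments


# Hypothetical activity log: developers who committed in each period.
activity = [{"alice", "bob"}, {"alice", "bob"}, {"carol"}, {"carol"}]
total_cost, segments = optimal_segmentation(activity, 2)
```

Under these assumptions the two-segment result separates the periods where alice and bob are active from those where carol is active, whereas cutting the same history at a fixed interval that straddles the change would yield a summary that misrepresents both groups.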