Discovering power laws in computer programs

  • Authors: Hongyu Zhang
  • Affiliations: School of Software, Tsinghua University, Beijing 100084, China
  • Venue: Information Processing and Management: an International Journal
  • Year: 2009

Quantified Score

Hi-index 0.01

Abstract

Power-law regularities have been discovered behind many complex natural and social phenomena. We discover that power-law regularities, especially Zipf's and Heaps' laws, also exist in large-scale software systems. We find that the distribution of lexical tokens in modern Java, C++ and C programs follows the Zipf-Mandelbrot law, and that the growth of program vocabulary follows Heaps' law. The results are obtained through empirical analysis of real-world software systems. We believe our discovery reveals the statistical regularities behind computer programming.
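
To make the two measurements named in the abstract concrete, here is a minimal Python sketch, not the author's method. It assumes the commonly cited forms of the laws: Zipf-Mandelbrot, f(r) = C / (r + b)^a for the frequency of the token at rank r, and Heaps, V(n) = K * n^beta for the vocabulary size after n tokens. The regex tokenizer and the names tokenize, rank_frequency, vocabulary_growth and fit_power_law are hypothetical illustrations; a real study would use a proper lexer for each language. The fit is an ordinary least-squares line in log-log space, a simple stand-in that may differ from the fitting procedure used in the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, simplified tokenizer (an assumption, not a real language lexer):
# identifiers/keywords, integer runs, or any single non-whitespace character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source):
    """Split source text into a flat list of lexical tokens."""
    return TOKEN_RE.findall(source)


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent token first (rank 1)."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def vocabulary_growth(tokens):
    """Return V(n): the number of distinct tokens among the first n tokens."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def fit_power_law(xs, ys):
    """Fit y = k * x**beta by least squares on (log x, log y); return (k, beta).

    Requires at least two distinct positive x values and positive y values.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    beta = sxy / sxx
    k = math.exp(mean_y - beta * mean_x)
    return k, beta


if __name__ == "__main__":
    # Tiny illustrative input; meaningful estimates need a large code base.
    sample = "int sum = 0; for (int i = 0; i < n; i++) { sum = sum + i; }"
    toks = tokenize(sample)
    growth = vocabulary_growth(toks)
    k, beta = fit_power_law(range(1, len(growth) + 1), growth)
    print("rank-frequency (top 5):", rank_frequency(toks)[:5])
    print("Heaps-style fit: K=%.3f beta=%.3f" % (k, beta))
```

A straight line in the log-log plot of vocabulary_growth would be consistent with Heaps' law. The rank-frequency pairs could likewise be fitted to the Zipf-Mandelbrot form, although the extra shift parameter b calls for a nonlinear fit that this sketch leaves out.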