Power-law regularities have been discovered behind many complex natural and social phenomena. We find that such regularities, in particular Zipf's law and Heaps' law, also exist in large-scale software systems. The distribution of lexical tokens in modern Java, C++ and C programs follows the Zipf-Mandelbrot law, and the growth of program vocabulary follows Heaps' law. These results are obtained through empirical analysis of real-world software systems. We believe this discovery reveals statistical regularities behind computer programming.
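To make the two laws concrete, the following minimal Python sketch shows how the quantities they describe could be measured on a piece of source code. It is an illustration under stated assumptions, not the procedure used in the study: the regex-based `tokenize` function is a hypothetical stand-in for a real language lexer, and the function and parameter names (`c`, `q`, `alpha`, `k`, `beta`) are chosen here for illustration. The formulas are the standard textbook forms, f(r) = C / (r + q)^alpha for Zipf-Mandelbrot and V(n) = K * n^beta for Heaps' law; the parameterization used in the paper is not given in this abstract.

```python
import re
from collections import Counter


def tokenize(source):
    """Hypothetical, simplified lexer (an assumption for illustration).

    Splits source text into identifiers/keywords, integer literals,
    and single punctuation characters. A real study would use the
    lexer of the language being analysed.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def rank_frequency(tokens):
    """Return (rank, token, frequency) triples, most frequent first."""
    counts = Counter(tokens)
    return [
        (rank, tok, freq)
        for rank, (tok, freq) in enumerate(counts.most_common(), start=1)
    ]


def vocabulary_growth(tokens):
    """Return the number of distinct tokens seen after each token."""
    seen = set()
    growth = []
    for tok in tokens:
        seen.add(tok)
        growth.append(len(seen))
    return growth


def zipf_mandelbrot(rank, c, q, alpha):
    """Standard Zipf-Mandelbrot form: f(r) = c / (r + q) ** alpha."""
    return c / (rank + q) ** alpha


def heaps(n, k, beta):
    """Standard Heaps' law form: V(n) = k * n ** beta."""
    return k * n ** beta


if __name__ == "__main__":
    tokens = tokenize("int a = a + 1;")
    print(rank_frequency(tokens)[:3])
    print(vocabulary_growth(tokens))
```

In practice the measured rank-frequency pairs and vocabulary-growth curve would be compared against `zipf_mandelbrot` and `heaps` with fitted parameters, typically on a log-log scale, over a corpus far larger than this one-line example.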