Open-Source Change Logs

Authors:
Kai Chen; Stephen R. Schach; Liguo Yu; Jeff Offutt; Gillian Z. Heller
Affiliations:
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 (kai.chen@vanderbilt.edu)
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 (srs@vuse.vanderbilt.edu)
Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN 37235 (liguo.yu@vanderbilt.edu)
Department of Information and Software Engineering, George Mason University, Fairfax, VA 22030 (ofut@ise.gmu.edu)
Department of Statistics, Macquarie University, Sydney, NSW 2109, Australia (gheller@efs.mq.edu.au)
Venue:
Empirical Software Engineering
Year:
2004

Abstract

A recent editorial in Empirical Software Engineering suggested that open-source software projects offer a great deal of data that can be used for experimentation. These data include not only source code but also artifacts such as defect reports and update logs. A common type of update log that experimenters may wish to investigate is the ChangeLog, which lists changes and the reasons for which they were made. Because ChangeLog files are created to support software development rather than to meet the needs of researchers, questions need to be asked about the limitations of using them to support research. This paper presents evidence that the ChangeLog files provided at three open-source web sites were incomplete. We examined at least three ChangeLog files for each of three open-source software products: GNUJSP, GCC-g++, and Jikes. We developed a method for counting changes that ensures that, as far as possible, each individual ChangeLog entry is treated as a single change. For each ChangeLog file, we compared the actual changes in the source code with the entries in the ChangeLog file and discovered significant omissions. For example, using our change-counting method, only 35 of the 93 changes in version 1.11 of Jikes appear in the ChangeLog file; that is, over 62% of the changes are not recorded there. The percentage of omissions we found ranged from 3.7% to 78.6%. Omissions of this size should be taken into account when using ChangeLog files for research: before using ChangeLog files as a basis for research into the development and maintenance of open-source software, experimenters should carefully check them for omissions and inaccuracies.
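Once two counts are available (the changes found by comparing successive source versions, and how many of those changes the ChangeLog records), the omission percentage is simple arithmetic. The Python sketch below shows that calculation together with a naive entry counter. It rests on stated assumptions: the entry-counting rule (each tab-indented line beginning with `* ` in a GNU-style ChangeLog counts as one change) and the function names `count_changelog_entries` and `omission_rate` are hypothetical illustrations. They are not the authors' change-counting method, which the abstract does not describe in detail.

```python
import re

# ASSUMPTION (hypothetical rule, not the paper's method): in a GNU-style
# ChangeLog, every line that starts with a tab followed by "* " begins a
# new entry, and each such entry is counted as one change.
ENTRY_START = re.compile(r"^\t\* ")


def count_changelog_entries(text: str) -> int:
    """Count entries in GNU-style ChangeLog text under the rule above."""
    return sum(1 for line in text.splitlines() if ENTRY_START.match(line))


def omission_rate(actual_changes: int, recorded_changes: int) -> float:
    """Return the percentage of actual source-code changes missing from the ChangeLog."""
    if actual_changes <= 0:
        raise ValueError("actual_changes must be positive")
    if not 0 <= recorded_changes <= actual_changes:
        raise ValueError("recorded_changes must lie in [0, actual_changes]")
    omitted = actual_changes - recorded_changes
    return 100.0 * omitted / actual_changes


if __name__ == "__main__":
    # Hypothetical ChangeLog fragment: one dated header, two entries.
    sample = (
        "2001-03-15  Jane Doe  <jane@example.org>\n"
        "\n"
        "\t* parser.cpp (parse_expr): Fix off-by-one error.\n"
        "\t* lexer.cpp: Handle Unicode escapes.\n"
    )
    print(count_changelog_entries(sample))  # 2
    # Jikes 1.11 figures quoted in the abstract: 35 of 93 changes recorded.
    print(f"{omission_rate(93, 35):.1f}%")  # 62.4%
```

The Jikes figures reproduce the abstract's "over 62%" (58 omitted changes out of 93). A different counting rule, for example treating each dated header block as a single change, would give different entry counts. This is why the way changes are counted matters when omission rates are compared across products.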