Using textual redundancy to understand change

  • Author: J. Howard Johnson

  • Affiliation: Institute for Information Technology, National Research Council Canada, Montreal Road, Building M-50, Ottawa, Ontario K1A 0R6

  • Venue: CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
  • Year: 1995

Abstract

As software systems evolve, their source, data, and documentation files change. Understanding the location and magnitude of this change can reveal information about the evolution process and the system itself.

For processes that affect only small amounts of text, change can be identified by removing large blocks of identical text in common among snapshots of the system taken at different times. The results can be summarized to show where change has happened.

A technique referred to as components of redundancy is introduced that allocates the amount of matching with nodes in the directory tree in a way that provides useful insight.

Two case studies are presented that show different applications of this kind of change analysis: the evolution of source as a result of development and maintenance activities and the change caused by the installation of software on the system folder of a personal computer.

These two examples show that this is a general purpose technology that addresses a set of problems in a number of unrelated domains. Other such applications involve the study of a complex build process, change in databases, or any malicious or unintentional modification to computer systems.
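
The general idea of discarding text shared between snapshots and summarizing what remains per directory can be illustrated with a minimal sketch. This is not the paper's components of redundancy technique, whose details the abstract does not give. It assumes that each snapshot is a mapping from file path to text, that a "large block" means at least `k` consecutive identical lines, and that a plain sum over ancestor directories is an acceptable summary. The names `windows`, `unmatched_lines`, and `roll_up` are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def windows(lines, k):
    """Return every run of k consecutive lines as a tuple."""
    return [tuple(lines[i:i + k]) for i in range(len(lines) - k + 1)]


def unmatched_lines(old, new, k=3):
    """Count, per file in `new`, the lines not covered by any k-line
    block that also occurs somewhere in `old` (assumed definition)."""
    seen = set()
    for text in old.values():
        seen.update(windows(text.splitlines(), k))

    result = {}
    for path, text in new.items():
        lines = text.splitlines()
        covered = [False] * len(lines)
        for i, block in enumerate(windows(lines, k)):
            if block in seen:
                # Mark every line of the matching block as redundant.
                for j in range(i, i + k):
                    covered[j] = True
        result[path] = covered.count(False)
    return result


def roll_up(per_file):
    """Sum per-file counts into every ancestor directory.
    A simple aggregation, not the paper's allocation scheme."""
    totals = defaultdict(int)
    for path, count in per_file.items():
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += count
    return dict(totals)


old_snapshot = {"src/a.c": "a\nb\nc\nd\n", "doc/r.txt": "x\ny\nz\n"}
new_snapshot = {"src/a.c": "a\nb\nc\nd\nNEW\n", "doc/r.txt": "x\ny\nz\n"}

per_file = unmatched_lines(old_snapshot, new_snapshot, k=3)
per_dir = roll_up(per_file)
```

In this toy input only the appended line in `src/a.c` survives the removal of shared blocks, so the change is attributed to `src` and to the root directory (`.`), while `doc` shows none. A realistic analysis would need a larger `k` and a more careful matching strategy than exact line windows.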