Reverse Engineering and Design Recovery: A Taxonomy
IEEE Software
Substring Matching for Clone Detection and Change Tracking
ICSM '94 Proceedings of the International Conference on Software Maintenance
Experiences in program understanding
CASCON '92 Proceedings of the 1992 conference of the Centre for Advanced Studies on Collaborative research - Volume 1
Identifying redundancy in source code using fingerprints
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: software engineering - Volume 1
Using textual redundancy to study the maintainability of source
Advances in software engineering
Using textual redundancy to understand change
CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
Using an integrated toolset for program understanding
CASCON '95 Proceedings of the 1995 conference of the Centre for Advanced Studies on Collaborative research
Navigating the textual redundancy web in legacy source
CASCON '96 Proceedings of the 1996 conference of the Centre for Advanced Studies on Collaborative research
Supporting the analysis of clones in software systems
Journal of Software Maintenance and Evolution: Research and Practice - IEEE International Conference on Software Maintenance (ICSM 2005)
SoftGUESS: Visualization and Exploration of Code Clones in Context
ICSE '07 Proceedings of the 29th international conference on Software Engineering
Comparison and Evaluation of Clone Detection Tools
IEEE Transactions on Software Engineering
Towards a mutation-based automatic framework for evaluating code clone detection tools
Proceedings of the 2008 C3S2E conference
Comparison and evaluation of code clone detection techniques and tools: A qualitative approach
Science of Computer Programming
Near-miss function clones in open source software: an empirical study
Journal of Software Maintenance and Evolution: Research and Practice - Working Conference on Reverse Engineering (WCRE 2008)
As a result of maintenance activity, legacy systems contain repeated text in the form of large and small blocks that appear in more or less the same form in several places. These repetitions define a structure that can contribute information about the development history of the source, information that differs from both the documented version and the current directory structure.

A strategy based on fingerprinting is used to obtain raw matches indicating where repetitions occur. The information inherent in these matches is then reorganized for easier processing, which leads to a natural clustering of substrings. Suppression of detail is usually necessary to make further progress, and it can be done in several ways.

For example, matching blocks of text identify associations within groups of files. Where complex clusters of files involve multiple overlapping subsets of files, Hasse diagrams can support visualization. Techniques for understanding such graphs can then provide significant insights into the structure of the redundancy, and hence into the structure of the source.

The paper discusses this approach and shows results obtained from an example of reasonable size: 40 Mbytes of source drawn from two releases of the GNU gcc compiler.
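The pipeline outlined above (fingerprint blocks of text, collect raw matches, suppress detail by grouping matches into sets of files, then order those file sets for a Hasse diagram) can be sketched roughly as follows. This is not the paper's implementation. The window size `WINDOW`, the whitespace normalization, the helper names, and the use of SHA-1 over each window are all assumptions made for illustration. An incremental rolling hash (for example Karp-Rabin) would avoid rehashing every window from scratch.

```python
import hashlib
from collections import defaultdict

WINDOW = 3  # hypothetical: number of consecutive lines per fingerprinted block


def normalize(text):
    """Strip surrounding whitespace and drop blank lines (an assumed normalization)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def fingerprint(lines):
    """Fingerprint one block of lines; SHA-1 stands in for a rolling hash here."""
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest()


def raw_matches(files, window=WINDOW):
    """Map each fingerprint that occurs more than once to its (file, line index) sites."""
    sites = defaultdict(list)
    for name, text in files.items():
        lines = normalize(text)
        for i in range(len(lines) - window + 1):
            sites[fingerprint(lines[i:i + window])].append((name, i))
    return {fp: locs for fp, locs in sites.items() if len(locs) > 1}


def file_groups(matches):
    """Suppress detail: count shared blocks per set of files that contain them."""
    groups = defaultdict(int)
    for locs in matches.values():
        fileset = frozenset(name for name, _ in locs)
        if len(fileset) > 1:  # ignore repetition confined to a single file
            groups[fileset] += 1
    return dict(groups)


def hasse_edges(sets):
    """Covering pairs (a, b) of the subset order: a < b with no c strictly between."""
    sets = list(sets)
    edges = []
    for a in sets:
        for b in sets:
            if a < b and not any(a < c < b for c in sets):
                edges.append((a, b))
    return edges


if __name__ == "__main__":
    files = {
        "a.c": "x=1;\ny=2;\nz=3;\nw=4;\n",
        "b.c": "x=1;\ny=2;\nz=3;\n",
        "c.c": "x=1;\ny=2;\nz=3;\nw=4;\n",
    }
    groups = file_groups(raw_matches(files))
    for fileset, count in groups.items():
        print(sorted(fileset), count)
    for low, high in hasse_edges(groups):
        print(sorted(low), "->", sorted(high))
```

In this toy input, the block shared by all three files and the longer block shared only by `a.c` and `c.c` yield two file groups. The smaller group lies directly below the larger one in the subset order, which gives a single Hasse edge. Keeping only covering pairs, rather than every subset relation, is what keeps such diagrams readable when many overlapping file groups exist.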