Data clone detection and visualization in spreadsheets

Authors: Felienne Hermans; Ben Sedee; Martin Pinzger; Arie van Deursen
Affiliations: TU Delft, Netherlands (all authors)
Venue: Proceedings of the 2013 International Conference on Software Engineering (ICSE 2013)
Year: 2013

Abstract

Spreadsheets are widely used in industry: it is estimated that end-user programmers outnumber professional programmers by a factor of 5. However, spreadsheets are error-prone, and numerous companies have lost money because of spreadsheet errors. One cause of spreadsheet problems is the prevalence of copy-pasting. In this paper, we study this cloning in spreadsheets. Based on existing text-based clone detection algorithms, we have developed an algorithm to detect data clones in spreadsheets: formulas whose values are copied as plain text to a different location. To evaluate the usefulness of the proposed approach, we conducted two evaluations: a quantitative evaluation, in which we analyzed the EUSES corpus, and a qualitative evaluation consisting of two case studies. The results of the evaluation clearly indicate that 1) data clones are common, 2) data clones pose threats to spreadsheet quality, and 3) our approach supports users in finding and resolving data clones.
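The abstract describes data clone detection only at a high level: values computed by formulas that reappear elsewhere as plain constants. The Python sketch below illustrates that idea under stated assumptions and is not the authors' algorithm. It assumes the workbook has already been parsed into two hypothetical dictionaries, one mapping formula cells to their computed values and one mapping constant cells to their values. The function name `find_data_clones`, the `min_size` threshold, and the rounding of floats to six decimals are all illustrative choices. Like a text-based clone detector, it hashes values for fast lookup; it then groups matching constant cells that touch each other and discards isolated matches.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple, Union

Cell = Tuple[int, int]          # (row, column), zero-based
Value = Union[int, float, str]


def _key(value: Value) -> Value:
    """Normalise a value for lookup. Floats are rounded (an assumed tolerance)
    so that a pasted 0.3 still matches a computed 0.1 + 0.2."""
    return round(value, 6) if isinstance(value, float) else value


def find_data_clones(
    formula_values: Dict[Cell, Value],
    constant_values: Dict[Cell, Value],
    min_size: int = 3,          # hypothetical threshold, not from the paper
) -> List[Tuple[List[Cell], List[Cell]]]:
    """Return (clone_cells, origin_cells) pairs: clusters of adjacent constant
    cells whose values also occur as results of formula cells."""
    # Lookup: index formula cells by their (normalised) computed value.
    origins_by_value: Dict[Value, List[Cell]] = defaultdict(list)
    for cell, value in formula_values.items():
        origins_by_value[_key(value)].append(cell)

    # Candidates: constant cells whose value some formula also produces.
    candidates: Set[Cell] = {
        cell for cell, value in constant_values.items()
        if _key(value) in origins_by_value
    }

    # Clustering: flood-fill candidates that touch horizontally or vertically.
    seen: Set[Cell] = set()
    results: List[Tuple[List[Cell], List[Cell]]] = []
    for start in sorted(candidates):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            row, col = stack.pop()
            group.append((row, col))
            for nb in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
                if nb in candidates and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        # Filtering: a lone matching constant (e.g. a single 0) is likely coincidence.
        if len(group) >= min_size:
            origins = sorted({o for c in group
                              for o in origins_by_value[_key(constant_values[c])]})
            results.append((sorted(group), origins))
    return results


# Usage: column A holds computed totals; column D holds the same numbers pasted as values.
formulas = {(0, 0): 10, (1, 0): 20, (2, 0): 30}
constants = {(0, 3): 10, (1, 3): 20, (2, 3): 30, (5, 5): 20}
print(find_data_clones(formulas, constants))
# [([(0, 3), (1, 3), (2, 3)], [(0, 0), (1, 0), (2, 0)])]
```

The stray constant at (5, 5) also equals a formula result, but it is dropped because it forms a cluster of one. Grouping adjacent cells and requiring a minimum cluster size is one simple way to suppress such coincidental matches; a fuller detector would presumably also check that a clone cluster lines up with a corresponding cluster of formula cells.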