Detecting plagiarism in student Pascal programs
The Computer Journal
Eraser: a dynamic data race detector for multithreaded programs
ACM Transactions on Computer Systems (TOCS)
Using meta-level compilation to check FLASH protocol code
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Bugs as deviant behavior: a general approach to inferring errors in systems code
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
An empirical study of operating systems errors
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
A system and language for building system-specific, static analyses
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Efficient and precise datarace detection for multithreaded object-oriented programs
PLDI '02 Proceedings of the ACM SIGPLAN 2002 Conference on Programming language design and implementation
Tracking down software bugs using automatic anomaly detection
Proceedings of the 24th International Conference on Software Engineering
SPADE: An Efficient Algorithm for Mining Frequent Sequences
Machine Learning
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
IEEE Transactions on Software Engineering
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Substring Matching for Clone Detection and Change Tracking
ICSM '94 Proceedings of the International Conference on Software Maintenance
Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics
ICSM '96 Proceedings of the 1996 International Conference on Software Maintenance
Automatic verification of the SCI cache coherence protocol
CHARME '95 Proceedings of the IFIP WG 10.5 Advanced Research Working Conference on Correct Hardware Design and Verification Methods
Proceedings of the First JSSST International Symposium on Object Technologies for Advanced Software
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
A tool that detects plagiarism in Pascal programs
SIGCSE '81 Proceedings of the twelfth SIGCSE technical symposium on Computer science education
On finding duplication and near-duplication in large software systems
WCRE '95 Proceedings of the Second Working Conference on Reverse Engineering
Identifying Similar Code with Program Dependence Graphs
WCRE '01 Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE'01)
Clone Detection Using Abstract Syntax Trees
ICSM '98 Proceedings of the International Conference on Software Maintenance
A Language Independent Approach for Detecting Duplicated Code
ICSM '99 Proceedings of the IEEE International Conference on Software Maintenance
Winnowing: local algorithms for document fingerprinting
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
RacerX: effective, static detection of race conditions and deadlocks
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
CloseGraph: mining closed frequent graph patterns
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Identifying redundancy in source code using fingerprints
CASCON '93 Proceedings of the 1993 conference of the Centre for Advanced Studies on Collaborative research: software engineering - Volume 1
CMC: a pragmatic approach to model checking real code
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
C-Miner: Mining Block Correlations in Storage Systems
FAST '04 Proceedings of the 3rd USENIX Conference on File and Storage Technologies
AccMon: Automatically Detecting Memory-Related Bugs via Program Counter-Based Invariants
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
An empirical study of code clone genealogies
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
Using a clone genealogy extractor for understanding and supporting evolution of code clones
MSR '05 Proceedings of the 2005 international workshop on Mining software repositories
Advanced non-distributed operating systems course
ACM SIGCSE Bulletin
CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code
IEEE Transactions on Software Engineering
HeapMD: identifying heap-based bugs using anomaly detection
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Have things changed now?: an empirical study of bug characteristics in modern open source software
Proceedings of the 1st workshop on Architectural and system support for improving software dependability
SmPL: A Domain-Specific Language for Specifying Collateral Evolutions in Linux Device Drivers
Electronic Notes in Theoretical Computer Science (ENTCS)
DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
ICSE '07 Proceedings of the 29th international conference on Software Engineering
SoftGUESS: Visualization and Exploration of Code Clones in Context
ICSE '07 Proceedings of the 29th international conference on Software Engineering
Static specification inference using predicate mining
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
I/O system performance debugging using model-driven anomaly characterization
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Human-aware computer system design
HOTOS'05 Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10
Context-based detection of clone-related bugs
Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles
Comparison and Evaluation of Clone Detection Tools
IEEE Transactions on Software Engineering
Proceedings of the 2007 OOPSLA workshop on eclipse technology eXchange
Learning from mistakes: a comprehensive study on real world concurrency bug characteristics
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Documenting and automating collateral evolutions in linux device drivers
Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Frequent pattern mining for kernel trace data
Proceedings of the 2008 ACM symposium on Applied computing
Scalable detection of semantic clones
Proceedings of the 30th international conference on Software engineering
Towards easing the diagnosis of bugs in OS code
Proceedings of the 4th workshop on Programming languages and operating systems
Empirical evaluation of clone detection using syntax suffix trees
Empirical Software Engineering
Software Fault Localization Using N-gram Analysis
WASA '08 Proceedings of the Third International Conference on Wireless Algorithms, Systems, and Applications
Understanding customer problem troubleshooting from storage system logs
FAST '09 Proccedings of the 7th conference on File and storage technologies
Configuration-space performance anomaly depiction
LADIS '08 Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware
Listening to programmers Taxonomies and characteristics of comments in operating system code
ICSE '09 Proceedings of the 31st International Conference on Software Engineering
COMPASS: A Community-driven Parallelization Advisor for Sequential Software
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Automatic mining of functionally equivalent code fragments via random testing
Proceedings of the eighteenth international symposium on Software testing and analysis
Detecting code clones in binary executables
Proceedings of the eighteenth international symposium on Software testing and analysis
Exploring the design space of proactive tool support for copy-and-paste programming
CASCON '09 Proceedings of the 2009 Conference of the Center for Advanced Studies on Collaborative Research
Towards understanding bugs in open source router software
ACM SIGCOMM Computer Communication Review
Decaf: moving device drivers to a modern language
USENIX'09 Proceedings of the 2009 conference on USENIX Annual technical conference
Scalable and systematic detection of buggy inconsistencies in source code
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Making the common case the only case with anticipatory memory allocation
FAST'11 Proceedings of the 9th USENIX conference on File and stroage technologies
An empirical study of long-lived code clones
FASE'11/ETAPS'11 Proceedings of the 14th international conference on Fundamental approaches to software engineering: part of the joint European conferences on theory and practice of software
Archiving the web using page changes patterns: a case study
Proceedings of the 11th annual international ACM/IEEE joint conference on Digital libraries
Making the common case the only case with anticipatory memory allocation
ACM Transactions on Storage (TOS)
Understanding modern device drivers
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Automated detection of refactorings in evolving components
ECOOP'06 Proceedings of the 20th European conference on Object-Oriented Programming
Harmfulness of code duplication: a structured review of the evidence
EASE'09 Proceedings of the 13th international conference on Evaluation and Assessment in Software Engineering
Understanding and detecting real-world performance bugs
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Empirical Software Engineering
Reducing test effort: A systematic mapping study on existing approaches
Information and Software Technology
Characterizing logging practices in open-source software
Proceedings of the 34th International Conference on Software Engineering
Active refinement of clone anomaly reports
Proceedings of the 34th International Conference on Software Engineering
Cloning in DSLs: experiments with OCL
SLE'11 Proceedings of the 4th international conference on Software Language Engineering
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
XIAO: tuning code clones at hands of engineers in practice
Proceedings of the 28th Annual Computer Security Applications Conference
LASE: locating and applying systematic edits by learning from examples
Proceedings of the 2013 International Conference on Software Engineering
Rendezvous: a search engine for binary code
Proceedings of the 10th Working Conference on Mining Software Repositories
Understanding the genetic makeup of Linux device drivers
Proceedings of the Seventh Workshop on Programming Languages and Operating Systems
A Study of Linux File System Evolution
ACM Transactions on Storage (TOS)
Fast mining Top-Rank-k frequent patterns by using Node-lists
Expert Systems with Applications: An International Journal
Pattern mining of cloned codes in software systems
Information Sciences: an International Journal
A study of Linux file system evolution
FAST'13 Proceedings of the 11th USENIX conference on File and Storage Technologies
Comparison and evaluation of source code mining tools and techniques: A qualitative approach
Intelligent Data Analysis
Hi-index | 0.00 |
Copy-pasted code is very common in large software because programmers prefer reusing code via copy-paste in order to reduce programming effort. Recent studies show that copy-paste is prone to introducing bugs and a significant portion of operating system bugs concentrate in copy-pasted code. Unfortunately, it is challenging to efficiently identify copy-pasted code in large software. Existing copy-paste detection tools are either not scalable to large software, or cannot handle small modifications in copy-pasted code. Furthermore, few tools are available to detect copy-paste related bugs. In this paper we propose a tool, CP-Miner, that uses data mining techniques to efficiently identify copy-pasted code in large software including operating systems, and detects copy-paste related bugs. Specifically, it takes less than 20 minutes for CP-Miner to identify 190,000 copy-pasted segments in Linux and 150,000 in FreeBSD. Moreover, CP-Miner has detected 28 copy-paste related bugs in the latest version of Linux and 23 in FreeBSD. In addition, we analyze some interesting characteristics of copy-paste in Linux and FreeBSD, including the distribution of copy-pasted code across different length, granularity, modules, degrees of modification, and various software versions.