Identifying syntactic differences between two programs
Software—Practice & Experience
Manufacturing cheap, resilient, and stealthy opaque constructs
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
IEEE Transactions on Software Engineering
On finding duplication and near-duplication in large software systems
WCRE '95 Proceedings of the Second Working Conference on Reverse Engineering
Identifying Similar Code with Program Dependence Graphs
WCRE '01 Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE'01)
Clone Detection Using Abstract Syntax Trees
ICSM '98 Proceedings of the International Conference on Software Maintenance
Winnowing: local algorithms for document fingerprinting
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
A security architecture for survivability mechanisms
A security architecture for survivability mechanisms
Sandmark--A Tool for Software Protection Research
IEEE Security and Privacy
Static analysis of students' Java programs
ACE '04 Proceedings of the Sixth Australasian Conference on Computing Education - Volume 30
DMS®: Program Transformations for Practical Scalable Software Evolution
Proceedings of the 26th International Conference on Software Engineering
K-gram based software birthmarks
Proceedings of the 2005 ACM symposium on Applied computing
LOCO: an interactive code (De)obfuscation tool
Proceedings of the 2006 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
GPLAG: detection of software plagiarism by program dependence graph analysis
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
ICSE '07 Proceedings of the 29th international conference on Software Engineering
A source code linearization technique for detecting plagiarized programs
Proceedings of the 12th annual SIGCSE conference on Innovation and technology in computer science education
Context-based detection of clone-related bugs
Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering
ATC'07 2007 USENIX Annual Technical Conference on Proceedings of the USENIX Annual Technical Conference
Scalable detection of semantic clones
Proceedings of the 30th international conference on Software engineering
On the Limits of Information Flow Techniques for Malware Analysis and Containment
DIMVA '08 Proceedings of the 5th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
Detecting code clones in binary executables
Proceedings of the eighteenth international symposium on Software testing and analysis
Behavior based software theft detection
Proceedings of the 16th ACM conference on Computer and communications security
A program plagiarism evaluation system
ICCSA'05 Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part IV
A first step towards algorithm plagiarism detection
Proceedings of the 2012 International Symposium on Software Testing and Analysis
Detecting encryption functions via process emulation and IL-based program analysis
ICICS'12 Proceedings of the 14th international conference on Information and Communications Security
Hi-index | 0.00 |
Identifying similar or identical code fragments becomes much more challenging in code theft cases where plagiarizers can use various automated code transformation techniques to hide stolen code from being detected. Previous works in this field are largely limited in that (1) most of them cannot handle advanced obfuscation techniques; (2) the methods based on source code analysis are less practical since the source code of suspicious programs is typically not available until strong evidences are collected; and (3) those depending on the features of specific operating systems or programming languages have limited applicability. Based on an observation that some critical runtime values are hard to be replaced or eliminated by semantics-preserving transformation techniques, we introduce a novel approach to dynamic characterization of executable programs. Leveraging such invariant values, our technique is resilient to various control and data obfuscation techniques. We show how the values can be extracted and refined to expose the critical values and how we can apply this runtime property to help solve problems in software plagiarism detection. We have implemented a prototype with a dynamic taint analyzer atop a generic processor emulator. Our experimental results show that the value-based method successfully discriminates 34 plagiarisms obfuscated by SandMark, plagiarisms heavily obfuscated by KlassMaster, programs obfuscated by Thicket, and executables obfuscated by Loco/Diablo.