A first step towards algorithm plagiarism detection

Authors:
Fangfang Zhang;Yoon-Chan Jhi;Dinghao Wu;Peng Liu;Sencun Zhu
Affiliations:
Pennsylvania State University, USA;Samsung, South Korea;Pennsylvania State University, USA;Pennsylvania State University, USA;Pennsylvania State University, USA
Venue:
Proceedings of the 2012 International Symposium on Software Testing and Analysis
Year:
2012

Abstract

We address the problem of algorithm plagiarism, which occurs when a plagiarist, in violation of intellectual property rights, steals others' algorithms and covertly implements them. Whereas software plagiarism has been studied extensively, algorithm plagiarism has received limited attention. We propose two dynamic value-based approaches to algorithm plagiarism detection, N-version and annotation. Both are motivated by the observation that some critical runtime values are irreplaceable and cannot be eliminated in any implementation of the same algorithm. The N-version approach extracts such values by filtering out non-core values. The annotation approach uses auxiliary information to flag the important variables that contain core values. To address the potential value-reordering attack, we also propose a similarity metric based on the value dependence graph, in addition to one based on the longest common subsequence. We have implemented a prototype and evaluated the proposed schemes on various algorithms. The results show that our approaches are practical, effective, and resilient to many automatic obfuscation techniques.
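
The abstract does not spell out how core values are filtered or how similarity is scored, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each run of an implementation has already been reduced to a list of integer runtime values (how such traces are collected is out of scope here), approximates N-version filtering by folding a pairwise longest common subsequence (LCS) over the traces, and scores a suspect as the fraction of core values that its trace preserves in order. The function names `lcs`, `core_values`, and `similarity`, and all trace values, are hypothetical.

```python
from typing import List, Sequence


def lcs(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Return one longest common subsequence of a and b (classic DP)."""
    m, n = len(a), len(b)
    # dp[i][j] = length of the LCS of a[i:] and b[j:]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forward to recover one optimal subsequence.
    out: List[int] = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def core_values(traces: Sequence[Sequence[int]]) -> List[int]:
    """Assumed N-version filter: keep values common, in order, to all traces.

    Folding pairwise LCS is a heuristic; it is not guaranteed to find the
    longest subsequence shared by all N traces.
    """
    core = list(traces[0])
    for trace in traces[1:]:
        core = lcs(core, trace)
    return core


def similarity(core: Sequence[int], suspect: Sequence[int]) -> float:
    """Fraction of core values that appear, in order, in the suspect trace."""
    if not core:
        return 0.0
    return len(lcs(core, suspect)) / len(core)


# Toy traces from three hypothetical independent implementations.
versions = [
    [48, 18, 12, 7, 6, 0],
    [48, 99, 18, 12, 6, 0],
    [48, 18, 5, 12, 6, 0, 1],
]
core = core_values(versions)
suspect_score = similarity(core, [3, 48, 18, 12, 40, 6, 0])
unrelated_score = similarity(core, [1, 2, 3, 6])
```

Because an LCS score depends on the order of values, a plagiarist who reorders independent computations could lower it. That is the attack the value dependence graph metric in the abstract is meant to address, and this sketch does not cover it.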