GPLAG: detection of software plagiarism by program dependence graph analysis

Authors:
Chao Liu;Chen Chen;Jiawei Han;Philip S. Yu
Affiliations:
University of Illinois-UC, Urbana, IL;University of Illinois-UC, Urbana, IL;University of Illinois-UC, Urbana, IL;IBM T. J. Watson Research Center, Hawthorne, NY
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

The flourishing of open source projects has also made software plagiarism more convenient. A less self-disciplined company may be tempted to plagiarize open source projects for its own products. Although current plagiarism detection tools appear sufficient for academic use, they fall short against serious plagiarists. For example, disguises such as statement reordering and code insertion can effectively confuse these tools. In this paper, we develop a new plagiarism detection tool, called GPLAG, which detects plagiarism by mining program dependence graphs (PDGs). A PDG is a graphical representation of the data and control dependencies within a procedure. Because PDGs are nearly invariant during plagiarism, GPLAG is more effective than state-of-the-art tools for plagiarism detection. To make GPLAG scalable to large programs, a statistical lossy filter is proposed to prune the plagiarism search space. An experimental study shows that GPLAG is both effective and efficient: it detects plagiarism that easily slips past existing tools, and it usually takes a few seconds to find (simulated) plagiarism in programs having thousands of lines of code.
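
To illustrate why a PDG is nearly invariant under disguises such as statement reordering and code insertion, the following minimal Python sketch compares two hand-built PDGs with a naive backtracking subgraph matcher. It is an illustration under stated assumptions, not GPLAG's actual algorithm: the abstract does not specify the matching procedure or the statistical filter, and the `PDG` class, the statement kinds, the `is_subgraph_isomorphic` function, and the example graphs are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical PDG model: node id -> statement kind, plus typed edges
# (source, target, "data" | "control"). Not the paper's data structure.
Edge = Tuple[int, int, str]


class PDG:
    def __init__(self, kinds: Dict[int, str], edges: Set[Edge]):
        self.kinds = kinds
        self.edges = edges


def is_subgraph_isomorphic(small: PDG, large: PDG) -> bool:
    """True if `small` maps injectively into `large`, preserving
    statement kinds and every typed dependence edge of `small`."""
    small_nodes = list(small.kinds)

    def extend(mapping: Dict[int, int]) -> bool:
        if len(mapping) == len(small_nodes):
            return True
        u = small_nodes[len(mapping)]
        for v in large.kinds:
            if v in mapping.values() or small.kinds[u] != large.kinds[v]:
                continue
            mapping[u] = v
            # Check every edge of `small` whose endpoints are both mapped.
            ok = all(
                (mapping[a], mapping[b], t) in large.edges
                for (a, b, t) in small.edges
                if a in mapping and b in mapping
            )
            if ok and extend(mapping):
                return True
            del mapping[u]  # backtrack
        return False

    return extend({})


# Original procedure (hand-built PDG, an assumption):
#   1: s = 0   2: i = 0   3: while i < n   4: s = s + i   5: i = i + 1   6: return s
original = PDG(
    {1: "assign", 2: "assign", 3: "control", 4: "assign", 5: "assign", 6: "return"},
    {(1, 4, "data"), (2, 3, "data"), (2, 4, "data"), (2, 5, "data"),
     (5, 3, "data"), (5, 4, "data"), (4, 6, "data"), (1, 6, "data"),
     (3, 4, "control"), (3, 5, "control")},
)

# Disguised copy: the two initializations are swapped and an unused
# statement (11: t = n * 2) is inserted. The dependencies are unchanged.
disguised = PDG(
    {10: "assign", 11: "assign", 12: "assign", 13: "control",
     14: "assign", 15: "assign", 16: "return"},
    {(12, 14, "data"), (10, 13, "data"), (10, 14, "data"), (10, 15, "data"),
     (15, 13, "data"), (15, 14, "data"), (14, 16, "data"), (12, 16, "data"),
     (13, 14, "control"), (13, 15, "control")},
)

print(is_subgraph_isomorphic(original, disguised))  # True
print(is_subgraph_isomorphic(disguised, original))  # False
```

The original PDG still embeds in the disguised one because reordering independent statements and inserting unrelated code leave the data and control dependencies untouched. Exhaustive matching of this kind is exponential in the worst case, which is consistent with the abstract's motivation for pruning the search space before comparing graphs.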