Software plagiarism detection: a graph-based approach

Authors:
Dong-Kyu Chae;Jiwoon Ha;Sang-Wook Kim;BooJoong Kang;Eul Gyu Im
Affiliations:
Hanyang University, Seoul, South Korea (all authors)
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013


Abstract

As software plagiarism increases rapidly, the need for software plagiarism detection systems is growing. In this paper, we propose a software plagiarism detection system based on an API-labeled control flow graph (A-CFG), which abstracts the functionality of a program. The A-CFG reflects both the sequence and the frequency of API calls, whereas previous work rarely considers the two together. To compare a pair of A-CFGs scalably, we use random walk with restart (RWR), which computes an importance score for each node in a graph. With RWR, we generate a single score vector for each A-CFG and compare A-CFGs by comparing their score vectors. Extensive evaluations on a set of Windows applications demonstrate the effectiveness and scalability of the proposed system compared with existing methods.
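
The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the restart probability, the restart distribution, how node scores are turned into a per-program vector, or the similarity measure. The sketch assumes a uniform restart distribution, a restart probability of 0.15, summing node scores per API label, and cosine similarity. The function names (`rwr_scores`, `api_vector`, `cosine`) and the toy graphs are hypothetical.

```python
import math


def rwr_scores(edges, restart=0.15, iters=100):
    """Random walk with restart, computed by power iteration.

    edges maps each node to a list of its successor nodes.
    Assumption: the walk restarts at a node chosen uniformly at random.
    """
    nodes = sorted(set(edges) | {v for succ in edges.values() for v in succ})
    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: restart / n for u in nodes}
        for u in nodes:
            succ = edges.get(u, [])
            if succ:
                # Spread the non-restarting mass evenly over successors.
                share = (1.0 - restart) * score[u] / len(succ)
                for v in succ:
                    nxt[v] += share
            else:
                # Node without successors: return its mass uniformly.
                share = (1.0 - restart) * score[u] / n
                for v in nodes:
                    nxt[v] += share
        score = nxt
    return score


def api_vector(edges, labels, restart=0.15):
    """Sum node scores per API label (an assumed aggregation rule)."""
    vec = {}
    for node, s in rwr_scores(edges, restart).items():
        api = labels.get(node)
        if api is not None:  # skip nodes that carry no API label
            vec[api] = vec.get(api, 0.0) + s
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse score vectors."""
    dot = sum(x * b.get(k, 0.0) for k, x in a.items())
    na = math.sqrt(sum(x * x for x in a.values()))
    nb = math.sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Toy graphs standing in for A-CFGs; nodes are labeled with API names.
original_edges = {0: [1], 1: [2, 3], 2: [3], 3: []}
original_labels = {0: "CreateFileW", 1: "ReadFile", 2: "WriteFile", 3: "CloseHandle"}

# Same structure and labels with renumbered nodes (a trivial "copy").
copy_edges = {10: [11], 11: [12, 13], 12: [13], 13: []}
copy_labels = {10: "CreateFileW", 11: "ReadFile", 12: "WriteFile", 13: "CloseHandle"}

# A graph that shares no API labels with the original.
other_edges = {0: [1], 1: [0]}
other_labels = {0: "socket", 1: "send"}

sim_copy = cosine(api_vector(original_edges, original_labels),
                  api_vector(copy_edges, copy_labels))
sim_other = cosine(api_vector(original_edges, original_labels),
                   api_vector(other_edges, other_labels))
```

Under these assumptions, the renumbered copy yields a similarity of essentially 1, while the graph with disjoint API labels yields 0. Because each program is reduced to one vector, comparing two programs costs time proportional to the vector length rather than requiring graph matching, which is the scalability argument the abstract makes.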