Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections, this is not the case for code-based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is highly scalable while maintaining similar levels of effectiveness to that of the popular JPlag and MOSS systems. Copyright © 2006 John Wiley & Sons, Ltd.
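To make the idea of local alignment over program text concrete, here is a minimal sketch in Python. It is not the paper's implementation: the abstract does not specify the tokenization, the scoring scheme, or how the text similarity measures are combined with alignment. The function names (`tokenize`, `local_alignment_score`, `similarity`), the regular expression, and the match/mismatch/gap values are all assumptions chosen for illustration. The alignment itself is the standard Smith-Waterman recurrence applied to token sequences.

```python
import re


def tokenize(source):
    """Split source text into identifiers, integers, and single symbols.

    Hypothetical tokenizer for illustration only; a real system would
    use a lexer for the target programming language.
    """
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best Smith-Waterman local alignment score of a and b.

    The scoring values are assumed defaults, not taken from the paper.
    Only two rows of the dynamic-programming table are kept in memory.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            step = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                  # start a fresh local alignment
                prev[j - 1] + step,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # skip a token of a
                curr[j - 1] + gap,   # skip a token of b
            )
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(a, b, match=1):
    """Normalize the score by the best score the shorter sequence allows."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return local_alignment_score(a, b, match=match) / (match * shorter)


if __name__ == "__main__":
    original = tokenize("int sum = a + b;")
    suspect = tokenize("x = 1; int sum = a + b; y = 2;")
    print(similarity(original, suspect))
```

Because the alignment is local, a copied fragment scores highly even when it is surrounded by unrelated code. The table costs time proportional to the product of the two sequence lengths, so comparing every pair of programs in a large collection this way would be expensive; a cheaper text similarity measure could plausibly be used first to select candidate pairs, though how the paper combines the two is not stated in the abstract.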