Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents.

We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we give experimental results on Web data and report experience with MOSS, a widely used plagiarism detection service.
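The abstract names winnowing without showing how it selects fingerprints. The following is a minimal sketch under our own assumptions, not MOSS's implementation. Text is normalized by keeping only alphanumeric characters and folding case. Every k-gram is hashed with a Karp-Rabin style rolling polynomial hash; the constants `BASE` and `MOD` are arbitrary choices. In every window of `w` consecutive hashes, the rightmost minimum is selected as a fingerprint. The helper names `normalize`, `kgram_hashes`, `winnow`, and `fingerprint_set` are hypothetical. Under this selection rule, two documents that share a normalized substring of at least `w + k - 1` characters share at least one fingerprint, while matches shorter than `k` characters are ignored.

```python
from typing import List, Set, Tuple

# Illustrative constants for a Karp-Rabin style rolling hash (our choice, not MOSS's).
BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus


def normalize(text: str) -> str:
    """Keep only alphanumeric characters and fold case (one possible normalization)."""
    return "".join(ch.lower() for ch in text if ch.isalnum())


def kgram_hashes(s: str, k: int) -> List[int]:
    """Return the rolling polynomial hash of every k-gram of s, left to right."""
    if k <= 0 or len(s) < k:
        return []
    high = pow(BASE, k - 1, MOD)  # weight of the character that leaves the window
    h = 0
    for ch in s[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = [h]
    for i in range(k, len(s)):
        h = (h - ord(s[i - k]) * high) % MOD  # drop the oldest character
        h = (h * BASE + ord(s[i])) % MOD      # append the next character
        hashes.append(h)
    return hashes


def winnow(hashes: List[int], w: int) -> List[Tuple[int, int]]:
    """Select (hash, position) pairs: the rightmost minimum of each window of w hashes.

    A position is recorded only once, even if it stays the minimum across
    several consecutive windows.
    """
    if not hashes or w <= 0:
        return []
    w = min(w, len(hashes))  # assumption: a short document forms a single window
    selected: List[Tuple[int, int]] = []
    last = -1
    for start in range(len(hashes) - w + 1):
        # Scanning offsets right-to-left makes min() resolve ties to the rightmost one.
        offset = min(range(w - 1, -1, -1), key=lambda j: hashes[start + j])
        pos = start + offset
        if pos != last:
            selected.append((hashes[pos], pos))
            last = pos
    return selected


def fingerprint_set(text: str, k: int, w: int) -> Set[int]:
    """Fingerprint a document as the set of winnowed k-gram hashes."""
    return {h for h, _ in winnow(kgram_hashes(normalize(text), k), w)}


if __name__ == "__main__":
    a = "The quick brown fox jumps over the lazy dog."
    b = "Copied: the QUICK brown fox, jumps over! a sleeping cat."
    shared = fingerprint_set(a, 5, 4) & fingerprint_set(b, 5, 4)
    print(f"{len(shared)} shared fingerprint(s)")
```

With `k = 5` and `w = 4`, the two sample sentences share the normalized run "thequickbrownfoxjumpsover", which is far longer than `w + k - 1 = 8`. They are therefore guaranteed to have at least one fingerprint in common. For clarity, this sketch rescans each window, which costs O(w) per window. A sliding-window minimum kept in a `collections.deque` would make selection linear in the number of hashes. When hashes are distinct and uniformly random, a new fingerprint is selected in a window exactly when the minimum of two overlapping windows lies at either end of their union. Winnowing therefore keeps about 2/(w+1) of the k-gram hashes.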